Feature #17753

Add Module#namespace

Added by tenderlovemaking (Aaron Patterson) over 3 years ago. Updated over 1 year ago.

Status:
Open
Assignee:
-
Target version:
-
[ruby-core:103044]

Description

Given code like this:

module A
  module B
    class C; end
    class D; end
  end
end

We can get from C to B like C.outer_scope, or to A like
C.outer_scope.outer_scope.

I want to use this in cases where I don't know the outer scope, but I
want to find constants that are "siblings" of a constant. For example,
I can do A::B::C.outer_scope.constants to find the list of "sibling"
constants to C. I want to use this feature when walking objects and
introspecting. For example:

ObjectSpace.each_object(Class) do |k|
  p siblings: k.outer_scope.constants
end

I've attached a patch that implements this feature, and there is a pull request on GitHub here.


Files

0001-Add-Module-outer_scope.patch (5.93 KB), tenderlovemaking (Aaron Patterson), 03/26/2021 07:19 PM
0001-Add-Module-namespace.patch (5.89 KB), tenderlovemaking (Aaron Patterson), 03/27/2021 09:51 PM

Updated by sawa (Tsuyoshi Sawada) over 3 years ago

What would you expect if a module has multiple names?

module E; end
E::F = A::B::C

Should A::B::C.outer_scope return A::B or E?

Updated by Eregon (Benoit Daloze) over 3 years ago

@sawa (Tsuyoshi Sawada) I'd say first assignment to a named constant wins, just like for Module#name.

I agree with the feature.
I'd suggest Module#namespace for the name though.
For example, I'd say the namespace of Process::Status is Process.

scope feels too general to me, and there are many other scopes, so I think namespace is a more precise term for it.

namespace is also the term used in https://github.com/ruby/ruby/blob/master/doc/syntax/modules_and_classes.rdoc#label-Modules
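
The "first assignment wins" behaviour of Module#name mentioned above can be checked directly. This is a minimal sketch that reuses the constants from the question and relies only on existing Ruby behaviour, not on the proposed method:

```ruby
module A
  module B
    class C; end
  end
end

module E; end
E::F = A::B::C                       # a second constant pointing at the same class object

same_object = E::F.equal?(A::B::C)   # one object, two constants
first_name  = E::F.name              # the name was fixed when A::B::C was first defined

puts same_object  # true
puts first_name   # A::B::C
```

Under the semantics discussed here, a name-consistent A::B::C.outer_scope would therefore be expected to return A::B rather than E.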

Updated by tenderlovemaking (Aaron Patterson) over 3 years ago

Eregon (Benoit Daloze) wrote in #note-2:

@sawa (Tsuyoshi Sawada) I'd say first assignment to a named constant wins, just like for Module#name.

Yes, this is what I would expect too (and implemented). 😄

I agree with the feature.
I'd suggest Module#namespace for the name though.
For example, I'd say the namespace of Process::Status is Process.

Yes, this is a much better name. I've updated the patch to use "namespace".

Updated by sawa (Tsuyoshi Sawada) over 3 years ago

This feature is reminiscent of Module.nesting. The difference is that the former has dynamic scope and the latter lexical scope. Besides that, I do not see any reason to make them different in any way. What about returning an array of the nested modules (perhaps including self) rather than just the direct parent?

module A; module B; class C; Module.nesting end end end # => [A::B::C, A::B, A]

A::B::C.outer_scope # => [A::B::C, A::B, A]

Updated by byroot (Jean Boussier) over 3 years ago

Besides that, I do not see any reason to make them different in any way

Well, Module.nesting can't be chained because of its scope semantics: Module.nesting.nesting would be problematic.

The proposed feature is very easily chainable:

A::B::C.namespace # => A::B
A::B::C.namespace.namespace # => A

So returning an array doesn't give anything that's not already achievable, and causes an array allocation that some users would rather avoid in some situations.
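
To illustrate why Module.nesting is tied to lexical scope rather than to the module object, here is a small sketch using existing Ruby behaviour only. The constant names NESTED and COMPACT are made up for the illustration:

```ruby
module A
  module B
    NESTED = Module.nesting    # evaluated inside two nested `module` keywords
  end
end

module A::B
  COMPACT = Module.nesting     # same module object, reopened with the compact form
end

p A::B::NESTED   # => [A::B, A]
p A::B::COMPACT  # => [A::B]
```

The same module yields two different results depending on where the call is written, which is why a per-object method such as the one proposed here cannot simply reuse it.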

Updated by tenderlovemaking (Aaron Patterson) over 3 years ago

  • Subject changed from Add Module#outer_scope to Add Module#namespace

Updated by fxn (Xavier Noria) over 3 years ago

I like the direction this is going towards, however, let me record some remarks for the archives.

Java has namespaces. Ruby does NOT have namespaces. That first sentence in the module docs, also present in books, is a super naive entry point to modules. But it is an abuse of language that later on you should correct.

Ruby does not have syntax for types either.

Ruby has storage (variables, constants, etc.), and objects. That is all, variables, constants, and module objects are totally decoupled except for the fact that you get a name in the first constant assignment. A name that does not reflect the nesting, that is not guaranteed to be unique, that does not mean the object is reachable via that constant path, and that some classes change by overriding the name method. It is just a string.

A library like Zeitwerk or Active Support can take some licenses "you know what I mean" because they are libraries and they work on the assumption of projects structured in a certain way. But a programming language has to be consistent with itself. Module#constants is consistent, in my view Module#namespace is not (with the current model).

So, if Ruby core wants to go in this direction and contribute to normalize a bit the mental model, I am onboard. But we have to be conscious that this is introducing something that is going to leak some way or another.

Updated by fxn (Xavier Noria) over 3 years ago

Let me add some edge cases that are possible, also for the archives:

module M
  module N
  end
end

M::N.namespace # => A::B::C, constant M stores the same object as A::B::C
M.namespace # => M, module is namespace of itself
M::N.namespace # => M
M.namespace    # => M::N, cycles of arbitrary depth
X = M::N
# ...
X.namespace # => The module that was once in M has been garbage collected (assuming a weak ref for backwards compat)

I am sure I can come up with more if I think more about it.

The Ruby model of this is extremely flexible and decoupled, and that is the public interface. Constant assignment, constants API, instantiation of anonymous modules, etc.

Updated by fxn (Xavier Noria) over 3 years ago

Also, in case my comments above are too generic, let's take the use case in the description of the ticket:

I can do A::B::C.outer_scope.constants to find the list of "sibling" constants to C.

Let's consider

module A
  module B
    class C; end
    class D; end
  end 
end

module X
  Y = A::B::C
  Z = 1
end

In what sense is A::B::D a sibling of the class object stored in A::B::C and X::Z is not?

Take now

module X; end
module Y; end
module Z; end

c = Class.new
X::C = c
Y::C = c
Z::C = c

For Ruby, that's all objects and storage; where c is stored has no relevance. It is no different from

module X; end
module Y; end
module Z; end

X::C = 1
Y::C = 1
Z::C = 1

Yes, c.name is "X::C", but as I said above, that is just a string.

If our input is a class object, as in the ObjectSpace example, you have no information that allows you to jump from it to the possibly multiple places in which the object may be stored. And the original constant may be gone; those places can be elsewhere (as happens with stale class objects cached during Rails initialization after a reload).

On the other hand, if you are in a very specific situation where you can assume that loop makes sense for all k, you can always name.sub(/::\w+$/, '') and const_get, modulo details. Or you can ObjectSpace.each_object(Module) and inspect constants.
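
The name-based workaround can be sketched as follows. The helper parent_by_name is hypothetical (it is not part of Ruby or of the attached patch), and it inherits every caveat of Module#name raised in this thread: it trusts the string, so it breaks if name is overridden or the constant path is stale.

```ruby
# Hypothetical helper: resolves the "parent" purely from the name string.
def parent_by_name(mod)
  name = mod.name
  return nil if name.nil?                  # anonymous module: nothing to go on

  parent_path = name.sub(/::\w+\z/, "")    # strip the last path segment
  return Object if parent_path == name     # top-level constant such as String

  Object.const_get(parent_path)            # raises NameError if the path no longer resolves
end

module A
  module B
    class C; end
    class D; end
  end
end

p parent_by_name(A::B::C)            # => A::B
p parent_by_name(A::B::C).constants  # => [:C, :D]
```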

In a project, in a library, you may have constraints in place that you can exploit. In Ruby, the language, you don't.

Updated by tenderlovemaking (Aaron Patterson) over 3 years ago

Yes, c.name is "X::C", but as I said above, that is just a string.

It's also a way to inform the user where that constant lives. The contents of the string have meaning.

On the other hand, if you are in a very specific situation where you can assume that loop makes sense for all k, you can always name.sub(/::\w+$/, '') and const_get, modulo details.

This would work if I could trust the name method on a class (I can't, especially in a Rails project).

Of course there are some edge cases with redefinition, but since the "namespace" method would line up with what the "name" method is supposed to return, I think it would be easy to understand the behavior.

Updated by Eregon (Benoit Daloze) over 3 years ago

I think those edge cases are pretty rare.
Module#namespace would refer to the lexical parent when the module is created (with module Name) or when first assigned to a constant (Name = Module.new).

The first example of https://bugs.ruby-lang.org/issues/17753#note-8 would already need extremely contrived code like:

module A
  module B
    module C
    end
  end
end

module M
  N = A::B::C
  module N
  end
  p N.namespace
end

and even then the value could still be useful.

In the end, the exact same caveats exist for Module#name and yet it's fine in practice.

A module is a namespace of constants.

Updated by fxn (Xavier Noria) over 3 years ago

It's also a way to inform the user where that constant lives. The contents of the string have meaning.

The numerous people that have had to deal with stale objects in Rails for years know that is not entirely true. The class objects have a name, but that constant path no longer gives you the object at hand, but some other object that happens to have the same name.

Benoit, but a programming language is a formal system. It has to define things that are consistent with its model! It does not matter in my view if the examples are statistically rare. They are only ways to demonstrate the definition does not match the way Ruby works.

A module is a namespace of constants.

Yes, but it is dynamic because of const_set and remove_const, and your APIs and concepts need to reflect that.

If you wanted namespaces, you'd have a different programming language where everything is consistent with that concept. But Ruby is not that way.

Same way Ruby does not have types. Admin::User is not a type (we all in this thread know that), it is a constant path. That is all you got, constants and objects, and constants API.

Updated by fxn (Xavier Noria) over 3 years ago

BTW, we were discussing yesterday with Aaron that the flag I am raising is about the name namespace. What we are defining given a class object is:

  1. If the class object is anonymous, return nil.
  2. Otherwise, it was assigned to a constant at least once. Let namespace be the module object that stored that constant at assignment time if it is an alive object, nil if the original object is gone (possible depending on whether the reference is weak or not).

We do not have a good name for that.

Another thing Aaron is exploring is to define Module#namespaces, which would return all modules that store the class object in one of their constants. That is a bit closer to the Ruby model, I believe.
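
The Module#namespaces idea can be approximated today with a brute-force scan. The helper namespaces_of below is hypothetical and only a sketch: it walks every live module, skips autoload entries so it does not load code, and is far slower than a pointer kept by the VM would be.

```ruby
# Hypothetical helper sketching the Module#namespaces idea; not the proposed implementation.
def namespaces_of(target)
  ObjectSpace.each_object(Module).select do |mod|
    mod.constants(false).any? do |const_name|
      next false if mod.autoload?(const_name)   # do not trigger autoloads
      begin
        target.equal?(mod.const_get(const_name, false))
      rescue NameError, LoadError
        false                                    # unreadable constant: ignore it
      end
    end
  end
end

module X; end
module Y; end
module Z; end

c = Class.new
X::C = c
Y::C = c
Z::C = c

found = namespaces_of(c)
p found.map(&:name).sort  # => ["X", "Y", "Z"]
```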

Updated by fxn (Xavier Noria) over 3 years ago

To me, the ability of a namespace being namespace of itself

m = Module.new
m::M = m

is one clear indicator that the name is not quite right. That is not the kind of property you expect a namespace to have. And it is not quite right because we are dealing with storage and objects. In the world of storage and objects that example squares perfectly, there is no surprise.
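
A runnable version of that self-containing example, using only existing Ruby behaviour:

```ruby
m = Module.new
m::M = m                      # the module stores a constant that points at itself

p m.constants                 # => [:M]
p m::M.equal?(m)              # => true
p m::M::M::M.equal?(m)        # => true, the path can be followed indefinitely
```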

Updated by fxn (Xavier Noria) over 3 years ago

Oh, let me say something explicitly: You guys are Ruby committers, you are the ones that have the vision for what makes sense in the language.

I am raising a flag because this does not square to me, and makes me feel it is introducing a leaking abstraction not backed by the language model. It is an abstraction you could tolerate in Active Support with documented caveats, but not one that I personally see in Ruby itself.

However, if once the feedback has been listened to you believe this API squares with your vision of Ruby, by all means go ahead :).

Updated by Eregon (Benoit Daloze) over 3 years ago

I see, the name namespace is what we're disagreeing on.
Maybe you have another suggestion for the name of that method?

Maybe outer_module/outer_scope would work too, but I feel namespace is nicer.
All these 3 names imply some kind of lexical relationship. And even though that can be broken, in most cases it is respected and the intent of the user using this new method is clearly to go one step up in the lexical relationship.
So we should mention in the docs this might return unusual results for e.g. anonymous modules that are later assigned to a constant.

FWIW internally that's named "lexical parent module" in TruffleRuby, but that doesn't make a nice method name.
Indeed, it's not always "lexical" but in the vast majority of cases the intent is that and we would say A::B is namespaced or defined under (rb_define_module_under) module/class A.

The way I see it is modules are the closest thing to a namespace that Ruby has. And therefore Module#namespace feels natural to me.
From the other direction, I agree namespaces often have different/stricter semantics than Ruby modules in other languages.
Yet I think it's OK to have a slightly different meaning for namespace in Ruby, and that seems already established in docs.

Updated by fxn (Xavier Noria) over 3 years ago

The lexical parent module happens to be just the object from which you set the name, which does not even reflect the scope/nesting at assignment time (as you know):

module A::B
  module X::Y
    class C
      name # => "X::Y::C"
    end
  end
end

If modules are namespaces, why isn't A::B there?

Yeah, we see it differently. There, I only see a constant assignment. You see it like "most of the time, you can think of it that way because that's how most of Ruby looks like". That difference in points of view is fine :).
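
For reference, a self-contained version of that snippet. The outer modules are defined first so the compact forms resolve, and the constants NAME_AT_DEFINITION and NESTING are added purely for the illustration:

```ruby
module A; module B; end; end
module X; module Y; end; end

module A::B
  module X::Y
    class C
      NAME_AT_DEFINITION = name    # "X::Y::C"
      NESTING = Module.nesting     # [X::Y::C, X::Y, A::B]
    end
  end
end

p X::Y::C::NAME_AT_DEFINITION      # => "X::Y::C"
p X::Y::C::NESTING                 # => [X::Y::C, X::Y, A::B]
p A::B.const_defined?(:C, false)   # => false
```

A::B appears in the lexical nesting but not in the name, because the constant C is stored in X::Y.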

Updated by fxn (Xavier Noria) over 3 years ago

BTW, you all know AS has this concept right? https://github.com/rails/rails/blob/f1e00f00295eb51a64a3008c7b1f4c4f46e902e3/activesupport/lib/active_support/core_ext/module/introspection.rb#L20-L37

We say "according to its name", have the X example to clearly see the assumptions, and case closed.

As I said before, AS can take licenses, it is not Ruby itself. And in the context of Rails (the most common case for AS), you can assume some structure.

Updated by mame (Yusuke Endoh) over 3 years ago

This ticket was discussed at the dev meeting. @matz (Yukihiro Matsumoto) said that (1) the use case is not clear to him, and that (2) he wants to keep the keyword namespace for another feature in future. outer_scope is also weird because the return value is not a "scope".

Updated by fxn (Xavier Noria) over 3 years ago

In my view, the way to implement the use case that matches Ruby is to go downwards.

Module has many constants, that is the Ruby model, so instead of

ObjectSpace.each_object(Class) do |k|
  k.outer_scope.constants
end

you'd write

ObjectSpace.each_object(Module) do |mod|
  mod.constants.each do |constant|
    # Do something with constant.
  end
end

Alternatively, recurse starting at Object (would miss anonymous modules with constants).
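
The "recurse starting at Object" alternative might look like the sketch below. The helper reachable_modules is hypothetical; it follows constants downwards, skips autoload entries, and, as noted above, cannot see modules that are only reachable through anonymous roots.

```ruby
# Hypothetical helper: collects every module reachable through constants from `root`.
def reachable_modules(root = Object)
  seen  = {}.compare_by_identity
  stack = [root]

  until stack.empty?
    mod = stack.pop
    next if seen.key?(mod)                # cycles are possible, visit each module once
    seen[mod] = true

    mod.constants(false).each do |const_name|
      next if mod.autoload?(const_name)   # do not trigger autoloads
      begin
        value = mod.const_get(const_name, false)
      rescue NameError, LoadError
        next                              # unreadable constant: ignore it
      end
      stack << value if Module === value
    end
  end

  seen.keys
end

module Outer
  module Inner; end
end

anon = Module.new
anon::Hidden = Module.new                 # only reachable through the local variable

mods = reachable_modules
p mods.include?(Outer::Inner)  # => true
p mods.include?(anon::Hidden)  # => false
```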

Updated by ioquatix (Samuel Williams) over 2 years ago

@tenderlovemaking (Aaron Patterson) what about some kind of "uplevel" concept for name:

class A::B::C::MyClass; end

A::B::C::MyClass.name(0) # -> "MyClass"
A::B::C::MyClass.name(1) # -> "C::MyClass"
A::B::C::MyClass.name(-1) # -> "A::B::C"
A::B::C::MyClass.name(-2) # -> "A::B"

etc

Updated by sawa (Tsuyoshi Sawada) over 2 years ago

ioquatix (Samuel Williams) wrote in #note-21:

class A::B::C::MyClass; end

A::B::C::MyClass.name(0) # -> "MyClass"
A::B::C::MyClass.name(1) # -> "C::MyClass"
A::B::C::MyClass.name(-1) # -> "A::B::C"
A::B::C::MyClass.name(-2) # -> "A::B"

What is the rule behind what the argument represents? To me, your four examples except for the first one seem to suggest:

  1. The nesting levels (achieved by separating the full name by ::) can be referred to by an index as if they were placed in an array.
  2. a. If the argument is negative, then remove the nesting levels from the one indexed by the argument up to the last one.
    b. If the argument is non-negative, then remove the nesting levels from the first one up to the one indexed by the argument.
  3. Join the remaining nesting levels with ::.

But, then I would expect:

A::B::C::MyClass.name(0) # -> "B::C::MyClass"

contrary to what you wrote.

What is your intended logic? Is it coherent?

Updated by ioquatix (Samuel Williams) over 2 years ago

class Class
  def name(offset = nil)
    return super() unless offset

    parts = super().split('::')

    if offset >= 0
      parts = parts[(parts.size - 1 - offset)..-1]
    else
      parts = parts[0...(parts.size + offset)]
    end

    return parts.join('::')
  end
end

module A
  module B
    module C
      class MyClass
      end
    end
  end
end

pp A::B::C::MyClass.name(0) # -> "MyClass"
pp A::B::C::MyClass.name(1) # -> "C::MyClass"
pp A::B::C::MyClass.name(-1) # -> "A::B::C"
pp A::B::C::MyClass.name(-2) # -> "A::B"

Something like this.

Updated by sawa (Tsuyoshi Sawada) over 2 years ago

@ioquatix (Samuel Williams)

The non-negative part of your code looks pretty much convoluted. To simplify your code (and define it on Module rather than on Class), it would be essentially this:

Module.prepend(Module.new do
  def name(offset = nil)
    return super() unless offset

    super().split('::').then do
      if offset >= 0
        _1.last(offset + 1)
      else
        _1[...offset]
      end
    end.join('::')
  end
end)

This indicates that you are essentially using the argument offset:

  • to specify the number of elements when offset is non-negative, and
  • to specify the ending position (index) of the elements otherwise

which is incoherent. At least to me, your proposal is in fact difficult to understand because of this. I think it should be unified so that either offset expresses the number all the way down, or it expresses the position all the way down. Or, perhaps you can limit offset to non-negative.

Updated by ioquatix (Samuel Williams) over 2 years ago

@sawa (Tsuyoshi Sawada) Thanks for your feedback and the improved code.

Based on my own needs and other code (see https://apidock.com/rails/ActiveSupport/Inflector/demodulize and https://apidock.com/rails/ActiveSupport/Inflector/deconstantize for example) I see two main use cases:

(1) Get some part of the namespace starting from the left. The most common use case is "The entire module namespace without the class name" but it will also be convenient to cut off more than just the class name in some cases. Since how deeply nested we are is usually not known, cutting from the right hand side makes sense.
(2) Get some part of the class name starting from the right. The most common case is "Just the class name without any module namespace" but it will also be convenient to include some of the nested modules expanding towards the right in some cases.

To me it's consistent within the requirements of solving those two problems and maps nicely to negative and non-negative integers respectively. I realise that between the negative and non-negative offset, there is no continuity but this is by design to satisfy user needs rather than theoretical purity. If you have a better idea, please share it!
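
For comparison, the two use cases can be written as plain string helpers today. The names base_name and module_prefix are made up for this sketch; they roughly mirror what the linked demodulize and deconstantize do for strings, and they correspond to name(0) and name(-1) in the proposal above.

```ruby
# Hypothetical helper: the last segment of a constant path (use case 2).
def base_name(path)
  index = path.rindex("::")
  index ? path[(index + 2)..-1] : path
end

# Hypothetical helper: everything before the last segment (use case 1).
def module_prefix(path)
  path[0, path.rindex("::") || 0]
end

p base_name("A::B::C::MyClass")      # => "MyClass"
p module_prefix("A::B::C::MyClass")  # => "A::B::C"
p module_prefix("MyClass")           # => ""
```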

In more detail, I don't think this offset should be impacted by changes to module nesting, i.e.

# This should be the same:
A::B::C::MyClass.name(1) # C::MyClass
Z::A::B::C::MyClass.name(1) # C::MyClass

# This should always be full module namespace:
A::B::C::MyClass.name(-1) # A::B::C
Z::A::B::C::MyClass.name(-1) # Z::A::B::C

In addition, the user won't know ahead of time the nesting level of the class, and so this proposed interface needs to satisfy the most common use cases without any extra computation; otherwise, the user is forced to do string manipulation again. In theory, this proposed interface should also be efficient to implement.

Updated by austin (Austin Ziegler) over 2 years ago

ioquatix (Samuel Williams) wrote in #note-25:

To me it's consistent within the requirements of solving those two problems and maps nicely to negative and non-negative integers respectively. I realise that between the negative and non-negative offset, there is no continuity but this is by design to satisfy user needs rather than theoretical purity. If you have a better idea, please share it!

Since name doesn’t currently accept any arguments, why not make it a keyword instead of a simple integer?

A::B::C::MyClass.name(tail: 1) # C::MyClass
A::B::C::MyClass.name(head: 1) # A::B::C

I don’t know what the name of the keywords should be.

Updated by shioyama (Chris Salzberg) over 1 year ago

This has been quiet for a while, but despite the reservations expressed I'd really like to see it implemented.

I don't personally really like namespace as a name either, because of its usage in other places. It's been mentioned, but what exactly is wrong with Module#module_parent, the same method ActiveSupport uses?

@fxn (Xavier Noria)

Yes, c.name is "X::C", but as I said above, that is just a string.

It's a string, yes, but it also comes with rules about what is returned and when, which I think are relevant to your concerns about edge cases.

I think any implementation of this method should be fully consistent with Module#name. That means also encompassing names that are temporary (where the root is anonymous).

Just so we're all on the same page:

mod = Module.new
mod.name
#=> nil

mod::Foo = Module.new
mod::Foo.name
#=> "#<Module:0x0000000106471d80>::Foo"

"#<Module:0x0000000106471d80>::Foo" is mod::Foo's "temporary name". We can assign another module under an anonymous root to it and it will not change:

other_mod = Module.new
other_mod::Bar = mod::Foo

other_mod::Bar.name
#=> "#<Module:0x0000000106471d80>::Foo"

So temporary names are "sticky" as long as the module doesn't have a permanent name. Once it has a permanent name, that name does not change regardless of assignment to other toplevel-rooted constants.

So we have some rules for how Module#name can and cannot change:

  1. A module's permanent name, once assigned, cannot be re-assigned. Although you can nest the same constant in many places in many ways, the name, once the constant has been attached to a permanent root, will not change.
  2. A module's temporary name, once assigned, cannot be re-assigned except to a permanent name. You can assign another constant from an anonymous-rooted namespace, but the module's original temporary name sticks and only changes when/if it gets a permanent name.
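
Both rules can be observed with Module#name alone. The sketch below assumes a recent Ruby (3.x); the constants PermanentRoot and SecondRoot are invented for the illustration:

```ruby
mod = Module.new
mod::Foo = Module.new
temporary = mod::Foo.name               # "#<Module:0x...>::Foo"

other_mod = Module.new
other_mod::Bar = mod::Foo
still_temporary = other_mod::Bar.name   # rule 2: unchanged by a second anonymous root

PermanentRoot = mod                     # the root gets a permanent name
permanent = mod::Foo.name               # the temporary name is replaced

SecondRoot = mod                        # another permanent constant for the same root
unchanged = SecondRoot::Foo.name        # rule 1: the permanent name sticks

p still_temporary == temporary          # => true
p permanent                             # => "PermanentRoot::Foo"
p unchanged                             # => "PermanentRoot::Foo"
```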

I think these rules give us everything we need to define a method that returns the immediate parent of a module according to what the name specifies, and I think this would be a very useful method to have.

Extended to anonymous roots, I would expect this (I'm using module_parent here, but replace with whatever name is agreed is best):

mod = Module.new
mod::A = Module.new
mod::A::B = Module.new

mod::A::B.name
#=> "#<Module:0x0000000109a98fd0>::A::B"

mod::A::B.module_parent
#=> #<Module:0x0000000109a98fd0>::A
mod::A::B.module_parent.module_parent
#=> #<Module:0x0000000109a98fd0>

# Temporary name has been assigned so assigning to
# another constant rooted in an anonymous module has no impact.
other_mod = Module.new
other_mod::C = mod::A::B

other_mod::C.name
#=> "#<Module:0x0000000109a98fd0>::A::B"
other_mod::C.module_parent
#=> #<Module:0x0000000109a98fd0>::A

# Permanent name has been assigned to the root,
# so both `name` and `module_parent` change accordingly
D = mod

mod::A::B.name
#=> "D::A::B"
mod::A::B.module_parent
#=> D::A
mod::A::B.module_parent.module_parent
#=> D

# Giving another permanent name has no impact.
E = mod
E::A::B.name
#=> "D::A::B"
E::A::B.module_parent
#=> D::A

This is entirely consistent with how name works, and I think is in fact a very natural complement to it. Since conventions are exactly the same, there is no need for any additional "rules" to cover the edge cases mentioned earlier.

As an implementation, this is fully determined, consistent with an existing pattern (Module#name) and works as expected for most common use cases.

@mame (Yusuke Endoh)

the use case is not clear to him

It's been mentioned above, but ActiveSupport and other libraries use mod.name.split("::") all over the place to go from something they can see (Module#name) to something they can use (actual module objects). This has always seemed extremely clumsy to me; Ruby generated the name from the module objects, but it will only give you the "trace" of them, not the actual living things.

Personally, I've been recently working with anonymous-rooted namespaces (like mod and mod::Foo above) and the inability to get the root of a module is yet more problematic, because name.split("::") and constantize don't work in that context. I'd love to see this happen, under any name that seems appropriate.
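
A short sketch of why the string-based route fails for anonymous roots, using only existing Ruby behaviour (the variable names are illustrative):

```ruby
mod = Module.new
mod::Foo = Module.new
path = mod::Foo.name                 # "#<Module:0x...>::Foo"

lookup_failed =
  begin
    Object.const_get(path)           # the temporary name is not a valid constant path
    false
  rescue NameError
    true
  end

p lookup_failed                            # => true
p mod.const_get(:Foo).equal?(mod::Foo)     # => true, but only if you already hold `mod`
```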

Updated by fxn (Xavier Noria) over 1 year ago

Yeah, I believe the feature makes sense and can be useful, and the proposed implementation that keeps a pointer is well-defined (vs what AS does). Also consistent with Module#name, as you said.

My observations above were more related to the name namespace I think, because we are defining "the class or module object that held the constant to which I was initially assigned, if any". That is weaker.

Regarding your proposal, in English it would be parent_module, right?

Let me add something, this is practical, but the Ruby model suffers just a little bit in my view. Let me explain.

To me:

module M
  X = String.new
end

module N
  X = Module.new
end

are essentially the same, Ruby objects stored in places. As we all know, the Ruby model is quite unconstrained, which makes it beautiful, but also weaker in a logical sense.

So, there is a part of me that believes for consistency the string object should also respond to parent_module, which is a weird derivation. (I am playing reductio ad absurdum here, not proposing that!).

However, maybe in this case practicality can win over a pure and spherical language model, I don't know :).

Updated by fxn (Xavier Noria) over 1 year ago

Let me reword that last remark about "the Ruby model suffers just a little bit".

Ruby objects and variables behave the same. But constants are not as orthogonal and generic:

  1. When you assign an integer to a constant, it's just storage. If you assign a class or module object with a name, it's just storage. However, assigning an anonymous class or module object changes the state, it has a side-effect that is only programmed for these objects.
  2. The class and module keywords perform constant assignments.

So, "suffer" is not an exact word for what I have in mind. I'd say this change would accentuate an already existing asymmetry, and in a way that is consistent with Module#name. This is not bad, it goes in a consistent direction.
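
The asymmetry in point 1 is easy to observe with existing Ruby behaviour. The constant names below are invented for the illustration:

```ruby
klass = Class.new
before = klass.name    # nil, the class is anonymous

Named = klass          # assigning it to a constant mutates the class object
after = klass.name     # "Named"

Again = klass          # a second assignment is plain storage
still = klass.name     # still "Named"

ANSWER = 42            # an Integer is unaffected by being stored in a constant

p before  # => nil
p after   # => "Named"
p still   # => "Named"
```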

Updated by shioyama (Chris Salzberg) over 1 year ago

When you assign an integer to a constant, it's just storage. If you assign a class or module object with a name, it's just storage. However, assigning an anonymous class or module object changes the state, it has a side-effect that is only programmed for these objects.

That's a very interesting way to put it, I hadn't thought of it that way. And indeed those side-effects scale with the number of other classes and modules rooted in the thing that was named.

Updated by shioyama (Chris Salzberg) over 1 year ago

Regarding your proposal, in English it would be parent_module, right?

My interpretation here, but to me "parent module" would signify the "parent thing that is a module" of the current thing (whatever that thing may be), whereas "module parent" would signify the "module's parent", implying the parent could be anything (but kind of implying it is also a module).

English-wise as a method name either is possible, they just have slightly different emphasis. I suggested module_parent because it exists and is being used for the same thing, so there's a precedent, which might make it easier to agree on.
