Feature #19024

closed

Proposal: Import Modules

Added by shioyama (Chris Salzberg) over 1 year ago. Updated about 1 year ago.

Status:
Closed
Assignee:
-
Target version:
-
[ruby-core:110097]

Description

There is no general way in Ruby to load code outside of the globally-shared namespace. This makes it hard to isolate components of an application from each other and from the application itself, leading to complicated relationships that can become intractable as applications grow in size.

The growing popularity of a gem like Packwerk, which provides a new concept of "package" to enforce boundaries statically in CI, is evidence that this is a real problem. But introducing a new packaging concept and CI step is at best only a partial solution, with downsides: it adds complexity and cognitive overhead that wouldn't be necessary if Ruby provided better packaging itself (as Matz has suggested it should).

There is one limited way in Ruby currently to load code without polluting the global namespace: load with the wrap parameter, which as of https://bugs.ruby-lang.org/issues/6210 can now be a module. However, this option does not apply transitively to require calls within the loaded file, so its usefulness is limited.
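The limitation can be reproduced in a few lines. This is a minimal sketch that assumes Ruby 3.1 or later (where load accepts a module as the wrap argument); the file and constant names (outer_pkg.rb, inner_dep.rb, OuterPkg, InnerDep) are made up for illustration.

```ruby
require "tmpdir"

wrapper = Module.new

Dir.mktmpdir do |dir|
  inner = File.join(dir, "inner_dep.rb")
  outer = File.join(dir, "outer_pkg.rb")

  # The dependency defines a constant at what it believes is the toplevel.
  File.write(inner, "module InnerDep; end\n")

  # The loaded file requires its dependency, then defines its own constant.
  File.write(outer, <<~RUBY)
    require #{inner.inspect}
    module OuterPkg; end
  RUBY

  load(outer, wrapper)
end

p wrapper.const_defined?(:OuterPkg, false) # defined under the wrapper
p Object.const_defined?(:OuterPkg, false)  # not visible at the toplevel
p Object.const_defined?(:InnerDep, false)  # the required file escaped the wrapper
```

The file passed to load is wrapped, but the file it requires still lands in the global namespace, which is the gap item 1 of the proposal is meant to close.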

My proposal here is to enable module imports by doing the following:

  1. apply the wrap module namespace transitively to requires inside the loaded code, including native extensions (or provide a new flag or method that would do this),
  2. make the wrap module the toplevel context for code loaded under it, so ::Foo resolves to <top_wrapper>::Foo in loaded code (or, again, provide a new flag or method that would do this). Also make this apply when code under the wrapper module is called outside of the load process (when top_wrapper is no longer set) — this may be quite hard to do.
  3. resolve name on anonymous modules under the wrapped module to their names without the top wrapper module, so <top_wrapper>::Foo.name evaluates to "Foo". There may be other ways to handle this problem, but a gem like Rails uses name to resolve filenames and fails when anonymous modules return something like #<Module: ...>::ActiveRecord instead of just ActiveRecord.
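To make the problem in item 3 concrete, here is a small sketch using only stock Ruby (2.7 or later assumed). ActiveThing and path_for are hypothetical stand-ins; path_for is a deliberately simplified name-to-filename helper, not the code Rails uses.

```ruby
wrapper = Module.new
wrapper.const_set(:ActiveThing, Module.new)

# A module reachable only through an anonymous module carries the
# wrapper's inspect string as a prefix of its name.
name = wrapper::ActiveThing.name
puts name

# Hypothetical, simplified name-based file lookup.
def path_for(const_name)
  const_name.gsub("::", "/").downcase + ".rb"
end

puts path_for(name)          # includes the "#<module:0x...>" prefix, so no such file exists
puts path_for("ActiveThing") # the path a loader would actually want
```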

I have roughly implemented these three things in this patch. This implementation is incomplete (it does not cover the last highlighted part of 2) but provides enough of a basis to implement an import method, which I have done in a gem called Im.

Im provides an import method which can be used to import gem code under a namespace:

require "im"
extend Im

active_model = import "active_model"
#=> <#Im::Import root: active_model>

ActiveModel
#=> NameError

active_model::ActiveModel
#=> ActiveModel

active_record = import "active_record"
#=> <#Im::Import root: active_record>

# Constants defined in the same file under different imports point to the same objects
active_record::ActiveModel == active_model::ActiveModel
#=> true

With the constants all loaded under an anonymous namespace, any code importing the gem can name constants however it likes:

class Post < active_record::ActiveRecord::Base
end

AR = active_record::ActiveRecord

Post.superclass
#=> AR::Base

Note that this enables the importer to completely determine the naming for every constant it imports. So gems can opt to hide their dependencies by "anchoring" them inside their own namespace, like this:

# in lib/my_gem.rb
module MyGem
  dep = import "my_gem_dependency"

  # my_gem_dependency is "anchored" under the MyGem namespace, so not exposed to users
  # of the gem unless they also require it.
  MyGemDependency = dep

  #...
end

There are a couple important implementation decisions in the gem:

  1. Only load code once. When the same file is imported again (either directly or transitively), "copy" constants from previously imported namespace to the new namespace using a registry which maps which namespace (import) was used to load which file (as shown above with activerecord/activemodel). This is necessary to ensure that different imports can "see" shared files. A similar registry is used to track autoloads so that they work correctly when used from imported code.
  2. Toplevel core types (NilClass, TrueClass, FalseClass, String, etc) are "aliased" to constants under each import module to make them available. Thus there can be side-effects of importing code, but this allows a gem like Rails to monkeypatch core classes which it needs to do for it to work.
  3. Object.const_missing is patched to check the caller location and resolve to the constant defined under an import, if there is an import defined for that file.
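The "load once, then copy constants" idea in decision 1 can be sketched roughly as follows. This is not Im's implementation: TinyImporter and shared_model.rb are hypothetical names, it ignores transitive requires and autoloads, and it assumes Ruby 3.1 or later for load with a module.

```ruby
require "tmpdir"

# Hypothetical registry: remembers which wrapper module first loaded each file.
class TinyImporter
  def initialize
    @loaded = {} # absolute path => wrapper module that executed the file
  end

  def import(path)
    wrapper = Module.new
    if (source = @loaded[path])
      # Already executed once: expose the same objects under the new namespace.
      source.constants(false).each do |const|
        wrapper.const_set(const, source.const_get(const, false))
      end
    else
      load(path, wrapper) # run the file under the wrapper module
      @loaded[path] = wrapper
    end
    wrapper
  end
end

importer = TinyImporter.new
first = second = nil

Dir.mktmpdir do |dir|
  path = File.join(dir, "shared_model.rb")
  File.write(path, "module SharedModel; end\n")
  first = importer.import(path)
  second = importer.import(path)
end

p first.equal?(second)                           # the two imports are distinct namespaces
p first::SharedModel.equal?(second::SharedModel) # but they share one constant object
```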

To be clear: I think 1) should be implemented in Ruby, but not 2) and 3). The last one (Object.const_missing) is a hack to support the case where a toplevel constant is referenced from a method called in imported code (at which point the top_wrapper is not active.)

I know this is a big proposal, and there are strong opinions held. I would really appreciate constructive feedback on this general idea.

Notes from September's Developers Meeting: https://github.com/ruby/dev-meeting-log/blob/master/DevMeeting-2022-09-22.md#feature-10320-require-into-module-shioyama

See also similar discussion in: https://bugs.ruby-lang.org/issues/10320


Related issues (3 open, 0 closed)

Related to Ruby master - Feature #10320: require into module (Open)
Related to Ruby master - Feature #19277: Project-scoped refinements (Open)
Related to Ruby master - Feature #19744: Namespace on read (Open)

Updated by hsbt (Hiroshi SHIBATA) over 1 year ago

Updated by fxn (Xavier Noria) over 1 year ago

Intuitively, this proposal changes the way Ruby works in a way that I believe has too many ramifications and edge cases. Also, it delegates control to the caller, rather than to the subject (the code being loaded).

Since the problem the proposal wants to address is access of constants cross-packages, I wonder if an approach that would be more aligned with Ruby would be to introduce a "package" visibility keyword.

If P::C is declared to be visible only for package P, then only code within P is able to refer to P::C. This could be encoded in the constant resolution algorithms (ignoring the existence of P::C according to the rule), and would flow quite well with current Ruby semantics for everything else.

Updated by byroot (Jean Boussier) over 1 year ago

I'm also very much in favor of introducing a first class "package" construct, since there are way too many edge cases for loading existing code in an isolated way like this. It might work for some, maybe even most packages, but ultimately it will certainly require cooperation from the packages, so might as well offer a proper construct for it.

Updated by shioyama (Chris Salzberg) over 1 year ago

Intuitively, this proposal changes the way Ruby works in a way that I believe has too many ramifications and edge cases. Also, it delegates control to the caller, rather than to the subject (the code being loaded).

I'm actually trying to change Ruby as little as possible to make this happen. load with the wrap option already does some of this, and delegates control to the caller, so this is not exactly new.

And to me, delegating control to the caller is natural. The problem with require is exactly that the caller loses control, resulting in called code being able to "park" itself wherever it likes. This is why every gem has to be a good citizen and keep its code in one namespace.

Since the problem the proposal wants to address is access of constants cross-packages, I wonder if an approach that would be more aligned with Ruby would be to introduce a "package" visibility keyword.

To be clear, controlling cross-package access is a positive outcome of wrapping loaded code in a namespace, not the (only) goal of this proposal in and of itself. Namespacing has advantages of its own outside of boundary definition alone.

We have a large codebase made up of components. To effectively keep code separate, it's preferable if each component defines its own namespace (under a module), but in practice this means a lot of nesting classes and modules relative to the absolute top-level, and prefixing calls with ComponentA::Foo etc.

Whereas, if you have imports, each component of an application would be able to define itself at the toplevel of its own universe. Likewise, you could grant access to other components with constant references:

platform = import "components/platform"
shop_identity = import "components/shop_identity"

platform::Shop = shop_identity::Shop

there are way too many edge cases for loading existing code in an isolated way like this. It might work for some, maybe even most packages, but ultimately it will certainly require cooperation from the packages, so might as well offer a proper construct for it.

Wouldn't it make sense to first determine what those edge cases are? I feel like we're immediately jumping to the conclusion that "this is hard, so let's do this other thing instead" before we have determined if it really is that hard to do.

Updated by fxn (Xavier Noria) over 1 year ago

Fair, I see ramifications in transitivity for code unrelated to the packages (3rd party gems), constant resolution gimmicks, and a few other things. I only have spare time for this, but will try to summarize some of those concerns later if the conversation does not address them or doesn't lean into packages.

Updated by shioyama (Chris Salzberg) over 1 year ago

I only have spare time for this, but will try to summarize some of those concerns later if the conversation does not address them or doesn't lean into packages.

Thank you, that would help a lot!

I guess my resistance to packages as a new concept is that it feels to me like (with a bit of tweaking) existing concepts in Ruby might be enough to do the same thing. e.g. a namespace that is not anchored at toplevel cannot see other toplevel constants, effectively creating a boundary without requiring any new mechanism to enforce it. (Actually, not quite true since you need a new mechanism to isolate the new toplevel...)

Anyway I'm sure imports as presented have gotchas (and I see a couple of them already). My preference is just to focus on those first (and validate them) so we're clear what we're talking about when we say that things are feasible or not.

Updated by austin (Austin Ziegler) over 1 year ago

shioyama (Chris Salzberg) wrote in #note-4:

Intuitively, this proposal changes the way Ruby works in a way that I believe has too many ramifications and edge cases. Also, it delegates control to the caller, rather than to the subject (the code being loaded).

I'm actually trying to change Ruby as little as possible to make this happen. load with the wrap option already does some of this, and delegates control to the caller, so this is not exactly new.

Speaking as a gem maintainer, I don’t see this as a minor change, and I think that it has far more negatives than an explicit new construct (packages, boundaries, whatever) would.

And to me, delegating control to the caller is natural. The problem with require is exactly that the caller loses control, resulting in called code being able to "park" itself wherever it likes. This is why every gem has to be a good citizen and keep its code in one namespace.

Delegating control to the caller is going to be a source of lots of #WONTFIX bugs on a lot of gems if they can be imported the way that you’re talking about. The gems that I have written are (mostly) self-contained, but some do include options to monkey patch core classes. Given that the operating environment will be different under an import as you’ve described it, I cannot possibly support those uses and would close such bugs as #WONTFIX.

On the other hand, I could absolutely see building gems that opt into a package system to provide optional boundaries, and building things that way would allow me to opt into the more complex support requirements that would entail.

Since the problem the proposal wants to address is access of constants cross-packages, I wonder if an approach that would be more aligned with Ruby would be to introduce a "package" visibility keyword.

To be clear, controlling cross-package access is a positive outcome of wrapping loaded code in a namespace, not the (only) goal of this proposal in and of itself. Namespacing has advantages of its own outside of boundary definition alone.

I don’t see any value in namespacing beyond what Ruby has through modules and classes. I certainly don’t see any value in the ability to load more than one version of a piece of code at a time under a different namespace (this is, IMO, one of the worst parts of JavaScript).

there are way too many edge cases for loading existing code in an isolated way like this. It might work for some, maybe even most packages, but ultimately it will certainly require cooperation from the packages, so might as well offer a proper construct for it.

Wouldn't it make sense to first determine what those edge cases are? I feel like we're immediately jumping to the conclusion that "this is hard, so let's do this other thing instead" before we have determined if it really is that hard to do.

I have to ultimately agree with Jeremy Evans that I think that making the wrapping transitive is the wrong thing. You’re putting maintenance burdens on countless third-party developers via their gems on something that only helps a small subset of Ruby developers with some of their large applications.

From a library author perspective, I would support the concept of a package_constant (similar to private_constant except that it looks at the caller’s origin tree) and package visibility for methods, because I could then opt into those. With import crossing gem boundaries, the library author gets no option to opt out (unless you allow something like a non_importable declaration).

Updated by shioyama (Chris Salzberg) over 1 year ago

Also make this apply when code under the wrapper module is called outside of the load process (when top_wrapper is no longer set) — this may be quite hard to do.

Turns out I was wrong about this, it's actually quite easy to track the top wrapper outside of the load process using cref flags. This also solves the problem of toplevel constant name conflicts when referenced outside of the load process. I was able to remove the Object.const_missing hack in the gem using this approach.

It should be possible to leverage the same trick to avoid some of the nastier code in the gem's require patch as well by setting the top_wrapper in require_internal to the wrapper module inferred from cref flags.

Updated by fxn (Xavier Noria) over 1 year ago

Some questions/remarks:

  • Gem a defines A, and gem b wants to reopen A. How would b access A in a way that does not depend on how A was loaded?
  • Let's imagine you are developing Nokogiri, and of course you can use Ruby as it is, with all its flexibility and in all its genericity. Of course, Nokogiri does not know if it is going to be transitively imported or not. Would there be things that would work in one mode and not in the other within Nokogiri?
  • In Ruby, String is not special: It is a constant in Object storing a class object, no different than User. This proposal treats certain constants as distinguished, and I believe this is not aligned with the Ruby model in which constant name resolution does not depend on the value stored by the constant.
  • In order to understand the goal description "isolate components", it would help me if you could describe one concrete way to use this idea in an application, and that description should cover the implications for unrelated 3rd-party gems.
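The third remark above (String is an ordinary constant) can be checked directly in stock Ruby. User here is just an illustrative toplevel class.

```ruby
class User; end

# Both names are ordinary constants stored in Object.
p Object.constants.include?(:String)
p Object.constants.include?(:User)

# Both constants hold plain Class objects and resolve through the same lookup.
p Object.const_get(:String).equal?(String)
p Object.const_get(:User).equal?(User)
p String.class
p User.class
```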

This is all very misaligned with Ruby, in my opinion. Indeed, for me, load file, mod is a very edge interface that in my opinion should not be pursued further because it breaks a fundamental assumption in all existing code: That by looking at your source code, you know the nesting. I have always thought it has few use cases, and they involved cooperation caller/callee.

A design based on explicit cooperation feels more aligned to me. If your application is made of components, that is your contract: You have a concept of application, and you have a concept of "I am a component inside the application". In that sense, a notion of package where you opt-in, and by opting-in you know and control your runtime, sounds more sound to me.

Updated by fxn (Xavier Noria) over 1 year ago

In the first question, it is assumed that b wants A decoration to be in place for the entire process.

Updated by shioyama (Chris Salzberg) over 1 year ago

Thanks @fxn (Xavier Noria), let me start with (what I consider) the easier question first:

In Ruby, String is not special: It is a constant in Object storing a class object, no different than User. This proposal treats certain constants as distinguished, and I believe this is not aligned with the Ruby model in which constant name resolution does not depend on the value stored by the constant.

I agree. The proposal as far as Ruby goes does not treat any constants as distinguished. The gem code originally did have a list of constants, but no longer does (it instead looks at Object.constants at the time the gem is loaded).

What the Ruby patch does is set up an isolated space: it does not resolve any constant from under the wrapper namespace unless that constant has been put there. This is different from the current implementation of load with the wrapper of course, so that is one issue. But putting that aside, not resolving any constant does not distinguish anything, and I don't think this implementation is particularly misaligned with Ruby (maybe it would require a new option or flag, but that's just a question of interface).

The gem then takes advantage of this isolated namespace to "hoist" constants into the toplevel module so that they are accessible to the imported code. That is an implementation detail of the gem and not something I think Ruby should do itself.

Note that the gem "hoisting" things like Hash into every import namespace is what makes ActiveSupport core extensions possible. This is also what would result in problems if, for example, you were to load multiple versions of a gem that monkeypatched Hash or any other constant that was put there. But again, I don't think Ruby should do that, I only think (hope?) that Ruby can make this possible. Whether it happens or not would be fully in the hands of the code consumer.
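A rough sketch of what "hoisting" amounts to, and why it carries side effects, using plain Ruby only. The short CORE_NAMES list and the hoisted_demo_size method are hypothetical; as noted above, the gem derives its list from Object.constants rather than hard-coding one.

```ruby
# Hypothetical short list, for illustration only.
CORE_NAMES = %i[NilClass TrueClass FalseClass String Hash].freeze

import_namespace = Module.new
CORE_NAMES.each do |const|
  import_namespace.const_set(const, Object.const_get(const))
end

# The alias is the very same class object, not a copy.
p import_namespace::Hash.equal?(::Hash)

# So reopening it from "inside" the import changes Hash for the whole process.
import_namespace::Hash.class_eval do
  def hoisted_demo_size
    size
  end
end

p({ a: 1 }.hoisted_demo_size)
```

Because every import namespace aliases the same Hash, two imported versions of a gem that patch Hash differently would collide, which is the problem described above.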

Gem a defines A, and gem b wants to reopen A. How would b access A in a way that does not depend on how A was loaded?

Let's make this really concrete. With the current gem implementation, you can have this:

# foo.rb
if defined?(Bar)
  module Foo1
  end
else
  module Foo2
  end
end

# bar.rb
module Bar
end

require "foo"

# baz.rb
require "foo"

The result of loading code changes depending on the order these files are imported:

bar = import "bar"
baz = import "baz"
#=> bar::Foo1 and baz::Foo1 are defined

baz = import "baz"
bar = import "bar"
#=> bar::Foo2 and baz::Foo2 are defined

This is the biggest problem: initial conditions can impact how code is loaded, resulting in different constants being defined for the same file. I frankly don't have a solution for this but I consider it the biggest obstacle to this idea, and it certainly could and would cause edge cases.

That said, to be clear, the Ruby patch does not actually hit this conflict, it's the gem that does. The patch only requires a file once, in whatever context it was required in. If you require it in a wrap context, that's where the code is required, period. If you try to require the same file again from toplevel, or from a different wrap context, you get false and nothing happens.
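The "once only" rule is the one stock require already follows, and that part can be shown without the patch. In this sketch the file name once_only.rb and the global $once_only_loads are made up for illustration.

```ruby
require "tmpdir"

results = []

Dir.mktmpdir do |dir|
  path = File.join(dir, "once_only.rb")
  # Count how many times the file body actually runs.
  File.write(path, "$once_only_loads = ($once_only_loads || 0) + 1\n")

  results << require(path) # first call executes the file
  results << require(path) # second call finds it already loaded and does nothing
end

p results
p $once_only_loads
p $LOADED_FEATURES.any? { |feature| feature.end_with?("once_only.rb") }
```

As described above, the patch keeps this rule, so whichever context requires a file first is the only one in which its constants are defined.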

On its own, that's not very useful, so the gem patches require to track defined classes, requires etc. and make things work so multiple imports can share constants. But that's also where you end up with the issue of differences in how code can be loaded.

Would there be things that would work in one mode and not in the other within Nokogiri?

I'd need to look more closely at Nokogiri to answer that, so far I've been focusing on Rails. But I'd be glad to do that.

In order to understand the goal description "isolate components", it would help me if you could describe one concrete way to use this idea in an application, and that description should cover the implications for unrelated 3rd-party gems.

Sure, good question. Let me think about this one a bit first.

it breaks a fundamental assumption in all existing code: That by looking at your source code, you know the nesting.

This is an important point and one that I have thought about, and which of course is important to discuss. But I'd like to separate it from the question of whether the idea is even practically possible.

Putting my cards on the table, I personally have always found the assumption that all Ruby code is loaded from toplevel to be one of Ruby's biggest weaknesses. That's my view, and I'm happy to elaborate on it, but my focus right now will be objectively on whether this toplevel-centric design is inevitable or not.

Updated by fxn (Xavier Noria) over 1 year ago

Just a very quick followup:

But I'd like to separate it from the question of whether the idea is even practically possible.

Oh, if the purpose of this exploration is to see if this is technically doable then by all means go ahead :).

Putting my cards on the table, I personally have always found the assumption that all Ruby code is loaded from toplevel to be one of Ruby's biggest weaknesses. That's my view, and I'm happy to elaborate on it, but my focus right now will be objectively on whether this toplevel-centric design is inevitable or not.

I believe your pain point is that Ruby does not have formal namespaces, and we share it. The emulation via class/module objects and constants is a bit weak.

Please, note that other languages like Java always work in the global namespace (Java does not need imports like Python does). But the strict file structure and the convention of using domain names as package names results in separation in practice.

I'll ponder the rest of the reply later, also looking forward to that example! :)

Updated by byroot (Jean Boussier) over 1 year ago

I agree. The proposal as far as Ruby goes does not treat any constants as distinguished. The gem code originally did have a list of constants, but no longer does (it instead looks at Object.constants at the time the gem is loaded).

Note that namespaces can go two ways.

They avoid exposing their own constants to the outside.

But they also can be used to avoid using constants that haven't been explicitly declared (which make static analysis of dependencies easier).

As such if you look at how Python (for instance) does it, they have a list of constants considered "core" that are available from any namespace without having to import them under the implicit __builtins__ namespace.

>>> dir(__builtins__)                                                                            
['ArithmeticError', 'AssertionError', 'AttributeError', 'BaseException', 'BlockingIOError', 'BrokenPipeError', 'BufferError', 'BytesWarning', 'ChildProcessError', 'ConnectionAbortedError', 'ConnectionError', 'ConnectionRefusedError', 'ConnectionResetError', 'DeprecationWarning', 'EOFError', 'Ellipsis', 'EncodingWarning', 'EnvironmentError', 'Exception', 'False', 'FileExistsError', 'FileNotFoundError', 'FloatingPointError', 'FutureWarning', 'GeneratorExit', 'IOError', 'ImportError', 'ImportWarning', 'IndentationError', 'IndexError', 'InterruptedError', 'IsADirectoryError', 'KeyError', 'KeyboardInterrupt', 'LookupError', 'MemoryError', 'ModuleNotFoundError', 'NameError', 'None', 'NotADirectoryError', 'NotImplemented', 'NotImplementedError', 'OSError', 'OverflowError', 'PendingDeprecationWarning', 'PermissionError', 'ProcessLookupError', 'RecursionError', 'ReferenceError', 'ResourceWarning', 'RuntimeError', 'RuntimeWarning', 'StopAsyncIteration', 'StopIteration', 'SyntaxError', 'SyntaxWarning', 'SystemError', 'SystemExit', 'TabError', 'TimeoutError', 'True', 'TypeError', 'UnboundLocalError', 'UnicodeDecodeError', 'UnicodeEncodeError', 'UnicodeError', 'UnicodeTranslateError', 'UnicodeWarning', 'UserWarning', 'ValueError', 'Warning', 'ZeroDivisionError', '__build_class__', '__debug__', '__doc__', '__import__', '__loader__', '__name__', '__package__', '__spec__', 'abs', 'aiter', 'all', 'anext', 'any', 'ascii', 'bin', 'bool', 'breakpoint', 'bytearray', 'bytes', 'callable', 'chr', 'classmethod', 'compile', 'complex', 'copyright', 'credits', 'delattr', 'dict', 'dir', 'divmod', 'enumerate', 'eval', 'exec', 'exit', 'filter', 'float', 'format', 'frozenset', 'getattr', 'globals', 'hasattr', 'hash', 'help', 'hex', 'id', 'input', 'int', 'isinstance', 'issubclass', 'iter', 'len', 'license', 'list', 'locals', 'map', 'max', 'memoryview', 'min', 'next', 'object', 'oct', 'open', 'ord', 'pow', 'print', 'property', 'quit', 'range', 'repr', 'reversed', 'round', 'set', 'setattr', 'slice', 
'sorted', 'staticmethod', 'str', 'sum', 'super', 'tuple', 'type', 'vars', 'zip']

Updated by fxn (Xavier Noria) over 1 year ago

Well, I believe being in the global namespace is the rule in most programming languages. Lexical rules or aliasing is about local name visibility, but package names start at some root.

Updated by austin (Austin Ziegler) over 1 year ago

shioyama (Chris Salzberg) wrote in #note-11:

That said, to be clear, the Ruby patch does not actually hit this conflict, it's the gem that does. The patch only requires a file once, in whatever context it was required in. If you require it in a wrap context, that's where the code is required, period. If you try to require the same file again from toplevel, or from a different wrap context, you get false and nothing happens.

I’m still very much against this concept, because this rule will absolutely cause code to break. What you’re describing is something that only really has value for applications, but because you’re extending transitivity to require, you will end up hiding shared dependencies without the "benefit" of being able to load the same code (or multiple versions of the same code) more than once like in JavaScript. Worse, given the existence of autoload, tracking down these issues would itself be a bit of a heisenbug-hunt.

# a.rb
require 'faraday'

# b.rb
require 'faraday'

# app.rb
api1 = import "a"
api2 = import "b"

require 'faraday'

api1::Faraday # => Faraday
api2::Faraday # => NameError: uninitialized constant Faraday
Faraday # => NameError: uninitialized constant Faraday

Yes, the fix is easy: require 'faraday' before doing any imports, but that breaks with autoload and without eager loading (not every Ruby application is using Rails with its use of Bundler eager loading).

The only ways that you can make any of this work with the reality of Ruby’s ecosystem are: (a) allow dependencies to opt out of being wrapped (which makes this misfeature less useful), (b) make it something that gems and app code can both opt into (e.g., something like a package_constant), or (c) make $LOADED_FEATURES unique per context (thereby allowing the same code to be loaded into memory more than once, which is one of JS’s biggest misfeatures).

Would there be things that would work in one mode and not in the other within Nokogiri?

I'd need to look more closely at Nokogiri to answer that, so far I've been focusing on Rails. But I'd be glad to do that.

The problem here isn’t so much Nokogiri on its own, but the fact that Nokogiri is a compiled extension. Any dependency that loads a compiled extension is going to have assumptions baked into the compiled code, and this would absolutely break those assumptions. And compiled extensions cannot typically be loaded more than once regardless of anything else.

Putting my cards on the table, I personally have always found the assumption that all Ruby code is loaded from toplevel to be one of Ruby's biggest weaknesses. That's my view, and I'm happy to elaborate on it, but my focus right now will be objectively on whether this toplevel-centric design is inevitable or not.

Please elaborate on this, as I can only think of a handful of languages (most descended from JavaScript) where code is not referenced from the top-level, and they all have the much bigger weakness of being able to load the same code multiple times in multiple contexts such that you cannot be certain whether two related pieces of code are running the same version. Barring some absolute trickery (which I’ve done before) and (to some degree) refinements (which I still haven't used), you can be guaranteed that if you’re calling a method, all calls to that method will be the same method.

Updated by shioyama (Chris Salzberg) over 1 year ago

Just wanted to quickly correct something in my earlier response, my example (with Foo1 and Foo2 etc.) was not quite correct. I've updated it, just wanted to mention because it could cause some confusion.

Will respond to other comments a bit later. I appreciate all the feedback!

Updated by shioyama (Chris Salzberg) over 1 year ago

Just one thing before I respond to other comments:

@austin (Austin Ziegler)

which makes this misfeature less useful

I should probably ignore this, but I can't.

I asked for "constructive feedback", and I very much include in that constructive criticism. I have been upfront and presented a concrete example of what I consider to be the biggest problem with the whole idea, partly because this is the kind of actionable feedback I am looking for, but also to make clear that my intention here is not to gloss over issues but to actually identify and discuss them.

Characterizing a (rough, preliminary) proposal as a "misfeature" does nobody any good, and it certainly doesn't advance your argument. Personally it just makes me really frustrated after the time I've put into exploring and presenting the idea. Whatever you may think of it, I have done my best to present this idea clearly and honestly, so I'd ask that you respond in kind (which, to be clear, the rest of your response does).

Updated by austin (Austin Ziegler) over 1 year ago

shioyama (Chris Salzberg) wrote in #note-17:

Just one thing before I respond to other comments:

@austin (Austin Ziegler)

which makes this misfeature less useful

I should probably ignore this, but I can't.

I asked for "constructive feedback", and I very much include in that constructive criticism. I have been upfront and presented a concrete example of what I consider to be the biggest problem with the whole idea, partly because this is the kind of actionable feedback I am looking for, but also to make clear that my intention here is not to gloss over issues but to actually identify and discuss them.

Characterizing a (rough, preliminary) proposal as a "misfeature" does nobody any good, and it certainly doesn't advance your argument. Personally it just makes me really frustrated after the time I've put into exploring and presenting the idea. Whatever you may think of it, I have done my best to present this idea clearly and honestly, so I'd ask that you respond in kind (which, to be clear, the rest of your response does).

You’re right, I was wrong to characterize it as such as it does minimize the thought and work you have put into this. I’m sorry.

Updated by shioyama (Chris Salzberg) over 1 year ago

You’re right, I was wrong to characterize it as such as it does minimize the thought and work you have put into this. I’m sorry.

Thank you, I really appreciate it.

Updated by shioyama (Chris Salzberg) over 1 year ago

Before I start, I'd like to put aside the problem of transitive require and of compiled native extensions for a moment. These are the most contentious points of this proposal, and I now regret making them so central because they are not actually essential.

Reading the feedback here, I have come to realize that the distinction between "packages" and "imports" is the more important point, so I'm going to focus on that.

In order to understand the goal description "isolate components", it would help me if you could describe one concrete way to use this idea in an application, and that description should cover the implications for unrelated 3rd-party gems.

Let me start by clarifying the word "components" here, because it may not have been the best choice of word on my part.

I see the namespace problem here as one of scaling in two different "spaces" of components:

  1. The space of code living together in a single application
  2. The space of code shared between all applications (gems)

I want to focus on how two concepts, encapsulation and namespacing, relate to scaling challenges in these two spaces. That will motivate the proposal I've presented here.

Encapsulation and namespacing are directly related: Ruby's main mechanism for encapsulation is namespacing. You name something in a file and define what you want under it, and hope nobody reaches into that module namespace when they shouldn't. You have private_constant and that's about it.

The fact that namespacing is the main mechanism to enforce encapsulation is problematic in my opinion because it fundamentally misaligns two very important incentives, one natural and one that we want to create (both in application code and in gem code).

The first thing that is naturally incentivized (by the effort it takes to do it) is to write less code, particularly boilerplate code. It's much easier to write Product than it is to write Merchandising::Product, and much easier to not wrap your gem code in module Merchandising than it is to wrap it. The interpreter may treat these roughly the same way, but humans will see them quite differently and naturally prefer the former over the latter.

The second thing that we want to incentivize is to group related code together. And because naming is encapsulating, grouping requires namespacing: the merchandising concept of "product" should be named Merchandising::Product and not Product. Moreover, as a taxonomy of concepts grows, we need further subdivisions, which means deeper namespacing.

So incentives are in direct opposition: in order to do the right thing, you need to be very conscientious to wrap all your code in the appropriate literal namespace, even though the natural motivation is not to do that. This problem only gets worse as a codebase grows: do we group "External Payment API clients" together under Payments::ApiClients or just under Payments? Grouping code in a natural way requires sacrificing convenience.

This is a terrible tradeoff. The reality is that however much you can try to encourage "doing the right thing", you will always be fighting a losing battle. (I should know, I'm fighting this battle every day!) And this is a battle which I believe is unnecessary, because the literal namespace is mostly redundant; directory structure already serves to signal grouping.

The "packaging" approach, by which I mean what Packwerk does, enforces boundaries with a stick, but it does not fix this profound misalignment. A package keyword that restricts access to a namespace, meanwhile, would actually entrench literal namespaces as the guardian of boundaries, which I think is fundamentally the wrong approach.

I believe your pain point is that Ruby does not have formal namespaces, and we share it.

Yes, but there is a more subtle point that I've so far been unsuccessful at conveying, partly because only in writing this have I come to see it clearly myself.

The points I made above are about literal namespaces, by which I mean namespaces that are literally written into the file. Contrast this with the case of load "foo.rb", mod, where mod acts as namespace in foo.rb but is implicit. In this case, the incentives above can in fact be aligned.

@fxn (Xavier Noria) To get back to your original question, let's assume this is opt-in, and that it does not apply to compiled extensions (gems that want to opt-in would be able to do so however). I think those are the key points that make this contentious.

So with those out of the way, what I want is that instead of this:

# payments/api_clients/foo_client.rb
require "my_client_gem"

module Payments
  module ApiClients
    class FooClient < MyClientGem::ApiClient
      # ...
    end
  end
end

# payments/api_clients/bar_client.rb
require "my_client_gem"

module Payments
  module ApiClients
    class BarClient < MyClientGem::ApiClient
      # ...
    end
  end
end

# payments.rb
require "payments/api_clients/foo_client"
require "payments/api_clients/bar_client"

module Payments
  # do something with ApiClients::FooClient, ApiClients::BarClient etc
end

we have instead something like this (assuming "my_client_gem" opts in to being "importable", whatever that means):

# payments/api_clients/foo_client.rb
api_client = import "my_client_gem/api_client"

class FooClient < api_client::ApiClient
  # ...
end

# payments/api_clients/bar_client.rb
api_client = import "my_client_gem/api_client"

class BarClient < api_client::ApiClient
  # ...
end

# payments.rb
module Payments
  foo_client = import "./api_clients/foo_client"
  bar_client = import "./api_clients/bar_client"

  # do something with foo_client::FooClient and bar_client::BarClient
end

To me at least, having dealt with reams of namespace boilerplate, I cannot express to you what a pleasure it is just to write this here. It takes away so much that is irrelevant and leaves only what is relevant: what the code is actually doing. This I believe is why this idea has generated so much excitement.

At this point, what I've written above is already implementable with the recent change to load alone. I am not depending on transitivity of require here, and I'm assuming the code in my_client_gem is all Ruby and has no native extension. (Assume here that my_client_gem opts in to make its code "importable", whatever that means -- this is something to work out).

So the misalignment of incentives, as I've presented it, is resolvable in a way. But there's a problem, because while I have "imported" "payments/api_clients/foo_client", that imported code can freely access anything else in the toplevel namespace. So ::Payments in payments/api_clients/foo_client.rb resolves to the toplevel ::Payments.

In other words, the problem that Packwerk solves is still there.

We are actually really close though to what I think is a better solution to that problem. If toplevel in the imported file resolved to the top of the import context, we would actually achieve a kind of "nested encapsulation". A wrapped load context only "sees" as far up as its wrap module. It is essentially a "universe unto itself". The importer side can see down the namespace, but the "importee" cannot see up past its toplevel.

There is no conflict with require here because code that is required always resolves to the absolute toplevel, nothing changes there. Code that is loaded under a wrap namespace cannot see outside its namespace unless its load module has references to that global context. require in such a context would require at toplevel, but that code would not be visible unless the wrapping context had aliased constants under the wrap module.

This means that anytime you want a new toplevel, you can have one. The original "true" toplevel (used by require) is still there as always. This would be something new, so it may need a new flag or similar, but the point is that it would not be fundamentally in conflict with require.
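The current behaviour this argument builds on (constants defined in a wrapped load stay under the wrapper, while a require made from inside it still defines at the true toplevel) can be checked with a minimal sketch. The file names and the WrappedMain/WrappedDep constants are hypothetical, and the sketch uses load's existing boolean wrap argument rather than any proposed import mechanism:

```ruby
require "tmpdir"

Dir.mktmpdir do |dir|
  # Hypothetical file names, written to a temp dir to keep the sketch self-contained.
  File.write(File.join(dir, "wrapped_dep.rb"), "module WrappedDep; end\n")
  File.write(File.join(dir, "wrapped_main.rb"), <<~RUBY)
    require_relative "wrapped_dep"
    module WrappedMain; end
  RUBY

  # `true` asks load to evaluate the file under a fresh anonymous module.
  load File.join(dir, "wrapped_main.rb"), true
end

VISIBILITY = {
  wrapped_main: Object.const_defined?(:WrappedMain), # defined under the wrapper only
  wrapped_dep:  Object.const_defined?(:WrappedDep)   # required, so defined at toplevel
}
p VISIBILITY
```

Here wrapped_main should come out false and wrapped_dep true, which is the non-transitive behaviour of require that the issue description mentions.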

I have roughly implemented the isolation mechanism of this idea with cref flags in my Ruby patch (ignore the change to make require transitive). Although there are edge cases to consider (and I can see a couple), I feel this is actually the potential basis for an implementation of "imports" which avoids the fundamental problems raised so far, while offering the key missing element to make "code wrapping" become a much more powerful concept for encapsulation and code organization both in application and in gem code.

I'll stop here because this is already way too long. Happy to elaborate further on points that might be unclear.

Updated by jeremyevans0 (Jeremy Evans) over 1 year ago

@shioyama (Chris Salzberg) Thank you for that explanation, I now have a better understanding of the motivation for this proposal.

In terms of loading code, Ruby has two methods, load and require. load can take a wrapping module, require cannot. One reason for this is that load is not designed to be idempotent (it loads the file every time), while require is designed to be idempotent (it does not load the same file more than once). Since load is not designed to be idempotent, it can take a wrapping module, as the behavior can vary per-call. This does not apply to require, because require must be idempotent. Fundamentally, you cannot support require with a wrapping module without losing the idempotency. The following code could not work and be idempotent:

MyModule1 = Module.new
MyModule2 = Module.new
require 'foo', MyModule1
require 'foo', MyModule2

For similar reasons, making require implicitly support the currently wrapping module would break idempotency and therefore I do not think it should be considered.
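The difference between the two methods can be seen in a small sketch. The file name counter_demo.rb and the $load_count global are hypothetical, chosen only to count how many times the file body runs:

```ruby
require "tmpdir"

$load_count = 0
COUNTS = {}

Dir.mktmpdir do |dir|
  path = File.join(dir, "counter_demo.rb")   # hypothetical file
  File.write(path, "$load_count += 1\n")

  require path   # evaluates the file
  require path   # already in $LOADED_FEATURES, so nothing is evaluated
  COUNTS[:after_two_requires] = $load_count

  load path      # evaluates the file again
  load path      # and once more
  COUNTS[:after_two_loads] = $load_count
end

p COUNTS
```

The counter reaches 1 after the two require calls and 3 after the two load calls, so a per-call wrap argument only has a well-defined meaning for load.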

In terms of purely reducing the amount of namespace boilerplate, you can use load for internal code loading. I think you would still want to use require for files in gems, since you do not control that code (internally, those gems could use a similar approach to this):

# payments/api_clients/foo_client.rb
require "my_client_gem/api_client"

class FooClient < MyClientGem::ApiClient
  # ...
end

# payments/api_clients/bar_client.rb
require "my_client_gem/api_client"

class BarClient < MyClientGem::ApiClient
  # ...
end

# payments.rb
module Payments
  load File.expand_path("api_clients/foo_client.rb", __dir__), self
  load File.expand_path("api_clients/bar_client.rb", __dir__), self

  # do something with Payments::FooClient and Payments::BarClient
end

Note that the wrapping module for load only currently supports a single namespace, not multiple namespaces. Maybe your patch adds that, but I couldn't tell because it doesn't include tests. Note that you can currently support multiple namespaces using eval. The approach seems kind of ugly, but it's basically what you seem to want in terms of implicit nesting:

module Payments
  class Nested
    # Top level constant lookup in the *_client files uses Payments::Nested, Payments, Object
    foo_path = File.expand_path("api_clients/foo_client.rb", __dir__)
    eval File.read(foo_path), binding, foo_path

    bar_path = File.expand_path("api_clients/bar_client.rb", __dir__)
    eval File.read(bar_path), binding, bar_path
  end

  # do something with Payments::Nested::FooClient and Payments::Nested::BarClient
end

While I understand the goal of reducing namespace "boilerplate", I think it is important to understand that removing explicit namespaces is a tradeoff. If you do not leave the namespaces in the file, but instead let them be implicit, the code likely becomes more difficult to understand. You state that programmers would naturally prefer implicit namespaces over explicit namespaces, but I'm not sure that is true. Implicit code is not necessarily better than explicit code. What you consider "irrelevant" may be very relevant to someone who isn't familiar with the code and all of the implicit namespaces being dealt with.

You describe the current state of affairs as a "terrible tradeoff", but that seems hyperbolic to me. At most, having to use explicit namespaces should be mildly annoying, even if you have full understanding of the code and can deal with implicit namespaces.

In terms of encapsulation, Ruby allows trivial breaking of encapsulation even in code that uses wrapped modules. ::Foo always refers to Object::Foo. You could not use wrapping module support to enforce encapsulation in Ruby.
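A minimal sketch of this point, using load's existing boolean wrap argument: a bare constant reference in the wrapped file sees the wrapper's own definition, but a ::-rooted reference goes straight to Object. The TOPLEVEL_SETTING constant, the $seen_in_wrapped global and the file name are all hypothetical:

```ruby
require "tmpdir"

TOPLEVEL_SETTING = "toplevel"   # hypothetical constant defined on Object
$seen_in_wrapped = nil

Dir.mktmpdir do |dir|
  path = File.join(dir, "escape_demo.rb")
  File.write(path, <<~RUBY)
    TOPLEVEL_SETTING = "wrapped"   # lands on the anonymous wrap module
    $seen_in_wrapped = { bare: TOPLEVEL_SETTING, rooted: ::TOPLEVEL_SETTING }
  RUBY
  load path, true
end

p $seen_in_wrapped
p TOPLEVEL_SETTING
```

The wrapped file reads "wrapped" through the bare name and "toplevel" through ::TOPLEVEL_SETTING, and the constant on Object is left untouched.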

Note that both the load and eval approaches I've shown are unlikely to work well if you have optional parts of the codebase that you would like to load in different places. In situations like that, you really need the idempotency that require offers, to make sure the related code is only loaded once.

@shioyama (Chris Salzberg) I think it would be helpful if, for each of the patches you are proposing, you include tests to make it easier to see what each patch allows and how the behavior changes. To the extent that the patches are independent, a separate pull request with tests for each would be helpful and aid review. Even though I don't think the current state of load/require is an issue worth fixing, I think each patch could be considered on its own merits.

Updated by shioyama (Chris Salzberg) over 1 year ago

@jeremyevans0 (Jeremy Evans)

Thanks for your thoughtful response!

For similar reasons, making require implicitly support the currently wrapping module would break idempotency and therefore I do not think it should be considered.

I agree, and from the beginning I have not advocated allowing passing extra parameters to require. It seems that everyone here agrees that changing require in almost any way that alters its basic premises is fundamentally a no-go.

Given that, wouldn't it make sense to close #10320, ideally with a note explaining why the proposal there is not feasible? Although similar in spirit to this issue, that one entirely centers on changing require in such a way that, as I read it, it is no longer exclusively idempotent.

I ask because leaving that issue open invites the interpretation (perhaps mistaken) that the proposal there is feasible given the right implementation, whereas as I see it from discussions here it seems almost entirely infeasible under any circumstances.

While I understand the goal of reducing namespace "boilerplate", I think it is important to understand that removing explicit namespaces is a tradeoff. If you do not leave the namespaces in the file, but instead let them be implicit, the code likely becomes more difficult to understand. You state that programmers would naturally prefer implicit namespaces over explicit namespaces, but I'm not sure that is true. Implicit code is not necessarily better than explicit code. What you consider "irrelevant" may be very relevant to someone who isn't familiar with the code and all of the implicit namespaces being dealt with.

I agree that there is a tradeoff, as @fxn (Xavier Noria) earlier commented when he wrote that the idea breaks the fundamental assumption that "by looking at your source code, you know the nesting."

But this is about much more than reducing boilerplate. It is about a fundamental shift in perspective, from one where everything is visible everywhere, to one where the "perspective" is itself something that can be created, nested and isolated.

You write "implicit code is not necessarily better than explicit code". I agree. Autoloading, for example, makes a similar tradeoff of implicit over explicit, and that tradeoff likewise has non-trivial downsides. Autoloading can also be seen as reducing boilerplate (all those requires we no longer need), but clearly it is about more than that.

Ruby has many sharp knives like this, and the way we handle those knives is by creating conventions around their usage. Much the way Zeitwerk (and Rails) provided file organization conventions around autoloading, any mechanism in Ruby that would allow code to be imported in the way I'm describing would also invite some kind of conventions around its usage to make it more useful.

I admit that this is very hand-wavy, and I need to provide a clearer demonstration of what those conventions might look like. This is something that is lacking from this proposal, and something I am thinking a lot about. I will come back to this point.

You describe the current state of affairs as a "terrible tradeoff", but that seems hyperbolic to me. At most, having to use explicit namespaces should be mildly annoying, even if you have full understanding of the code and can deal with implicit namespaces.

I can see how you see that statement as hyperbolic, but I don't see it that way. It is "terrible" in the sense that it is terrible that one needs to make such a tradeoff at all. Its terrible-ness is relative to the size of the code space involved; in a codebase of a few dozen files with few dependencies it is not terrible.

OTOH, solving a "mildly annoying" problem is not, to me, an appropriate characterization of what I am describing. Maybe that impression arises because each time I present only one aspect of what I see as a bigger-picture change.

Imports as a concept tackles several hard problems at once, including:

  • literal namespaces and code grouping, and the misalignment of incentives involved (as described)
  • encapsulation/isolation (i.e. Packwerk, packages etc.)
  • namespace collisions/conflicts (I want a semantically-meaningful Platform in my application but it conflicts with the platform gem)

The last one here, which I have barely touched on, is a problem we just live with as Rubyists, and to some degree I think simply internalize as "the way things work". But this is a real problem that deserves a proper solution.

Note that you can currently support multiple namespaces using eval.

It's interesting you brought up this example, because I have considered implementations for import using eval at least as a proof-of-concept, but it doesn't work for the very important case where I want to evaluate under an anonymous module namespace; in your example, you need to supply a dedicated named context (Payments::Nested) to load the code into. This may seem like a minor point but I don't believe it is.

Only with an anonymous-rooted namespace can we avoid polluting the parent load context's namespace, and avoid potential conflicts with other loaded constants in that same namespace.

i.e. I want this:

module Payments
  foo_client = Module.new do
    foo_path = File.expand_path("api_clients/foo_client.rb", __dir__)
    eval File.read(foo_path), binding, foo_path
  end

  # do something with foo_client::FooClient
end

but this actually translates to this:

module Payments
  foo_client = Module.new do
    class FooClient < MyClientGem::ApiClient
      # ...
    end
  end
end

This does not define foo_client::FooClient, but rather ::Payments::FooClient, because any call to module or class in the evaluated file will resolve back to the closest named context, in this case Payments.
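This can be reproduced without any file loading. The sketch below writes the class inline instead of eval'ing file contents, and uses a hypothetical PaymentsDemo module with an anon reader so the anonymous module can be inspected afterwards:

```ruby
module PaymentsDemo   # hypothetical stand-in for Payments
  class << self
    attr_reader :anon
  end

  @anon = Module.new do
    # `class` defines its constant in the enclosing lexical scope,
    # not in the module that this block is evaluated against.
    class FooClient; end
  end
end

DEFINED_ON = {
  payments:  PaymentsDemo.const_defined?(:FooClient, false),
  anonymous: PaymentsDemo.anon.const_defined?(:FooClient, false)
}
p DEFINED_ON
```

FooClient ends up on PaymentsDemo and the anonymous module stays empty, because a block changes self but not the scope in which class and module keywords define constants.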

load with the wrap module is different because it resolves module and class definitions to the wrap module, regardless of whether that module is anonymous. As far as I can tell (please correct me if I am wrong!) there is no other way in Ruby to do this (including "ugly" hacks like eval on file contents). It is this (unintended?) "feature" that I am attempting to leverage here, to make it more powerful.
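A minimal sketch of that behaviour, assuming a Ruby version in which load accepts a module as its wrap argument (3.2 or later, following the change in #6210). The file name and the ImportedClient class are hypothetical:

```ruby
require "tmpdir"

wrapper = Module.new   # never assigned to a constant, so it stays anonymous

Dir.mktmpdir do |dir|
  path = File.join(dir, "imported_client_demo.rb")   # hypothetical file
  File.write(path, "class ImportedClient; end\n")
  load path, wrapper   # assumption: Ruby 3.2+, where wrap may be a module
end

WRAP_RESULT = {
  on_wrapper:   wrapper.const_defined?(:ImportedClient, false),
  at_toplevel:  Object.const_defined?(:ImportedClient),
  wrapper_name: wrapper.name
}
p WRAP_RESULT
```

On such a version the class should be reachable as wrapper::ImportedClient and absent from Object, while wrapper.name stays nil.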

I love that this concept of an "unrooted" nested namespace (what I am loosely referring to as an "import" here) is actually something that already exists in Ruby. It does not need to be added, it just needs to be tweaked so as to (optionally) isolate it from its parent.

I think it would be helpful if, for each of the patches you are proposing, you include tests to make it easier to see what each patch allows and how the behavior changes.

Thanks, I will do this. There are not many of these; there may in fact only be one or two.

Just to clarify though: I had originally intended to actually do just this. But after discussion at the Developers Meeting it was recommended that I lay out the bigger picture in a new issue separate from #10320, to motivate individual changes, so that is what I have done. And indeed, I now think that having the big picture is important in understanding the individual changes, so I will link back here to contextualize and motivate each of them, while also arguing for them on their own merits.

Updated by austin (Austin Ziegler) over 1 year ago

@shioyama (Chris Salzberg), thanks for the deeper details. As requested, I'm considering the concept of transitive require off the table. I have elided some of your message in response, but I do not believe that I have misrepresented anything. I am also replying to #note-20 because I do not believe that your subsequent follow-up fundamentally changes my position in any way.

shioyama (Chris Salzberg) wrote in #note-20:

Encapsulation and namespacing are directly related: Ruby's mechanism for encapsulation is namespacing. You name something in a file and define what you want under it, and hope nobody reaches into that module namespace when they shouldn't. You have private_constant and that's about it.

Fundamentally, that's not true. It is by convention that require 'foo' defines ::Foo. It could define ::Foo and ::Bar, or it could just define ::Hoge. I think it's a good convention. As we're talking about a new feature, it is worthwhile making it so that filenames are correlated to defined constants…more or less. I also think that it's worth noting that we currently have only private_constant—and it still feels like a new thing to me (although I use it extensively now).

[N]amespacing enforc[ing] encapsulation …fundamentally misaligns two very important incentives…

The first … is to write less code, particularly boilerplate code. It's much easier to write Product than it is to write Merchandising::Product, and much easier to not wrap your gem code in module Merchandising than it is to wrap it. The interpreter may treat these roughly the same way, but humans will see them quite differently and naturally prefer the former over the latter.

I don't, full stop. I'm doing most of my work these days in Elixir, where I have defmodule Company.Resources.ProductVariant for our product variant structure. Yes, I refer to this as ProductVariant in discussions, and will use alias Company.Resources.ProductVariant so that in context the object can be referred to as ProductVariant…but I also have defmodule Company.GQL.Schema.ProductVariant (more or less) or defmodule Company.GQL.Resolver.ProductVariant (again, more or less).

I've got a codebase that I cannot wait to have the opportunity to rewrite from Node.js to Elixir or Ruby, because the lack of any viable namespacing and in-built structure in JavaScript has made the code an absolute disaster to work in.

The second thing that we want to incentivize is to group related code together

This is a terrible tradeoff. The reality is that however much you can try to encourage "doing the right thing", you will always be fighting a losing battle. (I should know, I'm fighting this battle every day!) And this is a battle which I believe is unnecessary, because the literal namespace is mostly redundant; directory structure already serves to signal grouping.

Ruby is not Java, Python, or any other language which chose to have a tight relationship to the filesystem for module specification. I do understand that part of your discussion is whether Ruby should have a filesystem-based module specification.

The points I made above are about literal namespaces, by which I mean namespaces that are literally written into the file. Contrast this with the case of load "foo.rb", mod, where mod acts as namespace in foo.rb but is implicit. In this case, the incentives above can in fact be aligned.

This fundamentally changes everything about Ruby if it were to be adopted, and I disagree that this would be a net positive change.

The reason this works for JavaScript is because there's absolutely no namespacing of any sort, explicit or implicit, in JavaScript. Namespaces are simulated through object properties and closures (and treating module.exports or export … as declaring an object).

To make this work in Ruby, it seems to me that it would be necessary to introduce the concept of exports for Ruby…and I don't think it meaningfully improves readability or maintainability. The only concept that Ruby has for an export is a constant…and even with load 'foo.rb', mod, there's a lot of sharp edges:

[1] pry(main)> .cat "test.rb"
foo = "bar"

Baz = "baz"

def hoge
  "hoge"
end

module Quux
end
private_constant :Quux
[2] pry(main)> q = Module.new; load "test.rb", q
NoMethodError: undefined method `private_constant' for main:Object

private_constant :Quux
^^^^^^^^^^^^^^^^
Did you mean?  private_methods
from test.rb:11:in `<top (required)>'

Even removing private_constant :Quux, there are sharp edges:

[1] pry(main)> q = Module.new; load "test.rb", q
=> true
[2] pry(main)> q.Baz
NoMethodError: undefined method `Baz' for #<Module:0x000000010a37ca70>
from (pry):2:in `__pry__'
[3] pry(main)> q::Baz
=> "baz"
[4] pry(main)> q::Quux
=> #<Module:0x000000010a37ca70>::Quux
[5] pry(main)> q::hoge
NoMethodError: undefined method `hoge' for #<Module:0x000000010a37ca70>
from (pry):5:in `__pry__'
[6] pry(main)> q.hoge
NoMethodError: undefined method `hoge' for #<Module:0x000000010a37ca70>
from (pry):6:in `__pry__'

we have instead something like this (assuming "my_client_gem" opts in to being "importable", whatever that means):

# payments/api_clients/foo_client.rb
api_client = import "my_client_gem/api_client"

class FooClient < api_client::ApiClient
  # ...
end

# payments/api_clients/bar_client.rb
api_client = import "my_client_gem/api_client"

class BarClient < api_client::ApiClient
  # ...
end

# payments.rb
module Payments
  foo_client = import "./api_clients/foo_client"
  bar_client = import "./api_clients/bar_client"

  # do something with foo_client::FooClient and bar_client::BarClient
end

I can't quite tell whether that would load one or two instances of my_client_gem/api_client. Regardless, I don't really see how api_client::ApiClient is an improvement over MyClientGem::ApiClient, even if it's actually My::Client::Gem::ApiClient.

To me at least, having dealt with reams of namespace boilerplate, I cannot express to you what a pleasure it is just to write this here. It takes away so much that is irrelevant and leaves only what is relevant: what the code is actually doing. This I believe is why this idea has generated so much excitement.

What you call "namespace boilerplate", I call "context". I think that "floating namespaces" are the single worst thing about JavaScript and Typescript when it comes to complex codebases. As I said earlier, I have one with a dozen or so tables and maybe twice that many endpoints…and I cannot wait to rewrite it in something that is not JavaScript, because there are five different ways that the files have been organized because JavaScript simply does not care about or even support good code organization.

I've recently written a library in Elixir, Ruby, and Typescript…and while there are things that can be said both positive and negative about all three implementations, I prefer working on the Elixir or Ruby ones to the Typescript because there's better support for good practices in both other languages.

We are actually really close though to what I think is a better solution to that problem. If toplevel in the imported file resolved to the top of the import context, we would actually achieve a kind of "nested encapsulation". A wrapped load context only "sees" as far up as its wrap module. It is essentially a "universe unto itself". The importer side can see down the namespace, but the "importee" cannot see up past its toplevel.

I can't count the number of times that this "fact" has proved to be problematic in the JavaScript code that I've written. Having to import the universe to accomplish a task leads to a lot of import boilerplate that I find far more distracting than what you call namespace boilerplate.

From a package access perspective, though, I think that you're trying to solve this the wrong direction. It should not matter whether the imported code can reach out beyond its toplevel, but it should matter that other code should not be able to reach into the imported code except through defined APIs (modulo, of course, __send__).

That's why I think that, without introducing an import concept that requires exports and multiple toplevels‡, we could introduce something that IMO would be (a) more general, (b) more declarative, (c) incremental, and (d) easier to understand and explain. That would be what I'll call a "package declaration". It's very rough, and I don't know that I'll develop it much more than this because it isn't something that I need.

The basic concept is that you'd declare a particular name / namespace to be a package, and then you'd mark sub-namespaces to be package_constants. From within any child of the package namespace, a package_constant would just be a regular constant. From outside of the package namespace, it would be the same as a private_constant.

class Foo
  class Bar
    class Hoge
      def hoge = "hoge"
    end
    private_constant :Hoge

    def hoge = Hoge.new.hoge

    def bar = "bar"
  end
  package_constant :Bar

  class Baz
    def baz = "baz"

    def bar = Foo::Bar.new.bar

    def hoge = Foo::Bar.new.hoge
  end

  def bad = Bar::Hoge.new.hoge

  def good
    [Baz.new.baz, Bar.new.bar, Bar.new.hoge]
  end
end

package :Foo

Foo::Bar # => NameError: package constant Foo::Bar referenced
Foo::Baz # => Foo::Baz
Foo # => Foo

With the concept that I'm talking about, only Foo::Bar could reach Foo::Bar::Hoge (it's a private constant), and only Foo and Foo::Baz could reach Foo::Bar, and both Foo::Baz and Foo would be publicly exposed constants / classes.
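For comparison, here is a sketch of what the existing private_constant already does, since package_constant and package above are hypothetical and not implemented. The Shop, Inventory and Report names are made up for illustration:

```ruby
class Shop   # hypothetical names, unrelated to the Foo/Bar example
  class Inventory
    def count
      3
    end
  end
  private_constant :Inventory

  class Report
    def summary
      # A bare reference resolves through the lexical scope, so this works.
      "items: #{Inventory.new.count}"
    end
  end
end

SUMMARY = Shop::Report.new.summary

OUTSIDE_ERROR = begin
  Shop::Inventory   # qualified reference from outside the class
  nil
rescue NameError => e
  e.message
end

puts SUMMARY
puts OUTSIDE_ERROR
```

The lexically nested Report can use Inventory by its bare name, while the qualified Shop::Inventory raises NameError. A qualified reference is rejected even from inside Shop, which is one place where a package-level visibility, as described here, would presumably differ.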

‡ I understand that part of your goal is to reduce "namespace boilerplate". I disagree with that goal and doubt that I could be convinced that this would be a net positive change. I wish that I could write JavaScript like it actually had any sort of structure at all provided by and/or enforced by the language intentionally, rather than accidentally.

Updated by shioyama (Chris Salzberg) over 1 year ago

@austin (Austin Ziegler)

Thanks very much for your response. I'll take some time to digest it, but one thing stood out:

From a package access perspective, though, I think that you're trying to solve this the wrong direction. It should not matter whether the imported code can reach out beyond its toplevel, but it should matter that other code should not be able to reach into the imported code except through defined APIs (modulo, of course, __send__).

Putting aside opinions on what is good/bad, you've summarized the point of disagreement here: whether to draw a line and stop code from reaching into something, or stop code from reaching out of something.

There are a lot of implications to both but I think it helps to identify this core point, aside from arguments for/against, so others have a reference point since this discussion is getting quite long.

Updated by shioyama (Chris Salzberg) over 1 year ago

  • Description updated (diff)

Updated by shioyama (Chris Salzberg) over 1 year ago

there's a lot of sharp edges

I consider the first one a bug, thanks for identifying it. I've filed an issue for it: https://bugs.ruby-lang.org/issues/19067

The others I'm not quite clear on, but if the current wrap parameter is not working as expected, then that's also a bug and should be fixed, regardless of this discussion.

Updated by duerst (Martin Dürst) about 1 year ago

Updated by rubyFeedback (robert heiler) about 1 year ago

Personally I think it may be better to leave require(),
load() and require_relative() unchanged as-is, and instead
add a new, more sophisticated way for ruby to handle loading
of stuff in general. That one should then ideally be as flexible
as possible, and "extensible" for the future. It could then also
allow all of our needs and wants.

Many years ago, for instance, I wanted to be able to load up a
specific module, but instantly, upon load-time, assign it a new
different name.

We can kind of simulate that this way, e.g.:

require 'foobar'
Barfoo = Foobar # and then setting Foobar to nil, I suppose, or
                # something like that

But I wanted that on the require/load situation. We could also
integrate autoload-like behaviour into it. And so on and so forth.
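
The manual rename mentioned above can be sketched roughly like this. The inline module stands in for whatever require 'foobar' would define, and remove_const is used instead of assigning nil (an assumption about what "something like that" would look like in practice):

```ruby
# Stand-in for `require 'foobar'`: pretend the library defined this.
module Foobar
  def self.hello
    "hello from foobar"
  end
end

# Give it a new name, then drop the old toplevel constant.
Barfoo = Foobar
Object.send(:remove_const, :Foobar) # remove_const is private, hence send

Barfoo.hello             # still works through the new name
Barfoo.name              # => "Foobar" (the first assigned name sticks)
defined?(Foobar)         # => nil
```

Note that the module keeps reporting its original name, and that Foobar was still visible globally for a moment, which is why doing this at require/load time would be different.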

That could also mean to load ruby code "outside" of any "namespace".
Such as "anonymous loading" not polluting the namespace.

I am not sure we should add "import" as such, though. People will
ask when to use include, and extend, and then import.

So I kind of agree with fxn that we should leave require and load
as it is, and instead think about more flexible loading/requiring
of code as-is.

Another feature I wanted to have is that we can assign "authors"
to a namespace - not in a way to deny people from using them,
but simply to be able to track who made modifications where and
when. In some ways this is similar to how refinements can be
thought of - we can think of them as "isolated namespace changes"
but allowing us more control over the code changes (if we ignore
the syntax; I always found the syntax odd for refinements).

Perhaps we should create a more unified proposal eventually that
can unify all the different ideas, pros and cons, for matz to
think about what the best approach would be.

One thing austin wrote:

I don’t see any value in namespacing beyond what Ruby has
through modules and classes. I certainly don’t see any
value in the ability to load more than one version of a
piece of code at a time under a different namespace (this
is, IMO, one of the worst parts of JavaScript).

Well, refinements already does that to some extent; and I
can see the potential value in knowing who made modifications
where and when. Right now we all have one unified namespace.

This can lead to problems sometimes. I agree it is not a huge
problem per se, but when I write:

class Colors

And some other package is:

module Colors

and I already included that, then there are some problems.

(Or, both use class, or module, and then I may overwrite
some toplevel method or a similar problem.)
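
A small sketch of both cases, with everything defined inline rather than in separate packages (Palette is a made-up second name so that both cases fit in one script):

```ruby
# Case 1: package A defines a module, my code expects a class.
module Colors
  def self.red
    "red"
  end
end

collision =
  begin
    class Colors
    end
    nil
  rescue TypeError => e
    e.message # starts with "Colors is not a class"
  end

# Case 2: both sides use `class`, so the second silently reopens the first.
class Palette
  def self.default
    :from_package_a
  end
end

class Palette
  def self.default
    :from_package_b
  end
end

winner = Palette.default # => :from_package_b

# The closest thing to "tapping in" today: where the surviving definition lives.
file, line = Palette.method(:default).source_location
```

source_location only reports the last definition, not who changed what and when, so it covers only a small part of the tracking idea.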

It's not a huge problem per se, mind you, but in these
cases, being able to "tap" into a class or module and
see where changes were made, when, by whom, and in which
package/file, can be useful in my opinion. That is not to
say I agree with the proposal here per se, but I wanted to
comment on whether there may be use cases - and I think there
are.

I don't know of a good syntax-way to tap into any of that
though. I just think we should be open about different ways
how to load up code in ruby, because there are definitely use
cases that are not fully covered by require and load. (I use
require() about 98% of the time, and I avoid autoload, because
I found that the cognitive load of having to remember how to use
it offsets the benefits it brings; I also don't need require_relative.
I do sometimes need to use load(), e. g. dynamic reloading of code
if you need it, but I can see different use cases most assuredly.)

Updated by shioyama (Chris Salzberg) about 1 year ago

I wanted to update this because I've changed my thinking since the original proposal.

TL;DR

  1. I agree that we should not change require, require_relative, load or autoload (at least, not in ways that would break existing usage). Thanks @jeremyevans0 (Jeremy Evans) and others for convincing me of this.
  2. Any new way to import code should be opt-in. Again, many voiced this opinion here and it makes sense to me now.

I decided to take these two as constraints and see what else was possible in Ruby 3.2, and came up with (a new version of) Im. Im is an "isolated module autoloader", a fork of Zeitwerk which autoloads constants under an anonymous namespace.

The interface for Im is nearly identical to Zeitwerk except that rather than loading to top-level, constants are loaded under loaders themselves, where Im::Loader subclasses Module and can therefore define its own namespace.

So for gem code, it looks like this:

# lib/my_gem.rb (main file)

require "im"
loader = Im::Loader.for_gem
loader.setup # ready!

module loader::MyGem
  # ...
end

loader.eager_load # optionally

Notice here that loader encapsulates the entire loaded namespace. Further details in the readme. I've also built a sample Rails app which uses Im to load all its code under a single application namespace. Internally Im uses load with the second module argument (discussed here) and also Module#const_added (ref), also added in Ruby 3.2.
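
To illustrate the two building blocks named above, here is a minimal sketch that assumes Ruby 3.2. TinyLoader is a hypothetical stand-in written for this comment; it is not Im's actual implementation, and the file contents are invented:

```ruby
require "tmpdir"

# Hypothetical loader: a Module subclass, so each instance is its own namespace.
class TinyLoader < Module
  attr_reader :added

  def initialize
    super
    @added = []
  end

  # Module#const_added (Ruby 3.2+) is called when a constant is defined on this module.
  def const_added(name)
    @added << name
    super
  end
end

loader = TinyLoader.new

Dir.mktmpdir do |dir|
  path = File.join(dir, "my_gem.rb")
  File.write(path, "module MyGem\n  VERSION = '0.1.0'\nend\n")

  # Passing a module as the second argument evaluates the file under it.
  load path, loader
end

loader::MyGem::VERSION            # => "0.1.0"
loader.added                      # => [:MyGem]
Object.const_defined?(:MyGem)     # => false
```

The constant ends up under the loader instance rather than under Object, and the hook sees it arrive, which is the kind of signal an autoloader can build on.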

The advantages to this approach:

  • autoloading does not require "returning" anything, unlike require, where any change would face the problem of where to "receive" the thing you've loaded if not at toplevel. For the most part you can just write your code exactly as you would any other Zeitwerk-autoloaded code, without any import calls in each file, and for a gem you just import the gem once and get the whole tree of autoloaded code.
  • the Zeitwerk convention for file naming/loading (inherited from Rails) is now widely adopted, and so the changes to make a gem "Im-compatible" should generally be quite small. (The exception here is Rails, which depends heavily on Module#name to map association names, etc.)
  • Although the approach does not guarantee isolation (e.g. you can always "break out" by referencing toplevel with ::Foo), it can guarantee a kind of "opt-in" isolation, whereby within your autoloaded code you own the toplevel (because your "toplevel" is the top of an anonymous-rooted module namespace). Similarly, a gem can entirely remove itself from the global namespace, instead allowing the gem consumer to determine the top constant name under which to load code. So two gems that follow the convention are isolated from each other provided they don't create/modify anything at the "absolute toplevel".
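
The "opt-in" nature of that last point can be sketched with plain load and an anonymous module (no Im involved; the file name and constants are invented for illustration):

```ruby
require "tmpdir"

Foo = :global_foo # something already living at the absolute toplevel

ns = Module.new   # anonymous root that plays the role of "my toplevel"

Dir.mktmpdir do |dir|
  path = File.join(dir, "inner.rb")
  File.write(path, <<~RUBY)
    Foo = :isolated_foo   # defined under ns, not under Object
    RELATIVE = Foo        # resolves to the Foo owned by this namespace
    ABSOLUTE = ::Foo      # explicitly breaks out to the real toplevel
  RUBY

  load path, ns
end

ns::RELATIVE # => :isolated_foo
ns::ABSOLUTE # => :global_foo
Foo          # => :global_foo (untouched by the loaded file)
```

Code that sticks to relative references stays inside its own namespace, while a leading :: is a deliberate step outside it.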

@rubyFeedback (robert heiler)

This can lead to problems sometimes. I agree it is not a huge
problem per se

I disagree here, I think this is actually a huge issue, and I think it's only because we as Rubyists are so used to it that we treat it as a "minor" inconvenience. It's fundamentally a scaling issue both in the code ecosystem space (rubygems) and in the application space.

As you noted, it's not just a problem that "my constant collides with your constant with the same name". It's that every pair of collaborators in an application (every pair of gems in the Gemfile, plus every contributor to the application itself) has to follow a contract that says nobody will modify the same namespace in "unexpected" ways. When you start scaling things up, to an application with thousands of contributors and hundreds of gems, this becomes problematic at best.

I think Im is a potential solution to this problem. Moreover, and I think this allays some of the concerns expressed here, a gem can opt to offer two "endpoints", one for Zeitwerk and one for Im, such that the gem consumer can decide how to "consume" the gem code (either at toplevel or under an anonymous-rooted namespace). So if you like your universe always pointing to the same toplevel, it would be possible to keep that, whereas others who want to "relativize" the toplevel would also be able to do that.

In any case, I don't really feel the need for further changes to Ruby, other than any that support the existing functionality in 3.2. I'm happy if this is closed, unless others want to keep it open.

Actions #30

Updated by jeremyevans0 (Jeremy Evans) about 1 year ago

  • Status changed from Open to Closed
Actions #31

Updated by hsbt (Hiroshi SHIBATA) 9 months ago
