Feature #21311

Namespace on read (revised)

Added by tagomoris (Satoshi Tagomori) about 14 hours ago. Updated 22 minutes ago.

Status:
Open
Assignee:
-
Target version:
-
[ruby-core:121837]

Description

This replaces #19744

Concept

This proposes a new feature to define virtual top-level namespaces in Ruby. Those namespaces can require/load libraries (either .rb or native extension) separately from other namespaces. Dependencies of required/loaded libraries are also required/loaded in the namespace.

This feature will be disabled by default at first, and will be enabled by an env variable RUBY_NAMESPACE=1 as an experimental feature.
(It could be enabled by default in the future possibly.)

"on read" approach

The "on write" approach here is the design to define namespaces on the loaded side. For example, Java packages are declared in the .java files themselves, and that declaration is what separates namespaces from each other. This approach can be implemented very easily, but it requires all libraries to be updated with a package declaration. (In my opinion, that is almost impossible in the Ruby ecosystem.)

The "on read" approach is to create namespaces and then require/load applications and libraries in them. Programmers can control namespace separation at the "read" time. So, we can introduce the namespace separation incrementally.

Motivation

The "namespace on read" can solve the 2 problems below, and can make a path to solve another problem:

  • Avoiding name conflicts between libraries
    • Applications can require two different libraries safely which use the same module name.
  • Avoiding unexpected globally shared modules/objects
    • Applications can make an independent/unshared module instance.
  • Multiple versions of gems can be required
    • Application developers will have fewer version conflicts between gem dependencies if RubyGems/Bundler supports namespace on read. (Support from RubyGems/Bundler and/or other packaging systems will be needed.)

For the motivation details, see [Feature #19744].

How we can use Namespace

# app1.rb
PORT = 2048
class App
  def self.port = ::PORT
  def val = PORT.to_s
end

p App.port # 2048

# app2.rb
class Integer
  def double = self * 2
end

PORT = 2048.double
class App
  def self.port = ::PORT
  def val = PORT.double.to_s
end

p App.port # 4096

# main.rb - executed as `ruby main.rb`
ns1 = Namespace.new
ns1.require('./app1') # 2048
ns2 = Namespace.new
ns2.require('./app2') # 4096

PORT = 8080
class App
  def self.port = ::PORT
  def val = PORT.to_s
end

p App.port # 8080
p App.new.val # "8080"

p ns1::App.port # 2048
p ns1::App.new.val # "2048"

p ns2::App.port # 4096
p ns2::App.new.val # "8192"

1.double # NoMethodError

Namespace specification

Types of namespaces

There are two namespace types: "root" and "user". Exactly one root namespace exists in a Ruby process, while Ruby programmers can create as many user namespaces as they want.

Root namespace

The root namespace is a unique namespace defined when a Ruby process starts. It contains only built-in classes/modules/constants, that is, those available without any require calls, including RubyGems itself (unless --disable-gems is specified).

Here, "built-in" classes/modules are the classes/modules accessible when evaluation of the user's script starts, without any require/load calls.

User namespace

User namespace is a namespace to run users' Ruby scripts. The "main" namespace is the namespace to run the user's .rb script specified by the ruby command-line argument. Other user namespaces ("optional" namespaces) can be created by Namespace.new call.

In user namespace (both main and optional namespaces), built-in class/module definitions are copied from the root namespace, and other new classes/modules are defined in the namespace, separately from other (root/user) namespaces.
The newly defined classes/modules are top-level classes/modules in the main namespace like App, but in optional namespaces, classes/modules are defined under the namespace (subclass of Module), like ns::App.

In that namespace ns, ns::App is accessible as App (or ::App). There is no way to access App in the main namespace from the code in the different namespace ns.

Constants, class variables and global variables

Constants, class variables of built-in classes, and global variables are also separated by namespace. Values assigned to class or global variables in one namespace are invisible in other namespaces.

Methods and procs

Methods defined in a namespace run in the namespace where they were defined, even when called from other namespaces.
Procs created in a namespace also run in the namespace where they were created.

Dynamic link libraries

Dynamic link libraries (typically .so files) are loaded into namespaces in the same way as .rb files.

Open class (Changes on built-in classes)

In user namespaces, built-in class definitions can be modified. But those operations are processed as copy-on-write of class definition from the root namespace, and the changed definitions are visible only in the (user) namespace.

Definitions in the root namespace are not modifiable from other namespaces. Methods defined in the root namespace run only with root-namespace definitions.

Enabling Namespace

Specify RUBY_NAMESPACE=1 environment variable when starting Ruby processes. 1 is the only valid value here.

The namespace feature can be enabled only when a Ruby process starts. Setting RUBY_NAMESPACE=1 after a Ruby script has started has no effect.
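Since the variable is read only at process startup, one way to experiment from an already running script is to launch a child interpreter with the variable set. The following is a minimal sketch under that assumption; `namespace_requested?` and `namespace_command` are hypothetical helper names, and `main.rb` refers to the example script above.

```ruby
require "rbconfig"

# Hypothetical helper: true only for "1", the single valid value
# according to the description above.
def namespace_requested?(env = ENV)
  env["RUBY_NAMESPACE"] == "1"
end

# Hypothetical helper: builds arguments for Kernel#system or Kernel#spawn
# so that the child interpreter starts with the variable already set.
def namespace_command(script)
  [{ "RUBY_NAMESPACE" => "1" }, RbConfig.ruby, script]
end

# Assigning ENV["RUBY_NAMESPACE"] in the running process would not enable
# the feature for that process; it could only affect children started later.
# Usage: system(*namespace_command("main.rb"))
```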

Pull-request

https://github.com/ruby/ruby/pull/13226


Related issues: 1 (0 open, 1 closed)

Related to Ruby - Feature #19744: Namespace on read (Closed)

Updated by baweaver (Brandon Weaver) about 14 hours ago

As a proof of concept this is a very valuable idea, and will give users a chance to experiment with it.

I wonder about the long-term ergonomics of this though, and if it may make sense to potentially introduce in Ruby 4 a new keyword for namespace that is stronger than module for wrapping:

namespace NamespaceOne
  require "./app1"
end

namespace NamespaceTwo
  require "./app2"
end

p NamespaceOne::App.port # 2048
p NamespaceOne::App.new.val # "2048"

p NamespaceTwo::App.port # 4096
p NamespaceTwo::App.new.val # "8192"

A require that is run inside of a namespace could serve the same function mentioned above, but could additionally provide an isolated environment for defining other code:

namespace Payrolls
  class Calculator; end
  private class RunTaxes; end
end

Payrolls::Calculator # can access
Payrolls::RunTaxes # raises violation error

namespace Payments
  class RecordTransaction; end
end

For Ruby 3.x I would agree that the proposed syntax is good for experimentation, but would ask that we consider making this a top-level concept in Ruby 4.x with a namespace keyword to fully isolate wrapped state.

Updated by fxn (Xavier Noria) about 13 hours ago · Edited

A few quick questions:

Assuming a normal execution context, nesting at the top level of a file is empty. Would it be also empty if the file is loaded under a namespace?

The description mentions classes and modules, which is kind of intuitive. They are relevant because they are the containers of constants. But, as we know, constants can store anything besides class and module objects. In particular, constants from the root namespace, recursively, can store any kind of object that internally can refer to any other object. There is a graph of pointers.

So, when a namespace is created, do we have to think that the entire object tree is deep cloned? (Maybe with CoW, but conceptually?) For example, let's imagine C::X is a string in the root namespace, and we create ns. Would ns::C::X.clear clear the string in both namespaces?

Global variables stay global I guess?

Updated by tagomoris (Satoshi Tagomori) about 12 hours ago

@baweaver I don't have a strong opinion about adding a namespace keyword, but having a block parameter on Namespace.new could provide similar UX without changing syntax.

NamespaceOne = Namespace.new do
  require "./app1"
end
p NamespaceOne::App.port #=> 2048

This looks a little less smart, but may not be the worst option. Having Kernel#namespace could be an alternative idea.

NamespaceOne = namespace do
  require "./app1"
end

Updated by tagomoris (Satoshi Tagomori) about 11 hours ago

fxn (Xavier Noria) wrote in #note-2:

A few quick questions:

Assuming a normal execution context, nesting at the top level of a file is empty. Would it be also empty if the file is loaded under a namespace?

Yes. At that time, self will be a cloned (different) object from main in optional namespaces.

So, when a namespace is created, do we have to think that the entire object tree is deep cloned? (Maybe with CoW, but conceptually?)

Conceptually, yes. Definitions are deeply cloned. But objects (stored on constants, etc) will not be cloned (See below).

For example, let's imagine C::X is a string in the root namespace, and we create ns. Would ns::C::X.clear clear the string in both namespaces?

Yes. (I hope built-in classes/modules don't have such mutable objects, but they probably do :-( )

Global variables stay global I guess?

Global variables are also separated by namespace. Think of $LOAD_PATH and $LOADED_FEATURES: each namespace has its own set of load paths and its own list of actually loaded files, and those should differ between namespaces.
Protecting global variables from unexpected changes by libraries or other apps is part of the namespace concept.

Updated by tagomoris (Satoshi Tagomori) about 11 hours ago

  • Description updated (diff)

Updated by fxn (Xavier Noria) about 10 hours ago · Edited

Thanks @tagomoris.

Conceptually, yes. Definitions are deeply cloned. But objects (stored on constants, etc) will not be cloned (See below).

Let me understand this one better.

In Ruby, objects are stored in constants. Conceptually, a constant X storing a string object and a constant C storing a class object are not fundamentally different. Do you mean namespace creation traverses constant trees, clones only the values that are class and module objects, and keeps the rest of object references, which become shared between namespaces?

Even in the case of classes and modules, what happens to the objects in their ivars?

I do not know about builtin, but in the case of user-defined classes/modules, I don't think we can assume they do not mutate their state. We could have 2500 of them in the root namespace when the namespace is created.

Updated by tagomoris (Satoshi Tagomori) about 9 hours ago

fxn (Xavier Noria) wrote in #note-6:

In Ruby, objects are stored in constants. Conceptually, a constant X storing a string object and a constant C storing a class object are not fundamentally different. Do you mean namespace creation traverses constant trees, clones only the values that are class and module objects, and keeps the rest of object references, which become shared between namespaces?

For example, String is a built-in class, and its value (a Class object) is stored in the ::String constant. In a namespace ns1, we can change the String definition (for example, by adding a constant String::X = "x").
But even in that case, the value of String is identical: ::String == ns1::String returns true.

That means the value (VALUE in the CRuby world) is identical and is not copied when namespaces are created, but the backing class definitions (struct rb_classext_t) are different, and those are the CoW target.

Even in the case of classes and modules, what happens to the objects in their ivars?

Class ivars (instance variable tables of classes) are copied, but the ivar values are not copied. It's similar to constants (constant tables) of classes.
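The "tables are copied, values are shared" behaviour described here can be pictured with a plain-Ruby analogy. This is only a sketch: a Hash stands in for a class's constant or ivar table, and nothing below reflects the actual CRuby data structures.

```ruby
# Plain-Ruby analogy only; this is not how CRuby implements namespaces.
# A Hash stands in for the constant (or ivar) table of a class.
ROOT_TABLE = { X: +"shared string" }

# Creating a "namespace" copies the table, not the objects it refers to.
NS_TABLE = ROOT_TABLE.dup

# Adding an entry to one table is invisible in the other.
NS_TABLE[:Y] = "only in ns"
ROOT_TABLE.key?(:Y)                   # => false

# Mutating a shared object is visible through both tables.
NS_TABLE[:X].clear
ROOT_TABLE[:X]                        # => ""
ROOT_TABLE[:X].equal?(NS_TABLE[:X])   # => true
```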

I do not know about builtin, but in the case of user-defined classes/modules, I don't think we can assume they do not mutate their state. We could have 2500 of them in the root namespace when the namespace is created.

In the namespace context, "builtin classes/modules" are classes and modules defined before any user-script evaluation. (I'll update the ticket description soon.)
In total there are 685 classes and 40 modules (plus 51 internal iclasses). Any user-defined classes/modules are not defined in the root namespace.

Updated by tagomoris (Satoshi Tagomori) about 9 hours ago

  • Description updated (diff)

Updated by fxn (Xavier Noria) about 9 hours ago

any user-defined classes/modules are not defined in the root namespace.

Ah, that is key.

So, what happens in this script?

# main.rb

App = Class.new

ns1 = Namespace.new
ns1.require("./app1") # defines/reopens App

Do App and ns1::App have the same object ID?

Or does the feature assume that, if you want to isolate things, creating the namespace has to be the first thing you do, before creating any constant, global variable, etc.?

Updated by byroot (Jean Boussier) about 9 hours ago

having a block parameter on Namespace.new could provide similar UX without changing syntax.

That wouldn't handle constant definitions correctly though. Similar to how people get tricked by Struct.new do today.

Foo = Struct.new(:bar) do
  BAZ = 1 # This is Object::BAZ
end

That's why I filed [Feature #20993], it would allow you to do:

module MyNamespace = Namespace.new
  BAZ = 1 # This is MyNamespace::BAZ
end
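The Struct.new pitfall mentioned above can be checked in current Ruby. The sketch below reuses the names Foo and BAZ from the comment; the shout method is a hypothetical addition that shows the contrast with method definitions.

```ruby
Foo = Struct.new(:bar) do
  BAZ = 1                        # the lexical scope is still the top level
  def shout = bar.to_s.upcase    # methods, by contrast, do land on Foo
end

Object.const_defined?(:BAZ, false)   # => true
Foo.const_defined?(:BAZ, false)      # => false
Foo.new("x").shout                   # => "X"
```

A block passed to Namespace.new would presumably face the same lexical-scope rule, which is the concern raised here.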

Updated by Eregon (Benoit Daloze) about 8 hours ago

@fxn The main and user namespaces are independent, though the main namespace can refer to user namespace via ns::SomeConstant.
So the App from main here is inaccessible in ns1, in fact all constants defined in the main namespace are inaccessible in user namespaces, see the end of https://bugs.ruby-lang.org/issues/21311#User-namespace.

Updated by Eregon (Benoit Daloze) about 8 hours ago

I think this addresses https://bugs.ruby-lang.org/issues/19744#note-74 by having a CoW copy of all builtin classes/modules in each namespace (including main namespace), nice.
From a quick read it sounds correct to me.
The semantics might be somewhat surprising in practice:

  • e.g. String#start_with? is available in all namespaces but String#to_time is only available in the namespaces that load activesupport (clear if you know which methods are core but as we have seen from polls it is not always clear)
  • core classes&modules are copy-on-write and shared references, but user-defined classes&modules are completely separate (except main namespace can reference anything from other namespace explicitly through ns::Foo and even store them), it's kind of a dual situation and a bit inconsistent. I think it's necessary semantically though, as a String from another namespace should still be obj.is_a?(String).
  • Instances of user-defined classes won't be is_a? of the same-named class in another namespace, and this might be particularly confusing for stdlib/default gems/bundled gems. For example, a Date or Pathname created in ns1 won't be is_a?(Date) in main: ns1::TODAY.is_a?(Date) # => false, and ns1::Date.today.is_a?(Date) # => false. Also Pathname('/') == ns1::Pathname('/') # => false. (All these examples run in the main namespace.)

For the last point I suspect one might need a way to transition objects from one namespace to another somehow, which sounds hard.
Unless the namespaces truly need no communication at all between them, but then different processes (or multiple interpreters in a process) might be a better trade-off (notably, they can run in parallel and give stronger isolation).
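The is_a? concern can be approximated in current Ruby without Namespace. As a stand-in for "the same library loaded into two namespaces", the sketch below evaluates one source string inside two modules; CopyA, CopyB, and Point are hypothetical names, and this only mimics the class-identity aspect, not the proposed feature.

```ruby
# One library source, evaluated twice into separate containers.
POINT_SOURCE = <<~RUBY
  class Point
    attr_reader :x

    def initialize(x)
      @x = x
    end
  end
RUBY

CopyA = Module.new.tap { |m| m.module_eval(POINT_SOURCE) }
CopyB = Module.new.tap { |m| m.module_eval(POINT_SOURCE) }

POINT_A = CopyA::Point.new(1)

POINT_A.is_a?(CopyA::Point)          # => true
POINT_A.is_a?(CopyB::Point)          # => false, a different class object
CopyA::Point.equal?(CopyB::Point)    # => false
POINT_A.is_a?(Object)                # => true, core classes stay shared
```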

Updated by byroot (Jean Boussier) about 8 hours ago

While I believe namespaces would be a good addition to Ruby, I'm not convinced this particular implementation of
namespaces is what Ruby needs.

First, I'm not convinced by the motivations:

Avoiding name conflicts between libraries: Applications can require two different libraries safely which use the same module name.

Is this a problem that happens on a regular basis? I believe Ruby has a pretty well established convention
for libraries to expose a single module with a name that corresponds to their gem name.

Actual top level module name clashes are extremely rare in my experience.

Avoiding unexpected globally shared modules/objects

Here again, from my experience this is very rare, and usually accepted as a bug, and promptly fixed.

Do we have concrete cases of this being a persistent problem?

Multiple versions of gems can be required

I remember there were discussions about this in the past. Personally this is a feature I'm quite strongly
against because it's extremely hard to reason about.

If you have library A using the gem G in version 1, and library B using the gem G in version 2,
and end up with A being passed a G-v2 object, you may end up in a world of hurt.

I understand this feature would be useful for bundler specifically to allow them to use gems internally
without conflicting with the application (a problem they currently solve by vendoring), but outside
of that I'm not convinced it's a desirable feature.

I get that it can happen that you end up in a sticky situation with two dependencies being essentially
incompatible because they require conflicting versions of another dependency, as it happened with the Faraday 2
transition a few years back, but I'm not convinced that working around the problem that way is a net positive.

Namespace monkey patches

This one isn't in your ticket, but from previous public talks I understand it is one?

Here again I'd like to question how big of a problem monkey patches really are.
It is true that 15 years ago, numerous popular gems would irresponsibly monkey patch core classes,
but I believe these days are long gone. Except for ActiveSupport (that gets a pass for being a framework)
very few gems ship with monkey patches.

A notable exception being "protocol" type of methods, such as to_json, to_yaml, to_msgpack, etc.

In addition, I routinely use monkey patches to backport a fix onto a gem while waiting for a fix to be merged
and published upstream. If monkey patches became scoped to namespaces, this sort of monkey patch would become
way harder. So to me it's a net negative.

Being able to namespace existing code

Again not listed in your motivations, but you explain pretty well that you want to be able to load arbitrary code
into a namespace, because you don't want to have to modify the existing libraries.

It makes sense, but is it really that big of a need? I personally see namespaces as a feature libraries can
use to write more robust and isolated code. Not as a feature applications can use to workaround libraries.

Other issues

Deduplication

Assuming this implementation of namespaces becomes widely used, it means some versions of some libraries would
be loaded dozens and dozens of times in the same process. IIRC in some previous public talks you mentioned
the possibility of deduplication, what's the status on this? Because without it, it's a big concern to me.

With Python/Java/Node namespacing systems it's an easily solved problem, because the file is essentially a
namespace object, so you can just keep a map of file -> namespace_object, but here it seems way more involved.

What I think would be a positive

In order to not just be negative, I'll try to explain what I think would be helpful.

Local namespace

A common complaint I hear from less experienced / occasional Ruby users is they are having trouble figuring out where constants are coming from,
because of the single global namespace.
They prefer the Java/Python/Node style, where each file is more or less its own namespace, and at the top
of the file you list your imports.

I think translated in Ruby, it could be emulated by only allowing to reference constants from outside the namespace
in a fully qualified way:

class SomeClass
end

namespace MyLibrary
  p SomeClass # NameError

  SomeClass = ::SomeClass # This is basically an import

  p SomeClass # works
end

In other words, I think namespaces could be somewhat similar to BasicObject but for modules.
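The BasicObject behaviour referred to here exists in current Ruby: inside a BasicObject subclass, bare constant lookup does not reach Object, so top-level constants must be fully qualified or aliased locally. A minimal sketch, with Isolated and Importing as hypothetical class names:

```ruby
class Isolated < BasicObject
  def bare = String           # looked up relative to Isolated, not Object
  def qualified = ::String    # a fully qualified reference always works
end

class Importing < BasicObject
  String = ::String           # the "import": alias into the local scope
  def bare = String
end

begin
  Isolated.new.bare
rescue NameError => e
  e.class                     # => NameError
end

Isolated.new.qualified        # => String
Importing.new.bare            # => String
```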

Overly public constants

Another common issue I witnessed is publicly exposed constants, that aren't meant to be public.

Being involved in a really big application, what people are trying to do to make that codebase more manageable
is to break it down in smaller components with the hope that a developer can more easily wrap their head around
a single component, that a component can be tested individually, etc.

This often falls apart because all constants are public by default, so other teams end up relying on APIs that
weren't meant to be used.

I think it would be helpful if namespace constants were private by default and you had to explicitly "export" (publicize)
them.

Updated by Eregon (Benoit Daloze) about 7 hours ago

(from description)

There is no way to access App in the main namespace from the code in the different namespace ns.

Right, although of course the main namespace can do ns1::MainApp = App and expose its class like that.

I wonder if there should be a way to get the Namespace object of the main namespace.
Then the main and user namespaces wouldn't have any difference besides the main namespace being the default/starting namespace, as if the main script was executed under main_ns = Namespace.new; main_ns.require(main_script).

Updated by Eregon (Benoit Daloze) about 7 hours ago

Has the performance of Namespace been evaluated?
I would assume getting the current namespace to execute methods/procs is an overhead (the namespace is at least needed for constant accesses and for method lookup on builtin classes).

At least any shared code (so probably only methods/procs from the root namespace) doing constant lookup is likely slower, as it needs to look up from the current namespace / from the correct struct rb_classext_t.
Are constant inline caches disabled for such methods? If not, how do they work without being invalidated?

Same for method lookup on builtin classes, what about method lookup inline caches for method calls inside root namespace methods/procs?

Updated by Eregon (Benoit Daloze) about 7 hours ago

@byroot (Jean Boussier) makes a good point about use cases, I share the same concerns (and already did in https://bugs.ruby-lang.org/issues/19744#note-21 a while ago).
It seems easy to avoid these problems and these problems don't seem to come up frequently either.
I'm not sure adding such a big feature for these seemingly rather-niche issues is worth it.

From TruffleRuby's POV I am unsure it makes sense to implement Namespace there, when there is stronger isolation already available, more performant and with simpler semantics.

Updated by fxn (Xavier Noria) about 4 hours ago

The main and user namespaces are independent

@Eregon (Benoit Daloze) the description says

User namespace is a namespace to run users' Ruby scripts. The "main" namespace is the namespace to run the user's .rb script specified by the ruby command-line argument. Other user namespaces ("optional" namespaces) can be created by Namespace.new call.

The vocabulary is not very clear to me. What is a "script", is active_record.rb a script? If "User namespace is a namespace to run users' Ruby scripts" and also "The "main" namespace is the namespace to run the user's .rb script specified by the ruby command-line argument.", is main a user namespace?

I find the description also a bit hard to follow at times because it conflates constants and the objects they store. Classes and modules are value objects like any other value object. For example, they are neither top-level nor non-top-level; it is the constants that belong to Object that are top-level. In some cases I can translate to what seems to be the intention, but to describe a feature like this I think it would be clearer if we were technically precise.

I also see the concerns raised by @byroot (Jean Boussier).

Updated by Dan0042 (Daniel DeLorme) about 4 hours ago

byroot (Jean Boussier) wrote in #note-13:

I personally see namespaces as a feature libraries can use to write more robust and isolated code. Not as a feature applications can use to workaround libraries.

It's not about "working around" libraries. It's about loading entirely different and independent apps within the same process. All the motivations presented above are in service of that. Imagine a basic router that accepts requests and routes them to 2 namespaced apps A and B. A is a Rails 6 app, B is a Rails 7 app. Completely different worlds that do not ever interact so "end up with A being passed a G-v2 object" is not an issue in practice.

The 2 apps could be run as separate processes, but

  1. The routing would have to be handled by something else (nginx?) and would be much less flexible than Ruby
  2. Concurrency is harder to control; if you want a limit of 10 concurrent requests, then you need 10 processes each of A and B, plus some external synchronization to ensure that of those 20 processes only 10 are ever active at one time.

I'm not sure if this can be achieved with TruffleRuby sub-interpreters or how you would go about it.
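The single-process argument in point 2 can be sketched in plain Ruby: when both apps live in one process, one shared counting semaphore caps the total number of in-flight requests. In the sketch below LimitedRouter is a hypothetical class, and the apps are assumed to be any objects that respond to call; in the scenario above they would come from two namespaces.

```ruby
# Hypothetical in-process router with one concurrency limit shared by all apps.
class LimitedRouter
  attr_reader :max_active

  def initialize(apps, limit:)
    @apps = apps
    @slots = SizedQueue.new(limit)   # used as a counting semaphore
    @lock = Mutex.new
    @active = 0
    @max_active = 0
  end

  def call(app_name, request)
    @slots.push(true)                # blocks while `limit` requests are running
    begin
      track(+1)
      @apps.fetch(app_name).call(request)
    ensure
      track(-1)
      @slots.pop                     # release the slot
    end
  end

  private

  # Records the current and the highest observed number of active requests.
  def track(delta)
    @lock.synchronize do
      @active += delta
      @max_active = [@max_active, @active].max
    end
  end
end
```

With separate processes per app, the same cap would need external coordination, which is the trade-off described in the list above.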

Updated by fxn (Xavier Noria) about 3 hours ago · Edited

My interpretation of the vocabulary is as follows:

  1. The interpreter boots in a root namespace.
  2. The "state" when this process ends is somehow snapshotted into S or something, conceptually (plenty of details here, but I am talking only about the vocabulary).
  3. Any other namespace is a user namespace. This is a concept, not a name.
  4. When the interpreter starts interpreting "external" code, it creates a user namespace called main, and initializes it with S.
  5. If the code in main spawns other namespaces, they are also user namespaces, and are initialized with S (for example, the constants and global variables in main are not inherited. In particular, the arrays stored in $LOADED_FEATURES, $LOAD_PATH, etc. are new array references whose initial items are as in root).
  6. Such optional namespaces can in turn spawn namespaces, and they are initialized with S too.

So, I guess spawning does not create hierarchy. A namespace may not have a pointer to the namespace where it was born (or I don't see the need for it after creation). It is conceptually just one flat layer of independent user namespaces.
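This reading of the vocabulary can be written down as a toy model. It is only a sketch of the interpretation above, not of the actual implementation; ROOT_SNAPSHOT (playing the role of S), ToyNamespace, and the example paths are all hypothetical.

```ruby
# Toy model of the vocabulary only; names, paths, and structure are hypothetical.
ROOT_SNAPSHOT = {
  load_path: ["/usr/lib/ruby"].freeze,
  loaded_features: [].freeze
}.freeze

class ToyNamespace
  attr_reader :load_path, :loaded_features

  def initialize
    # Every user namespace starts from S, with fresh array objects.
    @load_path = ROOT_SNAPSHOT[:load_path].dup
    @loaded_features = ROOT_SNAPSHOT[:loaded_features].dup
  end

  # Spawning keeps no pointer to the creator, so the layer stays flat.
  def spawn = ToyNamespace.new
end

main = ToyNamespace.new
main.load_path << "/app/lib"     # a change made in main
child = main.spawn
child.load_path                  # => ["/usr/lib/ruby"], nothing inherited from main
```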

Is that a good model?

@tagomoris is that correct?

Updated by fxn (Xavier Noria) about 3 hours ago

If my interpretation is correct, the entry point of a library will be executed as many times as user namespaces require it, in the same operating system process.

That is a potential gotcha to have in mind too, since nowadays you can kind of assume that you'll be loaded at most once in the same process, and you may leverage that to run one-offs. With namespaces, you'd need to be idempotent by hand.
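What "idempotent by hand" might look like can be approximated today with Kernel#load, which re-runs a file each time, unlike require. The sketch below guards a one-off with a process-wide marker. Using ENV as that marker is an assumption (a constant or global variable would be per-namespace under this proposal), and the MYLIB_* keys and helper names are hypothetical.

```ruby
require "tempfile"

# Hypothetical library entry point whose one-off must run once per process.
ENTRY_POINT = <<~RUBY
  unless ENV["MYLIB_BOOTED"]
    ENV["MYLIB_BOOTED"] = "1"
    count = ENV.fetch("MYLIB_BOOT_COUNT", "0").to_i + 1
    ENV["MYLIB_BOOT_COUNT"] = count.to_s
  end
RUBY

# Loads the entry point twice, the way two namespaces requiring the same
# library would each execute it, and reports how often the one-off ran.
def load_entry_point_twice
  Tempfile.create(["mylib", ".rb"]) do |file|
    file.write(ENTRY_POINT)
    file.flush
    2.times { load file.path }
  end
  ENV["MYLIB_BOOT_COUNT"]
end

load_entry_point_twice   # => "1"
```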

Updated by byroot (Jean Boussier) about 1 hour ago

Dan0042 (Daniel DeLorme) wrote in #note-19:

It's not about "working around" libraries. It's about loading entirely different and independent apps within the same process.

So that's not something I understood from the issue description. But if that's so, while I understand the appeal and see how it would be nice to have, I don't think it's useful enough to justify such a massive impact on the VM implementation. But that's just my opinion.

Updated by peter.boling (Peter Boling) 44 minutes ago

byroot (Jean Boussier) wrote in #note-13:

Avoiding name conflicts between libraries: Applications can require two different libraries safely which use the same module name.

Is this a problem that happens on a regular basis? I believe Ruby has a pretty well established convention
for libraries to expose a single module with a name that corresponds to their gem name.

One use case for this is to benchmark libraries that do the same thing against each other. Very frequently libraries doing the same thing are forks of each other, and just as frequently are not re-namespaced.

As a concrete example, the memo_wise gem does this type of benchmarking, against all the other known "memoization" gems, and many of those share namespaces.

I wrote a tool in my gem gem_bench to assist them in that benchmarking. It loads a gem, and re-namespaces it, in a terrible, dirty, hacky way. It only works due to the relative simplicity of the libraries, and would not work in more complex cases. And that is the point I'm making here...

It isn't done much because it is damn hard to do. But it might be done more if we could easily load a bunch of otherwise conflicting tools and run them all at the same time. The benchmarking memo_wise has created from this is pretty cool. There are currently multiple open related pull requests, and we're actively working on it.

Updated by byroot (Jean Boussier) 22 minutes ago

peter.boling (Peter Boling) wrote in #note-23:

One use case for this is to benchmark libraries that do the same thing against each other. Very frequently libraries doing the same thing are forks of each other, and just as frequently are not re-namespaced.

I know it's going to read like I'm moving the goalposts, but "very frequently" seems way too strong of a term here. This is rather rare.

But yes, benchmarking is indeed one use case. I myself often benchmark different versions of a single gem, for optimization purposes and this feature could be handy, but it's not like I can't do it today, and also benchmark-ips has the hold! and save! feature for exactly that purpose.

So I still think this is way too situational of a use case to justify such a massive change in the VM.

Because to me, in the end it's really about usefulness vs added implementation complexity.

Given the complexity of the implementation, I think it's very hard to justify unless it's envisioned as something that will be very widely adopted. You don't add 6k lines of C in the VM, and many extra indirections in performance-sensitive codepaths, just to make benchmarking a bit easier once in a while.

But perhaps this feature as designed would end up very useful and very used, I might just not see it yet. But I'd like to hear about common green path, day to day, use cases, not fringe needs.
