Better mechanisms to safely load classes concurrently
Today I had an issue reported under JRuby where a user was doing require "some library" unless defined?(SomeClassLibraryDefines). They were running into cases where threads hitting this logic concurrently might see a partially-initialized class.
This pattern is not uncommon, and it is broken under all Ruby implementations. I believe this is a major flaw in the way Ruby makes classes visible, and we need to think about changes to how constants are defined during class init or come up with better options for concurrent loading. This bug offers a few ideas and experiments I've tried in hopes we can find something that will work.
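To make the failure concrete, here is a minimal sketch of what a second thread can observe while a class body is still executing. The names HalfLoaded, BODY_ENTERED and OBSERVED are made up for illustration, and the queues exist only to make the interleaving deterministic; in real code the timing would be up to the scheduler.

```ruby
require "thread"

# Queues used only to force a deterministic interleaving (illustrative names).
BODY_ENTERED = Queue.new
OBSERVED = Queue.new

observer = Thread.new do
  BODY_ENTERED.pop # wait until the class body has started running
  # The constant is already defined, but the method is not there yet.
  OBSERVED << [defined?(HalfLoaded), HalfLoaded.method_defined?(:work)]
end

class HalfLoaded
  BODY_ENTERED << true
  # Block here until the other thread has looked at the half-built class.
  SEEN_BY_OTHER_THREAD = OBSERVED.pop

  def work
    :done
  end
end

observer.join
p HalfLoaded::SEEN_BY_OTHER_THREAD
```

The observer thread passes the defined? check and would skip the require, even though calling work on an instance at that moment would fail.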
The most basic problem is that the constant pointing at the class is set immediately upon opening the class. So the first option is:
- Do not define the constant until leaving the class/module body.
This option would prevent concurrent threads from seeing an uninitialized class. In essence, this code:
class Foo
  def method1 ...
  def method2 ...
end
would operate identically to this code:
Foo = Class.new do
  def method1 ...
  def method2 ...
end
A defined? check on Foo would return nil until the class was completely initialized, and the lazy require + defined? pattern would work correctly.
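The Class.new form already behaves this way today, which makes it easy to try out. A small sketch (Deferred and the instance variable names are only for illustration):

```ruby
Deferred = Class.new do
  # Both values are recorded while the body is still running,
  # before the assignment to the constant has happened.
  @constant_seen_in_body = defined?(Deferred)
  @name_in_body = name

  def method1
    :one
  end
end

p Deferred.instance_variable_get(:@constant_seen_in_body) # nil
p Deferred.instance_variable_get(:@name_in_body)          # nil
p Deferred.name                                           # "Deferred"
```

The constant only appears once the body has finished, so no other thread can see a half-built class through it. The same sketch also shows the cost: the body itself cannot refer to the class by name.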
However, there are pieces of code out there that depend on being able to access the constant from within the class/module body. This brings me to a second option I experimented with on JRuby:
- Make the constant visible only to the defining thread until leaving the class/module body.
In JRuby, I accomplished this by using Clojure's STM to implement the constant table. The beginning of opening a new class would start an STM transaction. The new, empty class body would be written into the constant table as part of the transaction, only visible to the current thread. Once the class body closed, the transaction would be committed and all code would see a complete class at that constant.
This version solves most of the issues with the first option. The constant is available to the defining thread, so if code loading or class definition depends on the constant being available, it will still work. It prevents other threads from seeing a partially-initialized class. And it does not interrupt the normal flow of the program.
There are downsides, though, that may or may not happen in practice. If two threads try to start defining a class and it has never been defined before, only one will win. If code loading or class definition logic within the class/module spins up other threads, they won't be able to see the constant.
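The visibility rules can be modeled in plain Ruby. This is a toy sketch, not the JRuby experiment: it uses a Mutex and a hash rather than STM, the class ThreadScopedTable and its methods are hypothetical, and it ignores the case of two threads defining the same name at once.

```ruby
# Toy model: an entry being defined is visible only to the thread that
# opened it, and becomes visible to everyone once the body completes.
class ThreadScopedTable
  def initialize
    @lock = Mutex.new
    @committed = {}
    @pending = {} # name => [owner_thread, value]
  end

  def lookup(name)
    @lock.synchronize do
      owner, value = @pending[name]
      return value if owner == Thread.current
      @committed[name]
    end
  end

  def define(name)
    value = Class.new
    @lock.synchronize { @pending[name] = [Thread.current, value] }
    yield value # the "class body"
    @lock.synchronize do
      @pending.delete(name)
      @committed[name] = value
    end
    value
  end
end

table = ThreadScopedTable.new
seen_by_definer = nil
seen_by_child = :unset

foo = table.define(:Foo) do |klass|
  seen_by_definer = table.lookup(:Foo)
  # A thread started from inside the body cannot see the pending entry.
  seen_by_child = Thread.new { table.lookup(:Foo) }.value
end

p seen_by_definer.equal?(foo)   # true
p seen_by_child                 # nil
p table.lookup(:Foo).equal?(foo) # true
```

The child thread returning nil is exactly the second downside described above.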
So then, perhaps we simply need a better mechanism to know if a given class is "complete" or not?
- Add an attribute to Module indicating whether the class is "open" somewhere in the system.
This would change the require check above to look more like this:
require 'some library' if !defined?(SomeLibrary) || SomeLibrary.open?
The logic here says "if SomeLibrary is not defined or is still being defined, attempt to do the require, which should block until done loading".
But this is somewhat ugly, so I have a fourth suggestion:
- Add a new mechanism for concurrent requires that understands the services those requires provide.
This would be something like:
require_service(:some_service_name, "some library")
Behind the scenes, this would check whether some_service_name had been completely loaded as a service, and only mark that service as "loaded" once the require had completed. This fixes all issues with all scenarios above. However, this basically ends up being:
require "some library" unless $".include?("some library")
...but with a user-provided name rather than the filename going into the equivalent of $LOADED_FEATURES.
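A user-level approximation of this fourth idea might look like the following. It is only a sketch: ServiceLoader, its constants and loaded? are hypothetical names, and the single coarse lock is an assumption made to keep it short. A real implementation would need finer-grained locking and more care around nested or circular requires.

```ruby
require "monitor"

module ServiceLoader
  # Monitor is reentrant, so a file being loaded can itself call
  # require_service on the same thread without deadlocking.
  LOCK = Monitor.new
  LOADED = {}

  # Returns true if this call performed the load, false if the service
  # had already been completely loaded.
  def self.require_service(name, feature)
    LOCK.synchronize do
      return false if LOADED[name]
      require feature
      # Marked only after require has returned, so other threads never
      # treat a half-loaded service as available.
      LOADED[name] = true
    end
  end

  def self.loaded?(name)
    LOCK.synchronize { LOADED.key?(name) }
  end
end

first = ServiceLoader.require_service(:set_support, "set")
second = ServiceLoader.require_service(:set_support, "set")
p first  # true
p second # false
```

Threads that arrive while the load is in progress block on the lock and then see the service as loaded, which is the behavior the defined? check cannot give.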
So...these are some brainstorming ideas. What do you think, Ruby world?