Feature #22119
openThread: inherit storage on child threads
Description
Motivation¶
At work, we've been dealing with the need of propagating context across threads. The use cases are several, ranging from propagating logging / observability context, region / database shards, etc, on worker threads for specific workloads which require that implicit knowledge. We've historically used thread variables for it, via the Thread#thread_variable_get/set family of functions, and have been enforcing / gating usage of threads via a wrapper function which does the context propagation heavy-lifting. Something like:
def new_thread do
logging_context = Logging.current_context.deep_dup
database_context = DB.current_context.deep_dup
# etc, etc
Thread.new do
Logging.set_context(logging_context)
DB.set_context/database_context)
yield
end
end
alongside a plethora of checks and rubocops for forbid direct usage of Thread.new. However, we recently found out that that's not enough, since 3rd party libraries that we use will make their own use of Thread.new without our custom context propagation, which broke our logic in cases where we'd require p.ex. correcy database context.
In order to fix that, we've resorted to monkeypatching Thread.new (and Fiber.new), which feels necessary for this use case, but not right. This seems like a feature missing from ruby.
Research¶
ruby does have prior art for "inheritable" storage: Fiber#storage; when a new fiber is created, by default, it'll have the same storage from the fiber it was created from (unless Fiber.new(storage: nil)). This is exactly what's missing here.
Another example, with a little more ceremony, of the same problem solved in ruby is concurrent-ruby's ThreadLocalVar, which seems to be inspired in Java's own InheritableThreadLocal.
Proposal¶
ruby should have an OTTB solution for inheritable thread storage. Some options to consider:
1: :storage kwarg¶
An option would be to align with Fiber.new and support a storage kwarg:
Thread.current.thread_variable_set(:foo, "bar")
Thread.new(storage: nil) do # this has to be default
Thread.current.thread_variable_get(:foo) #=> nil
end
Thread.new(storage: Thread.current.storage) do # would be a new method
Thread.current.thread_variable_get(:foo) #=> nil
end
as mentioned above, for backwards compatibility reasons, by default storage wouldn't be inherited, which wouldn't fit our use-case that well, unless the default could be changed viat env var, i.e. RUBY_THREAD_STORAGE=inherit or something of the kind (which ruby historically hasn't supported).
2. Thread::Variable¶
A new primitive, Thread::Variable, could be integrated, that would have the same semantics as concurrent-ruby's ThreadLocalVar:
VAL = Thread::Variable.new(2)
Thread.new do
puts VAL.value #=> 2
VAL.value = 3
puts VAL.value #=> 3
end.join
puts VAL.value #=> 2
I feel like 1. is more aligned with how ruby has dealt with the problem space, but the backwards compatibility issues may be a problem, so 2 is probably more feasible, despite its Java'iness.
Updated by nobu (Nobuyoshi Nakada) 2 days ago
chucke (Tiago Cardoso) wrote:
1: :storage kwarg¶
An option would be to align with
Fiber.newand support astoragekwarg:as mentioned above, for backwards compatibility reasons, by default storage wouldn't be inherited, which wouldn't fit our use-case that well, unless the default could be changed viat env var, i.e.
RUBY_THREAD_STORAGE=inheritor something of the kind (which ruby historically hasn't supported).
I don't think this feature itself bad, but I'm afraid that changing the behavior by an environment variable could be too intrusive and might break other libraries.
chucke (Tiago Cardoso) wrote:
2. Thread::Variable¶
A new primitive,
Thread::Variable, could be integrated, that would have the same semantics as concurrent-ruby'sThreadLocalVar:
I seem to recall there being a similar proposal before, but I don't remember exactly.
Updated by nobu (Nobuyoshi Nakada) 2 days ago
Fiber-local storages are inherited across Threads.
Wouldn't this be sufficient for your use case?
Fiber[:a] = 10
Thread.new { Fiber[:a] }.value #=> 10
Updated by jhawthorn (John Hawthorn) 2 days ago
It was also initially a surprise to me that Fiber storage is inherited across threads as well.
In my experience, inheritable Fiber storage has mostly been a mistake and cause of bugs, I don't think we should spread that to threads.
New threads/fibers spawned implicitly inherit Fiber storage in the state it was in at that time. If these Threads/Fibers are long lived or spawn additional Threads/Fibers, they will retain whatever values were set in that storage at the time for their lifetime (which for ex. a thread pool is the life of the process)
Worse, anything in Fiber[:whatever] is potentially shared mutable state, however it is rarely treated as such. Anything mutable in Fiber storage should require a Mutex guard, and whatever else one would otherwise have used for mutable, shared, global state.
I would have preferred we'd never introduced inheritable Fiber storage into the language and instead developed a framework-level convention (say, some sort of helper method to dup and copy :context or :request_context) instead of pushing this into the language (which would also be problematic, but at least contained and not incorrect by default).
Updated by chucke (Tiago Cardoso) 2 days ago
Fiber-local storages are inherited across Threads.
Wow, that was really unexpected, I didn't know about that! I guess that would work for my use case. Is this specified or accidental behaviour?
Worse, anything in Fiber[:whatever] is potentially shared mutable state, however it is rarely treated as such.
Doesn't the same apply to Thread.current and thread variables? Nothing stops anyone from setting a mutable hash and changing it across threads.
I would have preferred we'd never introduced inheritable Fiber storage into the language and instead developed a framework-level convention (say, some sort of helper method to dup and copy :context or :request_context)
I understand where the sentiment is coming from, as I'm dealing with a similar situation in this PR. My two cents is that ruby needs a #deep_dup (and a #deep_freeze, but that's a different discussion / feature request :) ) as a core feature. FWIW we're doing something of the kind internally, which we may open-source at some point.