Project

General

Profile

Actions

Feature #22119

open

Thread: inherit storage on child threads

Feature #22119: Thread: inherit storage on child threads

Added by chucke (Tiago Cardoso) 3 days ago. Updated 2 days ago.

Status:
Open
Assignee:
-
Target version:
-
[ruby-core:125782]

Description

Motivation

At work, we've been dealing with the need of propagating context across threads. The use cases are several, ranging from propagating logging / observability context, region / database shards, etc, on worker threads for specific workloads which require that implicit knowledge. We've historically used thread variables for it, via the Thread#thread_variable_get/set family of functions, and have been enforcing / gating usage of threads via a wrapper function which does the context propagation heavy-lifting. Something like:

def new_thread do
  logging_context = Logging.current_context.deep_dup
  database_context = DB.current_context.deep_dup
  # etc, etc
  Thread.new do
    Logging.set_context(logging_context)
    DB.set_context/database_context)
    yield
  end
end

alongside a plethora of checks and rubocops for forbid direct usage of Thread.new. However, we recently found out that that's not enough, since 3rd party libraries that we use will make their own use of Thread.new without our custom context propagation, which broke our logic in cases where we'd require p.ex. correcy database context.

In order to fix that, we've resorted to monkeypatching Thread.new (and Fiber.new), which feels necessary for this use case, but not right. This seems like a feature missing from ruby.

Research

ruby does have prior art for "inheritable" storage: Fiber#storage; when a new fiber is created, by default, it'll have the same storage from the fiber it was created from (unless Fiber.new(storage: nil)). This is exactly what's missing here.

Another example, with a little more ceremony, of the same problem solved in ruby is concurrent-ruby's ThreadLocalVar, which seems to be inspired in Java's own InheritableThreadLocal.

Proposal

ruby should have an OTTB solution for inheritable thread storage. Some options to consider:

1: :storage kwarg

An option would be to align with Fiber.new and support a storage kwarg:

Thread.current.thread_variable_set(:foo, "bar")
Thread.new(storage: nil) do # this has to be default
  Thread.current.thread_variable_get(:foo) #=> nil
end

Thread.new(storage: Thread.current.storage) do # would be a new method
  Thread.current.thread_variable_get(:foo) #=> nil
end

as mentioned above, for backwards compatibility reasons, by default storage wouldn't be inherited, which wouldn't fit our use-case that well, unless the default could be changed viat env var, i.e. RUBY_THREAD_STORAGE=inherit or something of the kind (which ruby historically hasn't supported).

2. Thread::Variable

A new primitive, Thread::Variable, could be integrated, that would have the same semantics as concurrent-ruby's ThreadLocalVar:

VAL = Thread::Variable.new(2)

Thread.new do
  puts VAL.value #=> 2
  VAL.value = 3
  puts VAL.value #=> 3
end.join

puts VAL.value #=> 2

I feel like 1. is more aligned with how ruby has dealt with the problem space, but the backwards compatibility issues may be a problem, so 2 is probably more feasible, despite its Java'iness.

Updated by nobu (Nobuyoshi Nakada) 2 days ago Actions #1 [ruby-core:125784]

chucke (Tiago Cardoso) wrote:

1: :storage kwarg

An option would be to align with Fiber.new and support a storage kwarg:

as mentioned above, for backwards compatibility reasons, by default storage wouldn't be inherited, which wouldn't fit our use-case that well, unless the default could be changed viat env var, i.e. RUBY_THREAD_STORAGE=inherit or something of the kind (which ruby historically hasn't supported).

I don't think this feature itself bad, but I'm afraid that changing the behavior by an environment variable could be too intrusive and might break other libraries.

chucke (Tiago Cardoso) wrote:

2. Thread::Variable

A new primitive, Thread::Variable, could be integrated, that would have the same semantics as concurrent-ruby's ThreadLocalVar:

I seem to recall there being a similar proposal before, but I don't remember exactly.

Updated by nobu (Nobuyoshi Nakada) 2 days ago 1Actions #2 [ruby-core:125785]

Fiber-local storages are inherited across Threads.
Wouldn't this be sufficient for your use case?

Fiber[:a] = 10
Thread.new { Fiber[:a] }.value #=> 10

Updated by jhawthorn (John Hawthorn) 2 days ago 1Actions #3 [ruby-core:125786]

It was also initially a surprise to me that Fiber storage is inherited across threads as well.

In my experience, inheritable Fiber storage has mostly been a mistake and cause of bugs, I don't think we should spread that to threads.

New threads/fibers spawned implicitly inherit Fiber storage in the state it was in at that time. If these Threads/Fibers are long lived or spawn additional Threads/Fibers, they will retain whatever values were set in that storage at the time for their lifetime (which for ex. a thread pool is the life of the process)

Worse, anything in Fiber[:whatever] is potentially shared mutable state, however it is rarely treated as such. Anything mutable in Fiber storage should require a Mutex guard, and whatever else one would otherwise have used for mutable, shared, global state.

I would have preferred we'd never introduced inheritable Fiber storage into the language and instead developed a framework-level convention (say, some sort of helper method to dup and copy :context or :request_context) instead of pushing this into the language (which would also be problematic, but at least contained and not incorrect by default).

Updated by chucke (Tiago Cardoso) 2 days ago Actions #4 [ruby-core:125787]

Fiber-local storages are inherited across Threads.

Wow, that was really unexpected, I didn't know about that! I guess that would work for my use case. Is this specified or accidental behaviour?

Worse, anything in Fiber[:whatever] is potentially shared mutable state, however it is rarely treated as such.

Doesn't the same apply to Thread.current and thread variables? Nothing stops anyone from setting a mutable hash and changing it across threads.

I would have preferred we'd never introduced inheritable Fiber storage into the language and instead developed a framework-level convention (say, some sort of helper method to dup and copy :context or :request_context)

I understand where the sentiment is coming from, as I'm dealing with a similar situation in this PR. My two cents is that ruby needs a #deep_dup (and a #deep_freeze, but that's a different discussion / feature request :) ) as a core feature. FWIW we're doing something of the kind internally, which we may open-source at some point.

Actions

Also available in: PDF Atom