Project

General

Profile

Feature #13893

Add Fiber#[] and Fiber#[]= and restore Thread#[] and Thread#[]= to their original behavior

Added by cremes (Chuck Remes) 10 months ago. Updated 7 months ago.

Status:
Open
Priority:
Normal
Assignee:
-
Target version:
-
[ruby-core:82762]

Description

Ruby 3 API cleanup suggestion.

The Thread and Fiber classes have a very odd API for setting/getting thread local and fiber local variables. With Ruby 3 coming soon, this is a perfect opportunity to make this API more coherent and return to the Principal of Least Surprise. The concept of Fibers and Threads should be completely separated and we should no longer assume that a Fiber is attached to any particular Thread.

I suggest this:

class Fiber
  # Gets a fiber-local variable.
  def [](index)
    ...
  end

  # Sets a fiber-local variable.
  def []=(index, value)
    ...
  end

  # Returns true if the given +key+ exists as a fiber-local variable.
  def key?(key)
    ...
  end

  # Returns an array of fiber-local variable names as symbols.
  def keys
    ...
  end
end

class Thread
  # Gets a thread-local variable.
  def [](index)
    ...
  end

  # Sets a thread-local variable.
  def []=(index, value)
    ...
  end

  # Returns true if the given +key+ exists as a thread-local variable.
  def key?(key)
    ...
  end

  # Returns an array of thread-local variable names as symbols.
  def keys
    ...
  end
end

Also, remove Thread#thread_variable?, Thread#thread_variable_get, Thread#variable_set, and Thread#thread_variables since that behavior is already covered by Thread#key?, Thread#keys, Thread#[], and Thread#[]=. The APIs for both Thread and Fiber are more coherent and less surprising with these changes.

History

#1 [ruby-core:82764] Updated by Eregon (Benoit Daloze) 10 months ago

I agree this would be much nicer and consistent.

I can't evaluate the impact on compatibility, but it's likely quite big.

#2 [ruby-core:82766] Updated by Eregon (Benoit Daloze) 10 months ago

Another idea which is related to #13245 would be to have these as class methods on Fiber and Thread (Thread[:foo]; Thread[:foo] = :bar; Thread.keys; Thread.key? :foo),
which would limit only accessing variables of the current fiber/thread.
It would allow to keep old methods for compatibility and maybe progressively deprecate them.

#3 [ruby-core:82789] Updated by shevegen (Robert A. Heiler) 9 months ago

I have nothing against the suggestion itself. However, there is no general, universal "Principal of Least Surprise".

To the topic, Threads and Fibers - I have used Threads sometimes but rarely. I haven't yet used Fibers. Without knowing any particular use case, I have a hard time trying to want to use something new.

For me, well aside from the suggestion here, I find this all way too complicated. Threads were actually quite simple... no idea how Fibers fit into anything there... or Mutex. I hope someone will simplify this all, no matter how. :)

As for ruby 3.0, I do not think it is "coming soon". Ruby 2.5 will come this year, I am not sure that ruby 3.0 will already be the very next ... matz said in one presentation that there is "tons of work" still about to happen for ruby 3, so I don't think that is all going to happen over night (if all the mentioned features that matz spoke about will go into Ruby 3.x, and I guess the core team wants the first ruby 3 release to be stable rather than unstable, which also will take a bit).

#4 [ruby-core:82794] Updated by cremes (Chuck Remes) 9 months ago

Ideally some of these changes could be rolled out in 2.x versions with deprecation notices for those methods whose behavior will change or those methods will be eliminated. For the 3.0 release, the deprecated methods should be removed entirely. Yes, these are breaking changes. Ruby3 is a wonderful opportunity to make the core APIs more consistent. If we miss this chance, it will be years (a decade?) before we get another chance to fix these inconsistencies.

#5 [ruby-core:82995] Updated by shyouhei (Shyouhei Urabe) 9 months ago

This sounds in fact cleaner. However I'm afraid it is too difficult to design a transition path. Fiber#[] could be okay but changing Thread#[] seems almost impossible... Can we warn users about such change in some way?

#6 [ruby-core:84011] Updated by cremes (Chuck Remes) 7 months ago

I have an alternative suggestion. Since shyouhei (Shyouhei Urabe) says that my original suggestion is too difficult a transition path, here is an alternative.

      # Convenience class to be used by Thread and Fiber. Those classes
      # have an odd way of dealing with thread-local and fiber-local variables.
      # This is a storage mechanism to replace the standard mechanism.
      #
      class Local
        extend Forwardable

        def_delegators :@storage, :[], :[]=, :key?, :keys

        def initialize
          @storage = {}
        end
      end

      # Done as a module so we can easily add it to a running Thread or Fiber via #extend.
      # e.g. add this to root Thread via Thread.main.extend(LocalMixin)
      #
      module LocalMixin
        def local
          @local ||= Local.new
        end
      end

This adds Thread#local and Fiber#local instance methods. We can then access them very easily:

Thread.current.local[:key] = 'thread local value'
Fiber.current.local[:key] = 'fiber local value'

This has several advantages:

  • Cleans up the namespace by explicitly putting all "local" variables under the #local instance method accessor
  • Does NOT break any existing code; the current thread and fiber local methods will continue to work
  • We could potentially modify the Local class to use the Thread#thread_variable_set/get methods and the Thread#[]/[]= methods internally. This way users could access their local storage via either mechanism during the transition.

I can create a pull request with specs if anyone is interested in sponsoring this change.

#7 [ruby-core:84012] Updated by Eregon (Benoit Daloze) 7 months ago

cremes (Chuck Remes) wrote:

I have an alternative suggestion. Since shyouhei (Shyouhei Urabe) says that my original suggestion is too difficult a transition path, here is an alternative.
...
This adds Thread#local and Fiber#local instance methods. We can then access them very easily:

Thread.current.local[:key] = 'thread local value'
Fiber.current.local[:key] = 'fiber local value'

This looks good to me.

I think we can make it a bit nicer with:

Thread.local[:key] = 'thread local value'
Fiber.local[:key] = 'fiber local value'

It's a bit shorter and nicer to read for me.

But also with that API it discourages to get/set the thread/fiber-locals of another thread/fiber,
which is against the purpose of thread/fiber-local variables.
That API does not prevent it though, since one could save the value of Thread.local and use it in another thread
(we could enforce it with exceptions or have Thread/Fiber.local return an object always resolving the current Thread).
Sorry for bringing #13245 again but I think it is a very important concern for a new API: communicate the right usage.

One detail, where would we define this Local class?
::Local sounds too risky. Thread::Local or Fiber::Local sounds misleading.
Or maybe Thread.local and Fiber.local should just return a Hash instance for the current Thread/Fiber?

#8 [ruby-core:84036] Updated by cremes (Chuck Remes) 7 months ago

One detail, where would we define this Local class?
::Local sounds too risky. Thread::Local or Fiber::Local sounds misleading.

The Local class is meant to be private. It's behavior should be the same/similar for both Fiber and Thread. Putting it under the Thread namespace makes sense to me since Fibers are also subordinate to Threads (i.e. they are locked to their creating thread, can't migrate between threads, cannot exist independent of a Thread, etc.).

Syntax:

  Fiber.local[:key] = 'fiber local value'
  Thread.local[:key] = 'thread local value'
vs
  Fiber.current.local[:key] = 'fiber local value'
  Thread.current.local[:key] = 'thread local value'

I understand why you prefer the first syntax since it is shorter. However, I think it obscures the ownership of the thread-local or fiber-local data when making it available via a class method. This local storage does NOT belong to the class; it belongs to an instance of the class. I think in this situation that requiring a few extra keystrokes is necessary to highlight the correct ownership of the data by making it required to access the data via an instance of Thread or Fiber.

Regarding discouraging modification of locals from a different thread/fiber, the Local class should be modified to capture the Thread.current and Fiber.current at the time of creation. This can be checked when accessing the local storage and throw an Exception when a different thread or fiber tries to modify local storage that it does not own. Good catch!

Here's some updated code:

  class Local
    def initialize
      @storage = {}
      @thread_creator = Thread.current
      @fiber_creator = Fiber.current
    end

    def [](key)
      ownership_check
      @storage[key]
    end

    def []=(key, value)
      ownership_check
      @storage[key] = value
    end

    def key?(key)
      ownership_check
      @storage.key?(key)
    end

    def keys
      ownership_check
      @storage.keys
    end

    private

    def ownership_check
      return if @thread_creator == Thread.current && @fiber_creator == Fiber.current
      raise ThreadError, "Access to local storage disallowed from non-originating thread or fiber!"
    end
  end

Also available in: Atom PDF