Bug #12689

Thread isolation of $~ and $_

Added by headius (Charles Nutter) about 5 years ago. Updated 6 months ago.

Status:
Open
Priority:
Normal
Assignee:
-
Target version:
-
[ruby-core:76976]
Description

We are debating what is correct behavior now, and what should be correct behavior in the future, for the thread-visibility of the special variables $~ and $_.

We have several examples from https://github.com/jruby/jruby/issues/3031 that seem to exhibit conflicting behavior...or at least the behavior is unexpected in many cases.

$ ruby23 -e 'p = proc { p $~; "foo" =~ /foo/ }; Thread.new {p.call}.join; Thread.new{p.call}.join'
nil
nil

$ ruby23 -e 'def foo; proc { p $~; "foo" =~ /foo/ }; end; p = foo; Thread.new {p.call}.join; Thread.new{p.call}.join'
nil
#<MatchData "foo">

$ ruby23 -e 'p = proc { p $~; "foo" =~ /foo/ }; def foo(p); Thread.new {p.call}.join; Thread.new{p.call}.join; end; foo(p)'
nil
#<MatchData "foo">

$ ruby23 -e 'class Foo; P = proc { p $~; "foo" =~ /foo/ }; def foo; Thread.new {P.call}.join; Thread.new{P.call}.join; end; end; Foo.new.foo'
nil
#<MatchData "foo">

$ ruby23 -e 'def foo; p = proc { p $~; "foo" =~ /foo/ }; Thread.new {p.call}.join; Thread.new{p.call}.join; end; foo'
nil
nil

$ ruby23 -e 'def foo; p = proc { p $~; "foo" =~ /foo/ }; bar(p); end; def bar(p); Thread.new {p.call}.join; Thread.new{p.call}.join; end; foo'
nil
#<MatchData "foo">

These cases exhibit some oddities in whether $~ (and presumably $_) are shared across threads.

The immediate thought is that they should be both frame and thread-local...but ko1 points out that such a change would break cases like this:

def foo
  /foo/ =~ 'foo'
  Proc.new{
    p $~
  }
end

Thread.new{
  foo.call
}.join

So there's a clear conflict here. Users sometimes expect the $~ value to be shared across threads (at least for read, as in ko1's example) and sometimes do not want it shared at all (as in the case of https://github.com/jruby/jruby/issues/3031).

Now we discuss.


Related issues

Related to Ruby master - Bug #8444: Regexp vars $~ and friends are not thread local (Open, ko1 (Koichi Sasada))

Updated by headius (Charles Nutter) about 5 years ago

To clarify the one-liners' behavior: when the thread's top-level frame is the same as a proc's frame that it calls, it will see thread-local values. When the proc's frame is not the top-level frame for the thread, the memory location for $~ will be shared across all threads.

Updated by Eregon (Benoit Daloze) about 5 years ago

Maybe $~ is always set in the surrounding method frame, but never in a block frame?
There are still a lot of weird cases to explain though.

Updated by darix (Marcus Rückert) about 5 years ago

I wonder if moving away from those special $ variables to explicit match objects wouldn't be a possible solution to this.

Updated by headius (Charles Nutter) about 5 years ago

Marcus Rückert wrote:

I wonder if moving away from those special $ variables to explicit match objects wouldn't be a possible solution to this.

If you always use the returned MatchData then you can avoid these problems. This only affects consumers of the implicit $~ variable.
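A minimal sketch of that approach, assuming a hypothetical helper named first_word: each thread keeps the MatchData returned by Regexp#match in a local variable and never reads $~. Regexp#match still updates $~ as a side effect; the code simply does not depend on it.

```ruby
# Hypothetical helper (not from the issue): returns the first word of +str+,
# or nil when nothing matches. The MatchData lives in a method-local variable,
# so no other thread can overwrite it.
def first_word(str)
  md = /\w+/.match(str) # explicit MatchData; $~ is never read
  md && md[0]
end

inputs = %w[foobar foo baz]

# One thread per input; Thread#value joins and returns the block's result.
results = inputs.map { |s| Thread.new { first_word(s) } }.map(&:value)

p results
```

Because every result comes from a local variable rather than from the implicit backref slot, the outcome does not depend on which frame the threads happen to share.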

Unfortunately, that also includes some core methods that access the $_ variable, so there's a possibility of threads stepping on each other even if you never use the implicit variables in your code.

Updated by darix (Marcus Rückert) about 5 years ago

That's why I would deprecate the $ variables and make people use match objects all the time.

I mean the stdlib even has code that reads

matchdata = $~

That feels just wrong.

Maybe 2.4 could start issuing warnings about using $ variables and 3.0 could remove them?

Updated by naruse (Yui NARUSE) about 5 years ago

The example below shows the 2nd thread overwriting the 1st thread's regexp match result.

% ruby -e 'P = proc {|s| p [s, $~]; sleep 1; /foo.*/=~s; sleep 1; p [s,$~] }; def foo; Thread.new{P.call("foobar")}; sleep 0.2; Thread.new{P.call("foo")}; end; foo;sleep 5'
["foobar", nil]
["foo", nil]
["foobar", #<MatchData "foo">]
["foo", #<MatchData "foo">]

This example, unlike the one above, does not exhibit that phenomenon.

% ruby -e 'P = proc {|s| p [s, $~]; sleep 1; /foo.*/=~s; sleep 1; p [s,$~] }; Thread.new{P.call("foobar")}; sleep 0.2; Thread.new{P.call("foo")}; sleep 5'
["foobar", nil]
["foo", nil]
["foobar", #<MatchData "foobar">]
["foo", #<MatchData "foo">]

Updated by headius (Charles Nutter) almost 4 years ago

We've had another report in JRuby about this behavior. In this case, two threads doing String#split step on each other's backrefs because they share a backref frame: https://github.com/jruby/jruby/issues/4868

This case can't even be avoided. Even if you don't use $~ there are threading issues. These may come into play for MRI in either split or other methods that consume backref and lastline, but they'll certainly be a problem for all parallel-threaded implementations that wish to be compatible.

Updated by Eregon (Benoit Daloze) almost 4 years ago

FWIW, TruffleRuby always stores the MatchData $~ in thread-local storage per frame.
It seems to work fine so far and seems to cause no real-world incompatibilities.

Updated by ko1 (Koichi Sasada) almost 4 years ago

Eregon (Benoit Daloze) wrote:

FWIW, TruffleRuby always stores the MatchData $~ in thread-local storage per frame.
It seems to work fine so far and seems to cause no real-world incompatibilities.

Each frame has a map (thread -> MatchData)?

Updated by Eregon (Benoit Daloze) almost 4 years ago

ko1 (Koichi Sasada) wrote:

Each frame has a map (thread -> MatchData)?

Conceptually yes, but it is allocated lazily and it specializes for being accessed by a single thread.
A Java ThreadLocal is used in the general case when more than one thread stores a MatchData in a frame.
https://github.com/graalvm/truffleruby/blob/vm-enterprise-0.29/src/main/java/org/truffleruby/language/threadlocal/ThreadAndFrameLocalStorage.java
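The idea described above can be sketched in plain Ruby. This is only a conceptual illustration, not TruffleRuby's code: the class name ThreadAndFrameLocalSlot, its methods, and the use of a Mutex and a Hash are all assumptions made for the example. One slot stands in for a frame's $~ storage; it keeps a fast path for the first thread that writes to it and lazily allocates a per-thread map only when a second thread stores a value.

```ruby
# Hypothetical sketch of a frame slot whose contents are also per-thread.
class ThreadAndFrameLocalSlot
  def initialize
    @owner  = nil        # first thread that stored a value (fast path)
    @value  = nil        # value belonging to @owner
    @others = nil        # lazily created Hash: Thread => value
    @lock   = Mutex.new  # simplification; keeps the sketch race-free
  end

  # Returns the value stored by the calling thread, or nil.
  def get
    t = Thread.current
    @lock.synchronize do
      return @value if t.equal?(@owner)
      @others ? @others[t] : nil
    end
  end

  # Stores a value visible only to the calling thread.
  def set(value)
    t = Thread.current
    @lock.synchronize do
      if @owner.nil? || t.equal?(@owner)
        @owner = t
        @value = value
      else
        (@others ||= {})[t] = value # general case: more than one thread
      end
    end
    value
  end
end

slot = ThreadAndFrameLocalSlot.new
slot.set(:main)

# A second thread sharing the same "frame" slot sees only its own value.
seen = Thread.new do
  before = slot.get
  slot.set(:worker)
  [before, slot.get]
end.value

p seen
p slot.get
```

In this sketch the Hash holds strong references to Thread objects, so a long-lived slot would keep finished threads alive; a real implementation would need to avoid that.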

Updated by jeremyevans0 (Jeremy Evans) about 2 years ago

  • Related to Bug #8444: Regexp vars $~ and friends are not thread local added

Updated by headius (Charles Nutter) 6 months ago

Waking this up a bit...

The original issue that prompted this bug report has now been FIXED in JRuby 9.2.17.0 by making String#split never read backref from the frame-local storage:

https://github.com/jruby/jruby/pull/6644

Further improvements will come in 9.3 with the following PR, which eliminates ALL core method reads of backref (none of them used its contents anyway, and only read it to reuse it):

https://github.com/jruby/jruby/pull/6647

With these changes, all concurrency issues surrounding $~ within core methods are resolved. Users that opt into using $~ via the variable or methods like last_match will still have to take care that the value is not being updated across threads, but such updates will not interfere with any $~-related methods in JRuby 9.3.

Updated by headius (Charles Nutter) 6 months ago

Also note this experimental PR that eliminates the update of $~ from String#split, since no specs and no tests check that behavior and it seems unexpected and unpredictable (it updates to the last match during the split loop).

https://github.com/jruby/jruby/pull/6646

And a bug I just filed to eliminate backref updating from start_with?, which should be a fast boolean check and not create a MatchData or update backref:

https://bugs.ruby-lang.org/issues/17771
