Project

General

Profile

Actions

Bug #17507

closed

Regexp capture groups ignored sometimes in some multithreaded environments (possible race condition)

Added by byteit101 (Patrick Plenefisch) 10 months ago. Updated 20 days ago.

Status:
Closed
Priority:
Normal
Assignee:
-
Target version:
-
ruby -v:
ruby 3.0.0p0 (2020-12-25 revision 95aff21468) [x86_64-linux]
[ruby-core:101901]

Description

Same behavior from 2.6-3.0
OS: Debian 10

Ruby Script:

TargetStr = "a-x-foo-bar-baz-z-b"

worker = lambda do
    # For more hits, use File.read here instead of TargetStr
    m = TargetStr.match(/foo-([A-Za-z0-9_\.]+)-baz/) # more cases in the []+ means more hits
    raise "Error, #{m.inspect}" if m[1].nil?
    File.exist? "/tmp"
    TargetStr.gsub(/foo-bar-baz/, "foo-abc-baz") # must match the same as the first match
end

def threaded_test(worker)
    6.times.map {Thread.new {10_000.times {worker.call}}}.each(&:join)
end
threaded_test(worker) # must be a function calling a block or proc or lambda. Change any of that and it doesn't hit this

puts "No Error Hits"

Result of script (number of exceptions varies per-run):

#<Thread:0x00005616663dba08 bug.rb:12 run> terminated with exception (report_on_exception is true):
bug.rb:6:in `block in <main>': Error, #<MatchData "foo-bar-baz"> (RuntimeError)
    from bug.rb:12:in `block (3 levels) in threaded_test'
    from bug.rb:12:in `times'
    from bug.rb:12:in `block (2 levels) in threaded_test'
#<Thread:0x00005616663db5f8 bug.rb:12 run> terminated with exception (report_on_exception is true):
bug.rb:6:in `block in <main>': Error, #<MatchData "foo-bar-baz"> (RuntimeError)
    from bug.rb:12:in `block (3 levels) in threaded_test'
    from bug.rb:12:in `times'
    from bug.rb:12:in `block (2 levels) in threaded_test'
#<Thread:0x00005616663dbb98 bug.rb:12 run> terminated with exception (report_on_exception is true):
bug.rb:6:in `block in <main>': Error, #<MatchData "foo-bar-baz"> (RuntimeError)
    from bug.rb:12:in `block (3 levels) in threaded_test'
    from bug.rb:12:in `times'
    from bug.rb:12:in `block (2 levels) in threaded_test'
bug.rb:6:in `block in <main>': Error, #<MatchData "foo-bar-baz"> (RuntimeError)
    from bug.rb:12:in `block (3 levels) in threaded_test'
    from bug.rb:12:in `times'
    from bug.rb:12:in `block (2 levels) in threaded_test'

Expected Result:

No Error Hits

Details:

As is, it usually errors (most of the time it fails about 300-500 iterations in)
A success is what is expected, m == #<MatchData "foo-bar-baz" 1:"bar">
An error is when m == #<MatchData "foo-bar-baz"> (no capture saved)

  • If threaded_test is only one thread (1.times) it always works. For more than one thread, it usually errors.
  • If File.exist? is removed, it always works.
  • If the gsub is removed, it always works.
  • If the gsub is changed to not match the match() call, it always works.
  • If the match expression is simpler (/foo-(bar)-baz/, or even /foo-([a-z]+)-baz/), it usually works.
  • If the threaded_test is not a method, but executed inline (ie, comment out the def, end, call, on lines 11 & 13-14) it always works.
  • If worker is a block, or a lambda (as provided), it usually errors.
  • If worker is put inline into the 10_000.times call, it always works.
  • If TargetStr matches on index 0, it always works.

Related issues

Related to Ruby master - Bug #17349: Rake での並行実行における正規表現マッチングの異常OpenActions
Actions #1

Updated by byteit101 (Patrick Plenefisch) 10 months ago

  • Description updated (diff)
Actions #2

Updated by jeremyevans0 (Jeremy Evans) 9 months ago

  • Related to Bug #17349: Rake での並行実行における正規表現マッチングの異常 added

Updated by jeremyevans0 (Jeremy Evans) 3 months ago

This issue is due to a race condition because the MatchData object is passed indirectly through the backref. I've submitted a pull request to fix it: https://github.com/ruby/ruby/pull/4734

I think it's likely that similar Regexp methods have a similar issue, though I haven't done any testing in that area.

Actions #4

Updated by jeremyevans (Jeremy Evans) 27 days ago

  • Status changed from Open to Closed

Applied in changeset git|abc0304cb28cb9dcc3476993bc487884c139fd11.


Avoid race condition in Regexp#match

In certain conditions, Regexp#match could return a MatchData with
missing captures. This seems to require at the least, multiple
threads calling a method that calls the same block/proc/lambda
which calls Regexp#match.

The race condition happens because the MatchData is passed from
indirectly via the backref, and other threads can modify the
backref.

Fix the issue by:

  1. Not reusing the existing MatchData from the backref, and always allocating a new MatchData.
  2. Passing the MatchData directly to the caller using a VALUE*, instead of indirectly through the backref.

It's likely that variants of this issue exist for other Regexp
methods. Anywhere that MatchData is passed implicitly through
the backref is probably vulnerable to this issue.

Fixes [Bug #17507]

Actions #5

Updated by nagachika (Tomoyuki Chikanaga) 26 days ago

  • Backport changed from 2.5: UNKNOWN, 2.6: UNKNOWN, 2.7: UNKNOWN, 3.0: UNKNOWN to 2.6: UNKNOWN, 2.7: REQUIRED, 3.0: REQUIRED

Updated by nagachika (Tomoyuki Chikanaga) 20 days ago

  • Backport changed from 2.6: UNKNOWN, 2.7: REQUIRED, 3.0: REQUIRED to 2.6: UNKNOWN, 2.7: REQUIRED, 3.0: DONE

ruby_3_0 cfad0583eb18bb4505f28ee5cfab703a6b9987be merged revision(s) abc0304cb28cb9dcc3476993bc487884c139fd11.

Actions

Also available in: Atom PDF