Bug #17507
closedRegexp capture groups ignored sometimes in some multithreaded environments (possible race condition)
Description
Same behavior from 2.6-3.0
OS: Debian 10
Ruby Script:
TargetStr = "a-x-foo-bar-baz-z-b"
worker = lambda do
# For more hits, use File.read here instead of TargetStr
m = TargetStr.match(/foo-([A-Za-z0-9_\.]+)-baz/) # more cases in the []+ means more hits
raise "Error, #{m.inspect}" if m[1].nil?
File.exist? "/tmp"
TargetStr.gsub(/foo-bar-baz/, "foo-abc-baz") # must match the same as the first match
end
def threaded_test(worker)
6.times.map {Thread.new {10_000.times {worker.call}}}.each(&:join)
end
threaded_test(worker) # must be a function calling a block or proc or lambda. Change any of that and it doesn't hit this
puts "No Error Hits"
Result of script (number of exceptions varies per-run):
#<Thread:0x00005616663dba08 bug.rb:12 run> terminated with exception (report_on_exception is true):
bug.rb:6:in `block in <main>': Error, #<MatchData "foo-bar-baz"> (RuntimeError)
from bug.rb:12:in `block (3 levels) in threaded_test'
from bug.rb:12:in `times'
from bug.rb:12:in `block (2 levels) in threaded_test'
#<Thread:0x00005616663db5f8 bug.rb:12 run> terminated with exception (report_on_exception is true):
bug.rb:6:in `block in <main>': Error, #<MatchData "foo-bar-baz"> (RuntimeError)
from bug.rb:12:in `block (3 levels) in threaded_test'
from bug.rb:12:in `times'
from bug.rb:12:in `block (2 levels) in threaded_test'
#<Thread:0x00005616663dbb98 bug.rb:12 run> terminated with exception (report_on_exception is true):
bug.rb:6:in `block in <main>': Error, #<MatchData "foo-bar-baz"> (RuntimeError)
from bug.rb:12:in `block (3 levels) in threaded_test'
from bug.rb:12:in `times'
from bug.rb:12:in `block (2 levels) in threaded_test'
bug.rb:6:in `block in <main>': Error, #<MatchData "foo-bar-baz"> (RuntimeError)
from bug.rb:12:in `block (3 levels) in threaded_test'
from bug.rb:12:in `times'
from bug.rb:12:in `block (2 levels) in threaded_test'
Expected Result:
No Error Hits
Details:
As is, it usually errors (most of the time it fails about 300-500 iterations in)
A success is what is expected, m == #<MatchData "foo-bar-baz" 1:"bar">
An error is when m == #<MatchData "foo-bar-baz"> (no capture saved)
- If threaded_test is only one thread (1.times) it always works. For more than one thread, it usually errors.
- If File.exist? is removed, it always works.
- If the gsub is removed, it always works.
- If the gsub is changed to not match the match() call, it always works.
- If the match expression is simpler (/foo-(bar)-baz/, or even /foo-([a-z]+)-baz/), it usually works.
- If the threaded_test is not a method, but executed inline (ie, comment out the def, end, call, on lines 11 & 13-14) it always works.
- If worker is a block, or a lambda (as provided), it usually errors.
- If worker is put inline into the 10_000.times call, it always works.
- If TargetStr matches on index 0, it always works.
Updated by byteit101 (Patrick Plenefisch) almost 4 years ago
- Description updated (diff)
Updated by jeremyevans0 (Jeremy Evans) almost 4 years ago
- Related to Bug #17349: Rake での並行実行における正規表現マッチングの異常 added
Updated by jeremyevans0 (Jeremy Evans) over 3 years ago
This issue is due to a race condition because the MatchData object is passed indirectly through the backref. I've submitted a pull request to fix it: https://github.com/ruby/ruby/pull/4734
I think it's likely that similar Regexp methods have a similar issue, though I haven't done any testing in that area.
Updated by jeremyevans (Jeremy Evans) about 3 years ago
- Status changed from Open to Closed
Applied in changeset git|abc0304cb28cb9dcc3476993bc487884c139fd11.
Avoid race condition in Regexp#match
In certain conditions, Regexp#match could return a MatchData with
missing captures. This seems to require at the least, multiple
threads calling a method that calls the same block/proc/lambda
which calls Regexp#match.
The race condition happens because the MatchData is passed from
indirectly via the backref, and other threads can modify the
backref.
Fix the issue by:
- Not reusing the existing MatchData from the backref, and always
allocating a new MatchData. - Passing the MatchData directly to the caller using a VALUE*,
instead of indirectly through the backref.
It's likely that variants of this issue exist for other Regexp
methods. Anywhere that MatchData is passed implicitly through
the backref is probably vulnerable to this issue.
Fixes [Bug #17507]
Updated by nagachika (Tomoyuki Chikanaga) about 3 years ago
- Backport changed from 2.5: UNKNOWN, 2.6: UNKNOWN, 2.7: UNKNOWN, 3.0: UNKNOWN to 2.6: UNKNOWN, 2.7: REQUIRED, 3.0: REQUIRED
Updated by nagachika (Tomoyuki Chikanaga) about 3 years ago
- Backport changed from 2.6: UNKNOWN, 2.7: REQUIRED, 3.0: REQUIRED to 2.6: UNKNOWN, 2.7: REQUIRED, 3.0: DONE
ruby_3_0 cfad0583eb18bb4505f28ee5cfab703a6b9987be merged revision(s) abc0304cb28cb9dcc3476993bc487884c139fd11.
Updated by mame (Yusuke Endoh) 5 months ago
- Related to Misc #20652: Memory allocation for gsub has increased from Ruby 2.7 to 3.3 added