Bug #13892
closed
Matching the end of a string followed by an empty greedy regex and a word boundary (.*\b) fails in all versions >= 1.9
Description
This is a very specific regex failure that occurs when the final character of the string is matched by the end of a pattern that terminates with .*\b. For example:
"abc" =~ /c.*\b/
"abc" =~ /abc.*\b/
"abc" =~ /\b.*abc.*\b/
In Ruby 1.8.7 and every other language I have tested (perl, pcre, javascript, python, go, ...) this matches. Starting in 1.9, it appears that the greedy .* causes the \b to fail, though it should match. This only occurs when the pattern matches at the end of the string being matched. Based on my non-exhaustive testing, this only occurs with .*\b; other patterns like .?\b and specific characters such as d*\b work as expected:
ruby1.8 : /c.*\b/ : true
ruby1.8 : /abc.*\b/ : true
ruby1.8 : /\b.*abc.*\b/ : true
ruby1.8 : /c.?\b/ : true
ruby1.8 : /abc.?\b/ : true
ruby1.8 : /\b.?abc.?\b/ : true
ruby1.8 : /d*\b/ : true
ruby1.8 : /abcd*\b/ : true
ruby1.8 : /\b.*abcd*\b/ : true
ruby1.9 : /c.*\b/ : false
ruby1.9 : /abc.*\b/ : false
ruby1.9 : /\b.*abc.*\b/ : false
ruby1.9 : /c.?\b/ : true
ruby1.9 : /abc.?\b/ : true
ruby1.9 : /\b.?abc.?\b/ : true
ruby1.9 : /d*\b/ : true
ruby1.9 : /abcd*\b/ : true
ruby1.9 : /\b.*abcd*\b/ : true
ruby2.0 : /c.*\b/ : false
ruby2.0 : /abc.*\b/ : false
ruby2.0 : /\b.*abc.*\b/ : false
ruby2.0 : /c.?\b/ : true
ruby2.0 : /abc.?\b/ : true
ruby2.0 : /\b.?abc.?\b/ : true
ruby2.0 : /d*\b/ : true
ruby2.0 : /abcd*\b/ : true
ruby2.0 : /\b.*abcd*\b/ : true
ruby2.1 : /c.*\b/ : false
ruby2.1 : /abc.*\b/ : false
ruby2.1 : /\b.*abc.*\b/ : false
ruby2.1 : /c.?\b/ : true
ruby2.1 : /abc.?\b/ : true
ruby2.1 : /\b.?abc.?\b/ : true
ruby2.1 : /d*\b/ : true
ruby2.1 : /abcd*\b/ : true
ruby2.1 : /\b.*abcd*\b/ : true
ruby2.2 : /c.*\b/ : false
ruby2.2 : /abc.*\b/ : false
ruby2.2 : /\b.*abc.*\b/ : false
ruby2.2 : /c.?\b/ : true
ruby2.2 : /abc.?\b/ : true
ruby2.2 : /\b.?abc.?\b/ : true
ruby2.2 : /d*\b/ : true
ruby2.2 : /abcd*\b/ : true
ruby2.2 : /\b.*abcd*\b/ : true
ruby2.3 : /c.*\b/ : false
ruby2.3 : /abc.*\b/ : false
ruby2.3 : /\b.*abc.*\b/ : false
ruby2.3 : /c.?\b/ : true
ruby2.3 : /abc.?\b/ : true
ruby2.3 : /\b.?abc.?\b/ : true
ruby2.3 : /d*\b/ : true
ruby2.3 : /abcd*\b/ : true
ruby2.3 : /\b.*abcd*\b/ : true
ruby2.4 : /c.*\b/ : false
ruby2.4 : /abc.*\b/ : false
ruby2.4 : /\b.*abc.*\b/ : false
ruby2.4 : /c.?\b/ : true
ruby2.4 : /abc.?\b/ : true
ruby2.4 : /\b.?abc.?\b/ : true
ruby2.4 : /d*\b/ : true
ruby2.4 : /abcd*\b/ : true
ruby2.4 : /\b.*abcd*\b/ : true
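For anyone who wants to re-run the comparison on their own interpreter, a table like the one above could be produced with something along these lines. This is only a sketch, not the script used for the original report; the `PATTERNS` constant and the `match_results` helper are names made up for illustration, and the subject string is assumed to be "abc" throughout, as in the examples at the top.

```ruby
# Patterns taken from the report; each is tried against the string "abc".
# PATTERNS and match_results are hypothetical names used only in this sketch.
PATTERNS = [
  /c.*\b/, /abc.*\b/, /\b.*abc.*\b/,
  /c.?\b/, /abc.?\b/, /\b.?abc.?\b/,
  /d*\b/, /abcd*\b/, /\b.*abcd*\b/
].freeze

# Returns an array of [pattern_source, matched?] pairs for the given string.
def match_results(str, patterns = PATTERNS)
  patterns.map { |re| [re.source, !(str =~ re).nil?] }
end

# Print one line per pattern in the same shape as the table above,
# labelled with the major.minor version of the running interpreter.
version = RUBY_VERSION[/\d+\.\d+/]
match_results("abc").each do |source, matched|
  puts "ruby#{version} : /#{source}/ : #{matched}"
end
```

Run under each installed Ruby in turn, the three .*\b rows are the ones to watch; the other six are reported above as matching in every version tested.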
See also:
- https://regex101.com/r/JBzSic/2 (PHP/PCRE, Javascript, Python, Go)
- http://fiddle.re/gkm4ad (Go, Java, Javascript, .Net, Perl, PHP, Python, XRegExp)
- http://java-regex-tester.appspot.com/regex/04925044-ca95-46c6-bec5-329057c04ab2 (Java)
Updated by naruse (Yui NARUSE) over 7 years ago
As far as I understand, this is intentional behavior of Oniguruma (Onigmo), which Ruby uses.
What do you think, k-takata?
Updated by jhriggs (Jim Riggs) over 7 years ago
You might be right. I (shamefully) did not investigate what regex library Ruby is using under the hood. Sorry.
I will do some testing with the upstream code to see if the problem lies there or in Ruby. If there, I will create a bug upstream.
Updated by jhriggs (Jim Riggs) over 7 years ago
@naruse (Yui NARUSE), k-takata -
Testing with Onigmo, Oniguruma, and PHP's mb_ereg(), this case does indeed fail to match, though based on my understanding of this particular pattern (and all of the other regex implementations cited), it should match.
I opened the following upstream issues. I'm not sure if you want to close this or leave it open until fixed upstream.
Updated by hsbt (Hiroshi SHIBATA) over 7 years ago
- Status changed from Open to Assigned
- Assignee set to naruse (Yui NARUSE)
This issue was fixed in Onigmo 6.1.4. We should merge it from upstream.
Updated by naruse (Yui NARUSE) over 7 years ago
- Status changed from Assigned to Closed
Applied in changeset trunk|r60966.
Update to Onigmo 6.1.3-669ac9997619954c298da971fcfacccf36909d05.
[Bug #13892]