Bug #13892

Matching the end of a string followed by an empty greedy regex and a word boundary (.*\b) fails in all versions >= 1.9

Added by jhriggs (Jim Riggs) 10 months ago. Updated 7 months ago.

Status:
Closed
Priority:
Normal
Target version:
-
ruby -v:
1.8.7, 1.9.3p551, 2.0.0p648, 2.1.9p490, 2.2.7p470, 2.3.4p301, 2.4.1p111
[ruby-core:82760]

Description

This is a very specific regex failure that occurs when the final character of the string is matched by the end of a pattern that terminates with .*\b. For example:

"abc" =~ /c.*\b/
"abc" =~ /abc.*\b/
"abc" =~ /\b.*abc.*\b/

In Ruby 1.8.7 and every other language I have tested (Perl, PCRE, JavaScript, Python, Go, ...) these all match. Starting in 1.9, the greedy .* appears to cause the \b to fail, though it should match. This only occurs when the pattern matches at the end of the string being matched. Based on my non-exhaustive testing, this only occurs with .*\b; other patterns like .?\b and specific characters such as d*\b work as expected:

ruby1.8 : /c.*\b/       : true
ruby1.8 : /abc.*\b/     : true
ruby1.8 : /\b.*abc.*\b/ : true
ruby1.8 : /c.?\b/       : true
ruby1.8 : /abc.?\b/     : true
ruby1.8 : /\b.?abc.?\b/ : true
ruby1.8 : /d*\b/        : true
ruby1.8 : /abcd*\b/     : true
ruby1.8 : /\b.*abcd*\b/ : true

ruby1.9 : /c.*\b/       : false
ruby1.9 : /abc.*\b/     : false
ruby1.9 : /\b.*abc.*\b/ : false
ruby1.9 : /c.?\b/       : true
ruby1.9 : /abc.?\b/     : true
ruby1.9 : /\b.?abc.?\b/ : true
ruby1.9 : /d*\b/        : true
ruby1.9 : /abcd*\b/     : true
ruby1.9 : /\b.*abcd*\b/ : true

ruby2.0 : /c.*\b/       : false
ruby2.0 : /abc.*\b/     : false
ruby2.0 : /\b.*abc.*\b/ : false
ruby2.0 : /c.?\b/       : true
ruby2.0 : /abc.?\b/     : true
ruby2.0 : /\b.?abc.?\b/ : true
ruby2.0 : /d*\b/        : true
ruby2.0 : /abcd*\b/     : true
ruby2.0 : /\b.*abcd*\b/ : true

ruby2.1 : /c.*\b/       : false
ruby2.1 : /abc.*\b/     : false
ruby2.1 : /\b.*abc.*\b/ : false
ruby2.1 : /c.?\b/       : true
ruby2.1 : /abc.?\b/     : true
ruby2.1 : /\b.?abc.?\b/ : true
ruby2.1 : /d*\b/        : true
ruby2.1 : /abcd*\b/     : true
ruby2.1 : /\b.*abcd*\b/ : true

ruby2.2 : /c.*\b/       : false
ruby2.2 : /abc.*\b/     : false
ruby2.2 : /\b.*abc.*\b/ : false
ruby2.2 : /c.?\b/       : true
ruby2.2 : /abc.?\b/     : true
ruby2.2 : /\b.?abc.?\b/ : true
ruby2.2 : /d*\b/        : true
ruby2.2 : /abcd*\b/     : true
ruby2.2 : /\b.*abcd*\b/ : true

ruby2.3 : /c.*\b/       : false
ruby2.3 : /abc.*\b/     : false
ruby2.3 : /\b.*abc.*\b/ : false
ruby2.3 : /c.?\b/       : true
ruby2.3 : /abc.?\b/     : true
ruby2.3 : /\b.?abc.?\b/ : true
ruby2.3 : /d*\b/        : true
ruby2.3 : /abcd*\b/     : true
ruby2.3 : /\b.*abcd*\b/ : true

ruby2.4 : /c.*\b/       : false
ruby2.4 : /abc.*\b/     : false
ruby2.4 : /\b.*abc.*\b/ : false
ruby2.4 : /c.?\b/       : true
ruby2.4 : /abc.?\b/     : true
ruby2.4 : /\b.?abc.?\b/ : true
ruby2.4 : /d*\b/        : true
ruby2.4 : /abcd*\b/     : true
ruby2.4 : /\b.*abcd*\b/ : true
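A table like the one above can be regenerated for whichever interpreter is at hand with a short script. This is only a sketch: the constant PATTERNS and the helper boundary_results are names made up for illustration, and the output format is an approximation of the rows above rather than the script originally used. On an affected version the three .*\b rows should print false; for "abc" they ought to be true, because .* can match the empty string after the final c and the end of the string after a word character is a word boundary.

```ruby
# Hypothetical reproduction script; the pattern list mirrors the table above.
PATTERNS = [
  /c.*\b/, /abc.*\b/, /\b.*abc.*\b/,
  /c.?\b/, /abc.?\b/, /\b.?abc.?\b/,
  /d*\b/,  /abcd*\b/, /\b.*abcd*\b/
].freeze

# Returns [pattern source, matched?] pairs for the given subject string.
def boundary_results(subject = "abc")
  PATTERNS.map { |re| [re.inspect, !(subject =~ re).nil?] }
end

# Print one row per pattern, labelled with the running major.minor version.
version = RUBY_VERSION.split(".").first(2).join(".")
boundary_results.each do |source, matched|
  puts format("ruby%s : %-14s : %s", version, source, matched)
end
```

Running the same file under each installed interpreter gives one block of rows per version, which makes it easy to see where the behavior changes.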

Associated revisions

Revision 31796f17
Added by naruse (Yui NARUSE) 7 months ago

Update to Onigmo 6.1.3-669ac9997619954c298da971fcfacccf36909d05.

[Bug #13892]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@60966 b2dd03c8-39d4-4d8f-98ff-823fe69b080e

Revision 60966
Added by naruse (Yui NARUSE) 7 months ago

Update to Onigmo 6.1.3-669ac9997619954c298da971fcfacccf36909d05.

[Bug #13892]

History

#1 [ruby-core:82806] Updated by naruse (Yui NARUSE) 9 months ago

As far as I understand, this is intentional behavior of Oniguruma (Onigmo), which Ruby uses.

What do you think, k-takata?

#2 [ruby-core:82812] Updated by jhriggs (Jim Riggs) 9 months ago

naruse (Yui NARUSE) -

You might be right. I (shamefully) did not investigate what regex library Ruby is using under the hood. Sorry.

I will do some testing with the upstream code to see if the problem lies there or in Ruby. If there, I will create a bug upstream.

#3 [ruby-core:82813] Updated by jhriggs (Jim Riggs) 9 months ago

naruse (Yui NARUSE), k-takata -

Testing with Onigmo, Oniguruma, and PHP's mb_ereg(), this case does indeed fail to match, though based on my understanding of this particular pattern (and the behavior of all the other regex implementations cited), it should match.

I opened the following upstream issues. I'm not sure if you want to close this or leave it open until fixed upstream.

#4 [ruby-core:83728] Updated by hsbt (Hiroshi SHIBATA) 8 months ago

  • Assignee set to naruse (Yui NARUSE)
  • Status changed from Open to Assigned

This issue was fixed in Onigmo 6.1.4. We should merge it from upstream.

#5 Updated by naruse (Yui NARUSE) 7 months ago

  • Status changed from Assigned to Closed

Applied in changeset trunk|r60966.


Update to Onigmo 6.1.3-669ac9997619954c298da971fcfacccf36909d05.

[Bug #13892]
