Project

General

Profile

Actions

Bug #12452

closed

Regexp alternation does not backtrack to check the other alternatives if a match is found on the first one

Added by hukasu (Lucas Farias) almost 8 years ago. Updated almost 8 years ago.

Status:
Rejected
Assignee:
-
Target version:
-
[ruby-core:75825]

Description

Hi, there is a problem with Regexps containing alternation where it returns the first matched alternative even if there is another alternative that would've resulted on a longer match, it's probably caused by not backtracking and checking the remaining alternatives.

If you have /a|ab/.match("ab") it returns just "a". Many discussions that I found on the internet suggested to whoever was making the complaint was just change the order of the alternatives so that it tests first the longer alternative, but for regular expressions like /(abc|ab)(de|cdef)/.match("abcdef") the alternative of the first alternation that would result on a longer match is the shorter one. And these examples don't even have repetition.

irb(main):001:0> /a|ab/.match("ab")
=> #<MatchData "a">
irb(main):002:0> /(abc|ab)(de|cedf)/.match("abcdef")
=> #<MatchData "abcde" 1:"abc" 2:"de">


Related issues 1 (0 open1 closed)

Has duplicate Ruby master - Bug #12584: Regexp using repetition with alternation doesn't match greedilyRejectedActions

Updated by hukasu (Lucas Farias) almost 8 years ago

  • ruby -v changed from 2.0.0 to 2.2.4

Didn't know that support for 2.0.0 was dropped, updated to 2.2.4, bug still present

Updated by shyouhei (Shyouhei Urabe) almost 8 years ago

  • Status changed from Open to Rejected

This is how a regexp works. AFAIK perl, php, python, nodejs all behave the same way as ruby do. I'm afraid changing here brings more harm than good.

 % perl -e "'ab' =~ /a|ab/; warn $&"
a at -e line 1.
% php -a
Interactive shell

php > preg_match("/a|ab/", "ab", $m); echo $m[0];
a
php >
% python
Python 2.7.11 (default, Jan 22 2016, 08:29:18)
[GCC 4.2.1 Compatible Apple LLVM 7.0.2 (clang-700.1.81)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> m = re.search("(a|ab)", "ab")
>>> m.group(0)
'a'
>>>
% node --interactive
> "ab".match(/a|ab/);
[ 'a', index: 0, input: 'ab' ]
>
Actions #3

Updated by shyouhei (Shyouhei Urabe) almost 8 years ago

  • Has duplicate Bug #12584: Regexp using repetition with alternation doesn't match greedily added
Actions

Also available in: Atom PDF

Like0
Like0Like0Like0