Bug #12452
closedRegexp alternation does not backtrack to check the other alternatives if a match is found on the first one
Description
Hi, there is a problem with Regexps containing alternation where it returns the first matched alternative even if there is another alternative that would've resulted on a longer match, it's probably caused by not backtracking and checking the remaining alternatives.
If you have /a|ab/.match("ab") it returns just "a". Many discussions that I found on the internet suggested to whoever was making the complaint was just change the order of the alternatives so that it tests first the longer alternative, but for regular expressions like /(abc|ab)(de|cdef)/.match("abcdef") the alternative of the first alternation that would result on a longer match is the shorter one. And these examples don't even have repetition.
irb(main):001:0> /a|ab/.match("ab")
=> #<MatchData "a">
irb(main):002:0> /(abc|ab)(de|cedf)/.match("abcdef")
=> #<MatchData "abcde" 1:"abc" 2:"de">
Updated by hukasu (Lucas Farias) over 8 years ago
- ruby -v changed from 2.0.0 to 2.2.4
Didn't know that support for 2.0.0 was dropped, updated to 2.2.4, bug still present
Updated by shyouhei (Shyouhei Urabe) over 8 years ago
- Status changed from Open to Rejected
This is how a regexp works. AFAIK perl, php, python, nodejs all behave the same way as ruby do. I'm afraid changing here brings more harm than good.
% perl -e "'ab' =~ /a|ab/; warn $&"
a at -e line 1.
% php -a
Interactive shell
php > preg_match("/a|ab/", "ab", $m); echo $m[0];
a
php >
% python
Python 2.7.11 (default, Jan 22 2016, 08:29:18)
[GCC 4.2.1 Compatible Apple LLVM 7.0.2 (clang-700.1.81)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> m = re.search("(a|ab)", "ab")
>>> m.group(0)
'a'
>>>
% node --interactive
> "ab".match(/a|ab/);
[ 'a', index: 0, input: 'ab' ]
>
Updated by shyouhei (Shyouhei Urabe) over 8 years ago
- Has duplicate Bug #12584: Regexp using repetition with alternation doesn't match greedily added