Project

General

Profile

Actions

Bug #17341

closed

Unsound quantifier reduction with nested quantifiers

Added by jirkamarsik (Jirka Marsik) about 4 years ago. Updated almost 4 years ago.

Status:
Closed
Assignee:
-
Target version:
-
ruby -v:
ruby 2.7.2p137 (2020-10-01 revision 5445e04352) [x86_64-linux]
[ruby-core:101030]

Description

The rules for reducing nested quantifiers can produce quantifiers with semantics which differ from the original quantifiers. This can then lead to the regular expressions matching different strings.

irb(main):001:0> /(?:a+?)*/.match('aa')
(irb):1: warning: nested repeat operator '+?' and '*' was replaced with '+? and ?' in regular expression: /(?:a+?)*/
=> #<MatchData "a">
irb(main):002:0> /(a+?)*/.match('aa')
=> #<MatchData "aa" 1:"a">

In the above, we can see that by inserting a capture group between the two quantifiers, we prevent quantifier reduction from occurring and we get a regexp that matches the whole input. If we let quantifier reduction happen, we get a resulting regexp that only matches the first character. I think quantifier reduction should not change the behavior of a regexp, as it is just an optimization.

I found the quantifier reduction rules in ReduceTypeTable in regparse.c. I haven't checked them all but the ones that replace two quantifiers by two other quantifiers caught my eye.

Actions

Also available in: Atom PDF

Like0
Like0Like0Like0