Bug #21507
closed
Regexp considers variable repetition quantifiers invalid in lookbehind
Description
This is my first bug submission, so please feel free to tell me if I can do anything better.
Description
Attempting to use "variable" repetition quantifiers (?, +, *, {n,}, ...) inside lookbehind anchors raises a SyntaxError (invalid pattern in look-behind), but it's perfectly viable to do it in lookahead anchors.
Examples of lookahead working:
irb(main):100> "axb".split /(?=x)/
=> ["a", "xb"]
irb(main):101> "axb".split /(?=x?)/
=> ["a", "x", "b"]
irb(main):102> "axb".split /(?=x+)/
=> ["a", "xb"]
irb(main):103> "axb".split /(?=x*)/
=> ["a", "x", "b"]
irb(main):104> "axb".split /(?=x{1,})/
=> ["a", "xb"]
irb(main):105> "axb".split /(?=x{,1})/
=> ["a", "x", "b"]
irb(main):106> "axb".split /(?=x{1,2})/
=> ["a", "xb"]
Examples of lookbehind working only with fixed-width patterns:
irb(main):107> "axb".split /(?<=x)/
=> ["ax", "b"]
irb(main):108> "axb".split /(?<=x?)/
/var/lib/gems/3.0.0/gems/irb-1.14.0/exe/irb:9:in `<top (required)>': (irb):108: invalid pattern in look-behind: /(?<=x?)/ (SyntaxError)
from /usr/local/bin/irb:25:in `load'
from /usr/local/bin/irb:25:in `<main>'
irb(main):109> "axb".split /(?<=x*)/
/var/lib/gems/3.0.0/gems/irb-1.14.0/exe/irb:9:in `<top (required)>': (irb):109: invalid pattern in look-behind: /(?<=x*)/ (SyntaxError)
from /usr/local/bin/irb:25:in `load'
from /usr/local/bin/irb:25:in `<main>'
irb(main):110> "axb".split /(?<=x{1,})/
/var/lib/gems/3.0.0/gems/irb-1.14.0/exe/irb:9:in `<top (required)>': (irb):110: invalid pattern in look-behind: /(?<=x{1,})/ (SyntaxError)
from /usr/local/bin/irb:25:in `load'
from /usr/local/bin/irb:25:in `<main>'
irb(main):111> "axb".split /(?<=x{,1})/
/var/lib/gems/3.0.0/gems/irb-1.14.0/exe/irb:9:in `<top (required)>': (irb):111: invalid pattern in look-behind: /(?<=x{,1})/ (SyntaxError)
from /usr/local/bin/irb:25:in `load'
from /usr/local/bin/irb:25:in `<main>'
irb(main):112> "axb".split /(?<=x{1,2})/
/var/lib/gems/3.0.0/gems/irb-1.14.0/exe/irb:9:in `<top (required)>': (irb):112: invalid pattern in look-behind: /(?<=x{1,2})/ (SyntaxError)
from /usr/local/bin/irb:25:in `load'
from /usr/local/bin/irb:25:in `<main>'
irb(main):113> "axb".split /(?<=x{1})/
=> ["ax", "b"]
irb(main):114> "axb".split /(?<=x{1,1})/
=> ["ax", "b"]
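The same checks can be run from a script rather than irb. This is only a minimal sketch: the helper name `compile_error` is made up for illustration. It compiles each pattern with `Regexp.new`, because a regexp literal is compiled when the source is parsed (hence the SyntaxError above), while `Regexp.new` raises a rescuable RegexpError at run time.

```ruby
# Hypothetical helper: returns nil when the pattern compiles,
# otherwise the message of the RegexpError raised by Regexp.new.
def compile_error(source)
  Regexp.new(source)
  nil
rescue RegexpError => e
  e.message
end

# Patterns taken from the irb session above.
patterns = [
  '(?=x?)',      # lookahead with a variable-width quantifier
  '(?<=x)',      # fixed-width lookbehind
  '(?<=x{1,1})', # bounded quantifier that is still fixed-width
  '(?<=x?)',     # variable-width lookbehind
  '(?<=x{1,2})'  # variable-width lookbehind
]

patterns.each do |source|
  error = compile_error(source)
  puts "#{source.ljust(12)} #{error ? "rejected: #{error}" : 'accepted'}"
end
```

Compiling at run time keeps one failing pattern from aborting the whole file, which makes it easier to compare several variants in one go.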
Note
I have searched on the internet and, to my knowledge, this behavior is not intended. The documentation page on regular expressions (https://ruby-doc.org/core-3.1.0/doc/regexp_rdoc.html), for example, does not say anything about limitations specific to lookbehinds.
Updated by mame (Yusuke Endoh) about 12 hours ago
- Status changed from Open to Feedback
This is currently an intended implementation limitation.
This behavior comes from the specifications of Onigmo, which Ruby's regular expression engine is based on. The Onigmo documentation states the following about look-behinds:
(?<=subexp) look-behind
(?<!subexp) negative look-behind
Subexp of look-behind must be fixed-width.
But top-level alternatives can be of various lengths.
ex. (?<=a|bc) is OK. (?<=aaa(?:b|cd)) is not allowed.
https://github.com/k-takata/Onigmo/blob/1d7ee878b3e4a9e41bf9825c937ae6cf0a9cd68c/doc/RE#L267-L272
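As a rough illustration of the rule quoted above, the sketch below checks the two documented examples and then rewrites a bounded quantifier as top-level alternatives. The helper name `accepted?` and the sample pattern `(?<=ax|axx)b` are assumptions made for this example and do not come from Onigmo or Ruby. The second documented pattern is only printed, since the result depends on the engine version in use.

```ruby
# Hypothetical helper: true when the regexp engine accepts the pattern source.
def accepted?(source)
  Regexp.new(source)
  true
rescue RegexpError
  false
end

# The two examples from the Onigmo documentation quoted above.
puts accepted?('(?<=a|bc)')        # top-level alternatives of different lengths
puts accepted?('(?<=aaa(?:b|cd))') # documented as not allowed

# A bounded quantifier inside a lookbehind is rejected...
puts accepted?('(?<=ax{1,2})b')

# ...but the same intent can be spelled out as top-level alternatives,
# each of which is fixed-width on its own.
bounded = Regexp.new('(?<=ax|axx)b')
p bounded.match?('axb')   # "b" preceded by "ax"
p bounded.match?('axxb')  # "b" preceded by "axx"
p bounded.match?('axxxb') # neither alternative ends right before "b"
```

This only works for quantifiers with an upper bound. Unbounded ones such as `+`, `*`, or `{n,}` cannot be enumerated as a finite list of alternatives.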
I'm hesitant about whether we should add Onigmo's detailed implementation specifics to the Ruby documentation. However, seeing that there's already a precedent for it, I've opened a PR for now.
Updated by mame (Yusuke Endoh) about 12 hours ago
- Status changed from Feedback to Open
Updated by mame (Yusuke Endoh) about 9 hours ago
- Status changed from Open to Closed
Applied in changeset git|b2fdd26417d1539014c7af499ab1f9b398eca4c0.
Lookbehind regexp must be fixed-length
Fixes [Bug #21507]