Bug #21507

Regexp considers variable repetition quantifiers invalid in lookbehind

Added by tiago-macedo (Tiago Macedo) about 12 hours ago. Updated about 9 hours ago.

Status:
Closed
Assignee:
-
Target version:
-
ruby -v:
ruby 3.0.2p107 (2021-07-07 revision 0db68f0233) [x86_64-linux-gnu]
[ruby-core:122722]

Description

This is my first bug submission; please feel free to tell me if I can do anything better.

Description

Attempting to use variable-width repetition quantifiers (?, +, *, {n,}, ...) inside a lookbehind raises a SyntaxError (invalid pattern in look-behind), even though the same quantifiers work fine inside a lookahead.

Examples of lookahead working:

irb(main):100> "axb".split /(?=x)/
=> ["a", "xb"]
irb(main):101> "axb".split /(?=x?)/
=> ["a", "x", "b"]
irb(main):102> "axb".split /(?=x+)/
=> ["a", "xb"]
irb(main):103> "axb".split /(?=x*)/
=> ["a", "x", "b"]
irb(main):104> "axb".split /(?=x{1,})/
=> ["a", "xb"]
irb(main):105> "axb".split /(?=x{,1})/
=> ["a", "x", "b"]
irb(main):106> "axb".split /(?=x{1,2})/
=> ["a", "xb"]

Examples of lookbehind working only with fixed-width quantifiers:

irb(main):107> "axb".split /(?<=x)/
=> ["ax", "b"]
irb(main):108> "axb".split /(?<=x?)/
/var/lib/gems/3.0.0/gems/irb-1.14.0/exe/irb:9:in `<top (required)>': (irb):108: invalid pattern in look-behind: /(?<=x?)/ (SyntaxError)
	from /usr/local/bin/irb:25:in `load'
	from /usr/local/bin/irb:25:in `<main>'
irb(main):109> "axb".split /(?<=x*)/
/var/lib/gems/3.0.0/gems/irb-1.14.0/exe/irb:9:in `<top (required)>': (irb):109: invalid pattern in look-behind: /(?<=x*)/ (SyntaxError)
	from /usr/local/bin/irb:25:in `load'
	from /usr/local/bin/irb:25:in `<main>'
irb(main):110> "axb".split /(?<=x{1,})/
/var/lib/gems/3.0.0/gems/irb-1.14.0/exe/irb:9:in `<top (required)>': (irb):110: invalid pattern in look-behind: /(?<=x{1,})/ (SyntaxError)
	from /usr/local/bin/irb:25:in `load'
	from /usr/local/bin/irb:25:in `<main>'
irb(main):111> "axb".split /(?<=x{,1})/
/var/lib/gems/3.0.0/gems/irb-1.14.0/exe/irb:9:in `<top (required)>': (irb):111: invalid pattern in look-behind: /(?<=x{,1})/ (SyntaxError)
	from /usr/local/bin/irb:25:in `load'
	from /usr/local/bin/irb:25:in `<main>'
irb(main):112> "axb".split /(?<=x{1,2})/
/var/lib/gems/3.0.0/gems/irb-1.14.0/exe/irb:9:in `<top (required)>': (irb):112: invalid pattern in look-behind: /(?<=x{1,2})/ (SyntaxError)
	from /usr/local/bin/irb:25:in `load'
	from /usr/local/bin/irb:25:in `<main>'
irb(main):113> "axb".split /(?<=x{1})/
=> ["ax", "b"]
irb(main):114> "axb".split /(?<=x{1,1})/
=> ["ax", "b"]
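The transcripts above raise SyntaxError because regexp literals are compiled when the source is parsed. As a minimal sketch, the same comparison can be run without aborting by building the patterns with Regexp.new, which raises a rescuable RegexpError instead. The helper name compile_error and the constant names are made up for this illustration; they are not part of Ruby.

```ruby
# Hypothetical helper (not part of Ruby): returns nil when the pattern
# compiles, or the engine's error message when it does not.
def compile_error(source)
  Regexp.new(source)
  nil
rescue RegexpError => e
  e.message
end

# The quantifiers exercised in the irb session above, plus "+" and the
# bare "x" with no quantifier.
LOOKAROUND_QUANTIFIERS = ["", "?", "+", "*", "{1,}", "{,1}", "{1,2}", "{1}", "{1,1}"].freeze

# For each quantifier, record whether the lookahead and lookbehind forms compile.
LOOKAROUND_REPORT = LOOKAROUND_QUANTIFIERS.to_h do |q|
  [q, {
    lookahead:  compile_error("(?=x#{q})").nil?,
    lookbehind: compile_error("(?<=x#{q})").nil?
  }]
end

LOOKAROUND_REPORT.each do |q, result|
  puts format("x%-6s lookahead: %-5s lookbehind: %s", q, result[:lookahead], result[:lookbehind])
end
```

Only the forms whose width is fixed (x, x{1}, x{1,1}) should be reported as compiling inside a lookbehind, matching the session above.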

Note

I have searched the internet and, to my knowledge, this behavior is not intended. This documentation page on regular expressions (https://ruby-doc.org/core-3.1.0/doc/regexp_rdoc.html), for example, does not mention any limitations specific to lookbehinds.

Updated by mame (Yusuke Endoh) about 12 hours ago

  • Status changed from Open to Feedback

This is currently an intended implementation limitation.

This behavior comes from the specifications of Onigmo, which Ruby's regular expression engine is based on. The Onigmo documentation states the following about look-behinds:

  (?<=subexp)        look-behind
  (?<!subexp)        negative look-behind

                     Subexp of look-behind must be fixed-width.
                     But top-level alternatives can be of various lengths.
                     ex. (?<=a|bc) is OK. (?<=aaa(?:b|cd)) is not allowed.

https://github.com/k-takata/Onigmo/blob/1d7ee878b3e4a9e41bf9825c937ae6cf0a9cd68c/doc/RE#L267-L272
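For illustration, here is a small sketch of the two cases named in the Onigmo documentation, checked from Ruby. It assumes Ruby's bundled engine follows the quoted rule exactly; the helper name lookbehind_compiles? and the constants are invented for this example.

```ruby
# Hypothetical helper (not part of Ruby): true when the pattern compiles.
def lookbehind_compiles?(source)
  Regexp.new(source)
  true
rescue RegexpError
  false
end

# Alternatives of different lengths at the top level of the lookbehind.
TOP_LEVEL_ALTERNATION = "(?<=a|bc)"
# The same kind of alternation nested inside a group.
NESTED_ALTERNATION = "(?<=aaa(?:b|cd))"

puts "#{TOP_LEVEL_ALTERNATION}: #{lookbehind_compiles?(TOP_LEVEL_ALTERNATION)}"
puts "#{NESTED_ALTERNATION}: #{lookbehind_compiles?(NESTED_ALTERNATION)}"

# The accepted form behaves as expected: match any character that is
# preceded by either "a" or "bc".
TOP_LEVEL_MATCHES = "abcd".scan(/(?<=a|bc)./)
p TOP_LEVEL_MATCHES
```

In "abcd", the character "b" follows "a" and the character "d" follows "bc", so those are the two matches the scan should return.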

I'm hesitant about whether we should add Onigmo's detailed implementation specifics to the Ruby documentation. However, seeing that there's already a precedent for it, I've opened a PR for now.

https://github.com/ruby/ruby/pull/13857

Actions #2

Updated by mame (Yusuke Endoh) about 12 hours ago

  • Status changed from Feedback to Open

Actions #3

Updated by mame (Yusuke Endoh) about 9 hours ago

  • Status changed from Open to Closed

Applied in changeset git|b2fdd26417d1539014c7af499ab1f9b398eca4c0.


Lookbehind regexp must be fixed-length

Fixes [Bug #21507]
