Bug #19379

closed

Regex: "end pattern with unmatched parenthesis" with Ruby 3.2 and interpolation

Added by renchap (Renaud Chaput) about 1 year ago. Updated about 1 year ago.

Status:
Closed
Assignee:
-
Target version:
-
ruby -v:
ruby 3.2.0 (2022-12-25 revision a528908271) [arm64-darwin22]
[ruby-core:112048]

Description

Sample code:

r2 = %r{#c-\w+/comment/[\w-]+}
%r{https?://[^/]+#{r2}}x

This works with Ruby 3.1:

irb(main):001:0> r2 = %r{#c-\w+/comment/[\w-]+}
irb(main):002:0> %r{https?://[^/]+#{r2}}x
=> /https?:\/\/[^\/]+(?-mix:#c-\w+\/comment\/[\w-]+)/x

But fails with Ruby 3.2.0:

irb(main):022:0> r2 = %r{#c-\w+/comment/[\w-]+}
irb(main):023:0> %r{https?://[^/]+#{r2}}x
(irb):23:in `<main>': end pattern with unmatched parenthesis: /https?:\/\/[^\/]+(?-mix:#c-\w+\/comment\/[\w-]+)/x (RegexpError)

But if I don't use interpolation, it works correctly:

irb(main):001:0> %r{https?://[^/]+#c-\w+/comment/[\w-]+}x
=> /https?:\/\/[^\/]+#c-\w+\/comment\/[\w-]+/x
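One caveat about the non-interpolated form: in extended (x) mode an unescaped # starts a comment, so that regexp compiles but ignores everything from #c- onward. The interpolated form wraps r2 in (?-mix:...), which switches extended mode off and keeps the # literal. A minimal sketch of the difference, using made-up example.com URLs and illustrative variable names:

```ruby
# In extended (x) mode an unescaped `#` starts a comment, so everything
# from `#c-` onward is ignored by the regexp engine.
commented = %r{https?://[^/]+#c-\w+/comment/[\w-]+}x

# Escaping the `#` keeps it as a literal character in extended mode.
escaped = %r{https?://[^/]+\#c-\w+/comment/[\w-]+}x

# Hypothetical URLs, used only for illustration.
plain_url   = "https://example.com"
comment_url = "https://example.com#c-1/comment/abc"

p commented.match?(plain_url)   # => true  (the "#c-..." part is a comment)
p escaped.match?(plain_url)     # => false (the literal "#c-..." part is required)
p escaped.match?(comment_url)   # => true
```

So the two forms in the description are not equivalent; escaping the # is one way to get the literal match without interpolation.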

Updated by znz (Kazuhiro NISHIYAMA) about 1 year ago

% docker run --platform linux/amd64 --rm -it ghcr.io/ruby/all-ruby env ALL_RUBY_SINCE=ruby-3.0 ./all-ruby -e 'r=/#/;p /#{r}/x'   
ruby-3.0.0          /(?-mix:#)/x
...
ruby-3.2.0-preview1 /(?-mix:#)/x
ruby-3.2.0-preview2 -e:1:in `<main>': end pattern with unmatched parenthesis: /(?-mix:#)/x (RegexpError)
                exit 1
...
ruby-3.2.0          -e:1:in `<main>': end pattern with unmatched parenthesis: /(?-mix:#)/x (RegexpError)
                exit 1

Updated by znz (Kazuhiro NISHIYAMA) about 1 year ago

I think the minimal case is /(?-x:#)/x.
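A small sketch for checking whether a given interpreter is affected, without all-ruby. It assumes Regexp.new with a String goes through the same preprocessing as an interpolated literal; compiles? is a hypothetical helper name, not part of Ruby.

```ruby
# Hypothetical helper: true when the pattern compiles, false on RegexpError.
def compiles?(source, options = 0)
  Regexp.new(source, options)
  true
rescue RegexpError
  false
end

# The minimal case above, built at run time so that a failure surfaces as a
# rescuable RegexpError instead of aborting the whole script.
affected = !compiles?("(?-x:#)", Regexp::EXTENDED)
puts "ruby #{RUBY_VERSION}: #{affected ? 'affected' : 'not affected'}"
```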

Updated by znz (Kazuhiro NISHIYAMA) about 1 year ago

  • Assignee set to make_now_just (Hiroya Fujinami)

Updated by znz (Kazuhiro NISHIYAMA) about 1 year ago

  • Backport changed from 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN to 2.7: DONTNEED, 3.0: DONTNEED, 3.1: DONTNEED, 3.2: REQUIRED

Updated by mame (Yusuke Endoh) about 1 year ago

  • Assignee deleted (make_now_just (Hiroya Fujinami))

I wonder if this is due to #18294, not #19104. @jeremyevans0 (Jeremy Evans) What do you think?

Updated by jeremyevans0 (Jeremy Evans) about 1 year ago

mame (Yusuke Endoh) wrote in #note-5:

I wonder if this is due to #18294, not #19104. @jeremyevans0 (Jeremy Evans) What do you think?

I agree. #18294 doesn't handle /(?-x:...)/ inside an extended regular expression as non-extended syntax. I'll see if I can fix it today.

Updated by jeremyevans0 (Jeremy Evans) about 1 year ago

jeremyevans0 (Jeremy Evans) wrote in #note-6:

mame (Yusuke Endoh) wrote in #note-5:

I wonder if this is due to #18294, not #19104. @jeremyevans0 (Jeremy Evans) What do you think?

I agree. #18294 doesn't handle /(?-x:...)/ inside an extended regular expression as non-extended syntax. I'll see if I can fix it today.

Should be fixed by https://github.com/ruby/ruby/pull/7192


Updated by jeremyevans (Jeremy Evans) about 1 year ago

  • Status changed from Open to Closed

Applied in changeset git|eccfc978fd6f65332eb70c9a46fbb4d5110bbe0a.


Fix parsing of regexps that toggle extended mode on/off inside regexp

This was broken in ec3542229b29ec93062e9d90e877ea29d3c19472. That commit
didn't handle cases where extended mode was turned on/off inside the
regexp. There are two ways to turn extended mode on/off:

/(?-x:#y)#z
/x =~ '#y'

/(?-x)#y(?x)#z
/x =~ '#y'

These can be nested inside the same regexp:

/(?-x:(?x)#x
(?-x)#y)#z
/x =~ '#y'

As you can probably imagine, this makes handling these regexps
somewhat complex. Due to the nesting inside portions of regexps,
the unassign_nonascii function needs to be recursive. In
recursive mode, it needs to track both opening and closing
parentheses, similar to how it already tracked opening and
closing brackets for character classes.

When scanning the regexp and coming to (? not followed by #,
scan for options, and use x and i to determine whether to
turn on or off extended mode. For :, indicating only the
current regexp section should have the extended mode
switched, recurse with the extended mode set or unset. For ),
indicating the remainder of the regexp (or current regexp portion
if already recursing) should turn extended mode on or off, just
change the extended mode flag and keep scanning.
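
The scanning strategy described above can be modelled in a few lines of Ruby. This is only a simplified sketch, not the C code from the changeset: comment_starts and scan_section are hypothetical names, only the x option is interpreted, and multibyte handling is ignored. It returns the index of every # that would start an extended-mode comment.

```ruby
# Simplified, hypothetical model of the scan described above. Returns the
# indexes of every `#` that starts an extended-mode comment in +src+.
def comment_starts(src, extended: true)
  found = []
  scan_section(src, 0, extended, false, found)
  found
end

# Scans from +pos+. When +nested+ is true, stops after the matching `)`.
# Returns the index just past the scanned section.
def scan_section(src, pos, extended, nested, found)
  in_class = 0 # nesting depth of [...] character classes
  while pos < src.length
    ch = src[pos]
    if ch == "\\"
      pos += 2 # skip the escaped character
    elsif in_class > 0
      in_class += 1 if ch == "["
      in_class -= 1 if ch == "]"
      pos += 1
    elsif ch == "["
      in_class += 1
      pos += 1
    elsif ch == "#" && extended
      found << pos # comment runs to the end of the line
      eol = src.index("\n", pos)
      pos = eol ? eol + 1 : src.length
    elsif ch == ")"
      return pos + 1 if nested
      pos += 1
    elsif ch == "("
      if src[pos + 1, 2] == "?#" # (?#...) comment group: skip it
        close = src.index(")", pos)
        pos = close ? close + 1 : src.length
      elsif (m = /\A\(\?([a-z]*)(-[a-z]*)?([:)])/.match(src[pos..-1]))
        mode = extended
        mode = true if m[1].include?("x")
        mode = false if m[2]&.include?("x")
        if m[3] == ":" # (?opts:...) changes the mode for this group only
          pos = scan_section(src, pos + m.end(0), mode, true, found)
        else           # (?opts) changes the mode for the rest of this section
          extended = mode
          pos += m.end(0)
        end
      else # any other group: recurse with the current mode
        pos = scan_section(src, pos + 1, extended, true, found)
      end
    else
      pos += 1
    end
  end
  pos
end

p comment_starts("(?-x:#y)#z\n")                 # => [8]
p comment_starts("(?-x)#y(?x)#z\n")              # => [11]
p comment_starts("(?-x:(?x)#x\n(?-x)#y)#z\n")    # => [9, 20]
```

In each of the three patterns from the commit message, the #y is left alone and only the # characters seen while extended mode is on are reported as comment starts. Keeping the mode flag local to each recursive call is what restores the outer mode after a closing parenthesis.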

While testing this, I noticed that a, d, and u are accepted
as options, in addition to i, m, and x, but I can't see
where those options are documented. I'm not sure whether or not
handling a, d, and u as options is a bug.

Fixes [Bug #19379]

Updated by naruse (Yui NARUSE) about 1 year ago

  • Backport changed from 2.7: DONTNEED, 3.0: DONTNEED, 3.1: DONTNEED, 3.2: REQUIRED to 2.7: DONTNEED, 3.0: DONTNEED, 3.1: DONTNEED, 3.2: DONE

ruby_3_2 ca75332f46c39804e06cd37c2608cbdef0aebf05 merged revision(s) eccfc978fd6f65332eb70c9a46fbb4d5110bbe0a.
