Bug #17340
closed
/\p{/ matches newline instead of throwing syntax error
Description
The regular expression /\p{/ matches newline characters instead of reporting a syntax error.
irb(main):001:0> /\p{/.match("\n")
=> #<MatchData "\n">
The issue stems from the function fetch_char_property_to_ctype in regparse.c. If the pattern ends before the Unicode character property escape is terminated by a right brace, and before the parser hits one of the characters it rejects outright, the function returns 0 without signaling a failure. The caller then interprets that 0 as a ctype code, and 0 is the code for the newline character property, so the expression ends up matching newlines. I would guess that the intended behavior here is to report a syntax error in the regular expression.
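To make the failure mode concrete, here is a minimal, self-contained model of the control flow described above. It is not the code from regparse.c: every name with a `toy_` or `TOY_` prefix is hypothetical, the numeric values are illustrative, and the lookup table knows only one property. It only assumes, as described in this report, that 0 is both the initial return value and the ctype code for newline.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustrative values only; not copied from regparse.c. */
#define TOY_ERR_INVALID_PROPERTY (-1)
#define TOY_CTYPE_NEWLINE 0   /* ctype 0 plays the role of "newline" */
#define TOY_CTYPE_ALPHA   1

/* Hypothetical name lookup: knows a single property, "Alpha". */
static int toy_lookup(const char *name, size_t len)
{
    assert(name != NULL);
    if (len == 5 && strncmp(name, "Alpha", 5) == 0)
        return TOY_CTYPE_ALPHA;
    return TOY_ERR_INVALID_PROPERTY;
}

/* s points just past "\p{". Returns a ctype (>= 0) or an error (< 0). */
static int toy_fetch_property_buggy(const char *s)
{
    const char *start = s;
    int r = 0;  /* problem: 0 is also a valid ctype */

    for (; *s != '\0'; s++) {
        if (*s == '}')
            return toy_lookup(start, (size_t)(s - start));
        if (*s == '(' || *s == ')' || *s == '{' || *s == '|') {
            r = TOY_ERR_INVALID_PROPERTY;
            break;
        }
    }
    /* If the pattern ended without '}', r is still 0 here. */
    return r;
}
```

In this model, calling `toy_fetch_property_buggy("")` (the remainder of `\p{`) returns 0, which a caller cannot distinguish from `TOY_CTYPE_NEWLINE`.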
Updated by jeremyevans0 (Jeremy Evans) over 3 years ago
It turns out the regexp behavior depends on the encoding:
$ ruby -ve 'p(/\p{/u.match("\n"))'
ruby 2.7.2p137 (2020-10-01 revision 5445e04352) [x86_64-openbsd]
#<MatchData "\n">
$ ruby -ve 'p(/\p{/n.match("\n"))'
ruby 2.7.2p137 (2020-10-01 revision 5445e04352) [x86_64-openbsd]
-e:1: internal parser error (bug): /\p{/
I agree with you about the appropriate place to fix this. I've submitted a pull request to fix it: https://github.com/ruby/ruby/pull/3807
Updated by jirkamarsik (Jirka Marsik) over 3 years ago
Great, thanks!
Updated by jeremyevans (Jeremy Evans) over 3 years ago
- Status changed from Open to Closed
Applied in changeset git|b26d6c70e0f08050ca23388bb0e8442f73269c73.
Detect the premature end of char property in regexp
Default to ONIGERR_INVALID_CHAR_PROPERTY_NAME in
fetch_char_property_to_ctype and only set otherwise if an ending
} is found.
Fixes [Bug #17340]
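
The approach in the commit message can be sketched in isolation as follows. This is a simplified model and not the actual patch: the `sketch_` and `SKETCH_` names are hypothetical, `SKETCH_ERR_INVALID_PROPERTY` stands in for ONIGERR_INVALID_CHAR_PROPERTY_NAME with an illustrative value, and the lookup knows only one property. The point is that the result defaults to the error and is replaced only once a closing } has been seen.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustrative values only; not copied from the Ruby sources. */
#define SKETCH_ERR_INVALID_PROPERTY (-1)
#define SKETCH_CTYPE_ALPHA 1

/* Hypothetical name lookup: knows a single property, "Alpha". */
static int sketch_lookup(const char *name, size_t len)
{
    assert(name != NULL);
    if (len == 5 && strncmp(name, "Alpha", 5) == 0)
        return SKETCH_CTYPE_ALPHA;
    return SKETCH_ERR_INVALID_PROPERTY;
}

/* s points just past "\p{". Returns a ctype (>= 0) or an error (< 0). */
static int sketch_fetch_property_fixed(const char *s)
{
    const char *start = s;
    int r = SKETCH_ERR_INVALID_PROPERTY;  /* default to failure */

    for (; *s != '\0'; s++) {
        if (*s == '}') {
            /* Only a terminated name may produce a non-error result. */
            r = sketch_lookup(start, (size_t)(s - start));
            break;
        }
        if (*s == '(' || *s == ')' || *s == '{' || *s == '|')
            break;  /* rejected character: keep the default error */
    }
    return r;
}
```

With this default, an unterminated escape such as the remainder of `\p{` yields the error code instead of a value that could be mistaken for a ctype.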