
Bug #17340

/\p{/ matches newline instead of throwing syntax error

Added by jirkamarsik (Jirka Marsik) over 3 years ago. Updated over 3 years ago.

Status: Closed
Assignee: -
Target version: -
ruby -v: ruby 2.7.2p137 (2020-10-01 revision 5445e04352) [x86_64-linux]
[ruby-core:101028]

Description

The regular expression /\p{/ matches newline characters instead of reporting a syntax error.

irb(main):001:0> /\p{/.match("\n")
=> #<MatchData "\n">

The issue stems from the function fetch_char_property_to_ctype in regparse.c. If the pattern ends before the Unicode character property escape is terminated by a right brace (or interrupted by one of the other characters the function rejects), the function returns 0, which is not treated as a failure. The value 0 is then interpreted as a ctype code, and ctype 0 stands for the newline character property. As a result, the expression ends up matching newlines. I would guess that the intended behavior is to report a syntax error in the regular expression.
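The control flow described above can be modeled in a few lines of Ruby. This is a simplified, hypothetical sketch, not the actual C code in regparse.c. The method name, the toy property table, the placeholder error code, and the exact set of rejected characters are all assumptions made for illustration. The point it shows is that running off the end of the pattern leaves the default result of 0 in place, and that value then reads as the newline ctype.

```ruby
# Hypothetical Ruby model of the pre-fix loop in fetch_char_property_to_ctype.
# Not the real regparse.c code; names and values below are illustrative only.

CTYPE_NEWLINE = 0                   # ctype 0 is the newline character property
INVALID_CHAR_PROPERTY_NAME = -1     # placeholder negative error code (not Onigmo's real value)
PROPERTY_CTYPES = { "Alpha" => 1 }  # toy property-name -> ctype table
REJECTED = ["(", ")", "{", "|"]     # assumed set of characters that abort the scan

# chars: the characters that follow "\p{" in the pattern.
def scan_property_before_fix(chars)
  result = 0                        # default is 0, which is NOT an error code
  name = +""
  chars.each do |c|
    if c == "}"
      # Closing brace found: look up the name; unknown names are an error.
      return PROPERTY_CTYPES.fetch(name, INVALID_CHAR_PROPERTY_NAME)
    elsif REJECTED.include?(c)
      return INVALID_CHAR_PROPERTY_NAME
    end
    name << c
  end
  result # pattern ended without "}": 0 leaks out as the newline ctype
end

p scan_property_before_fix([])              # models /\p{/
p scan_property_before_fix("Alpha}".chars)  # models /\p{Alpha}/
p scan_property_before_fix("Alp".chars)     # models /\p{Alp/
```

In this model both unterminated inputs return 0 instead of the error code, which mirrors why /\p{/ behaves like a newline class.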

Updated by jeremyevans0 (Jeremy Evans) over 3 years ago

It turns out the regexp behavior depends on the encoding:

$ ruby -ve 'p(/\p{/u.match("\n"))'
ruby 2.7.2p137 (2020-10-01 revision 5445e04352) [x86_64-openbsd]
#<MatchData "\n">

$ ruby -ve 'p(/\p{/n.match("\n"))'
ruby 2.7.2p137 (2020-10-01 revision 5445e04352) [x86_64-openbsd]
-e:1: internal parser error (bug): /\p{/

I agree with you about the appropriate place to fix this. I've submitted a pull request to fix it: https://github.com/ruby/ruby/pull/3807
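Whether a particular interpreter is still affected can be checked at run time. The sketch below compiles the unterminated escape with Regexp.new, once with default options and once with Regexp::NOENCODING (roughly the equivalent of the n flag), and records whether RegexpError is raised. The helper name compile_outcome is hypothetical. On an interpreter that includes the fix (presumably Ruby 3.0 and later), both attempts are expected to raise; a well-formed property is compiled as a control.

```ruby
# Run-time check: does compiling an unterminated \p{ raise RegexpError?
# compile_outcome is a hypothetical helper; Regexp.new and
# Regexp::NOENCODING are core Ruby.

def compile_outcome(source, options = 0)
  Regexp.new(source, options)
  :compiled
rescue RegexpError
  :regexp_error
end

results = {
  default:    compile_outcome('\p{'),                      # no encoding flag
  noencoding: compile_outcome('\p{', Regexp::NOENCODING),  # like the n flag
  control:    compile_outcome('\p{Alpha}')                 # well-formed property
}

results.each { |label, outcome| puts "#{label}: #{outcome}" }
```

Single quotes keep the backslash in '\p{' literal, so the regexp engine sees exactly the pattern from the report.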

Updated by jeremyevans (Jeremy Evans) over 3 years ago

  • Status changed from Open to Closed

Applied in changeset git|b26d6c70e0f08050ca23388bb0e8442f73269c73.


Detect the premature end of char property in regexp

Default to ONIGERR_INVALID_CHAR_PROPERTY_NAME in
fetch_char_property_to_ctype and only set otherwise if an ending
} is found.

Fixes [Bug #17340]
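The commit message describes the fix as flipping the default. Using the same kind of hypothetical model (illustrative method name, toy property table, placeholder error code, assumed set of rejected characters; this is not the actual patch), the only path that overwrites the error default is finding the closing brace.

```ruby
# Hypothetical Ruby model of the loop after the fix: the result starts out
# as the error code and is only replaced when a closing "}" is found.

INVALID_CHAR_PROPERTY_NAME = -1     # placeholder negative error code
PROPERTY_CTYPES = { "Alpha" => 1 }  # toy property-name -> ctype table
REJECTED = ["(", ")", "{", "|"]     # assumed set of characters that abort the scan

# chars: the characters that follow "\p{" in the pattern.
def scan_property_after_fix(chars)
  result = INVALID_CHAR_PROPERTY_NAME # default to the error
  name = +""
  chars.each do |c|
    if c == "}"
      # Only a closing brace can change the result.
      result = PROPERTY_CTYPES.fetch(name, INVALID_CHAR_PROPERTY_NAME)
      break
    end
    break if REJECTED.include?(c)   # result is still the error code
    name << c
  end
  result
end

p scan_property_after_fix([])              # models /\p{/
p scan_property_after_fix("Alpha}".chars)  # models /\p{Alpha}/
```

With the default set to the error, a premature end of pattern and a rejected character both fall through to the same error result, while valid properties are unaffected.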
