Bug #18013: Unexpected results when mixing negated character classes and case-folding - Ruby - Ruby Issue Tracking System

Added by jirkamarsik (Jirka Marsik) about 5 years ago. Updated about 5 years ago.

Status:

Open

Assignee:

Target version:

ruby -v:

ruby 3.0.1p64 (2021-04-05 revision 0fb782ee38) [x86_64-linux]

Backport:

2.6: UNKNOWN, 2.7: UNKNOWN, 3.0: UNKNOWN

[ruby-core:104436]

Description

irb(main):001:0> /[^a-c]/i.match("A")
=> nil
irb(main):002:0> /[[^a-c]]/i.match("A")
=> #<MatchData "A">

The two regular expressions above match different strings because their character classes denote different sets of characters. For /[^a-c]/i to produce correct results, Oniguruma added a fix that is still visible in the code, guarded by an always-on preprocessor flag (CASE_FOLD_IS_APPLIED_INSIDE_NEGATIVE_CCLASS, https://github.com/ruby/ruby/blob/9eae8cdefba61e9e51feb30a4b98525593169666/regparse.c#L5528). The idea of the fix is to case-fold the character class first and only then apply the negation (essentially moving the case-fold operator inside the negation).

In the case of our first regular expression, [a-c] is case-folded into [a-cA-C], which is then inverted into [^a-cA-C], the expected result. However, this case-folding logic is currently applied only to the top-most character class, so if we use a nested negated character class, the order of the operations is switched.
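The fold-then-negate order described above can be modelled with plain sets. This is only a sketch: the toy universe of ASCII letters and the helper names case_fold and negate are assumptions made for illustration and are not taken from regparse.c.

```ruby
require "set"

# Toy universe: ASCII letters only (an assumption for illustration).
UNIVERSE = (("a".."z").to_a + ("A".."Z").to_a).to_set

# Close a set under simple ASCII case-folding by adding both cases of each member.
def case_fold(set)
  set.flat_map { |ch| [ch.downcase, ch.upcase] }.to_set
end

# Complement relative to the toy universe.
def negate(set)
  UNIVERSE - set
end

base = ("a".."c").to_set

# Models /[^a-c]/i: fold first, then negate.
fold_then_negate = negate(case_fold(base))

puts fold_then_negate.include?("A") # false
puts fold_then_negate.include?("d") # true
```

In this model "A" is excluded, which agrees with /[^a-c]/i.match("A") returning nil in the report.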

With our second regular expression, [a-c] is first negated to yield [^a-c], which is then case-folded into the set of all characters (equivalent to .), since [^a-c] contains A-C, which case-fold into a-c.
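The opposite order, negate then fold, can be sketched the same way. Again, the ASCII-letter universe and the helper name fold_closure are assumptions made for illustration, not the engine's actual data structures.

```ruby
require "set"

# Toy universe: ASCII letters only (an assumption for illustration).
LETTERS = (("a".."z").to_a + ("A".."Z").to_a).to_set

# Close a set under simple ASCII case-folding by adding both cases of each member.
def fold_closure(set)
  set.flat_map { |ch| [ch.downcase, ch.upcase] }.to_set
end

base = ("a".."c").to_set

# Models the nested class in /[[^a-c]]/i as described in the report:
# negate first, then fold.
negated = LETTERS - base
negate_then_fold = fold_closure(negated)

puts negated.include?("A")          # true: A-C survive the negation
puts negate_then_fold.include?("a") # true: folding A-C brings a-c back
puts negate_then_fold == LETTERS    # true: every letter now matches
```

Because A-C are still in the negated set, folding reintroduces a-c, and the class ends up covering the whole toy universe.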

A way to fix this would be to apply case-folding for nested character classes as well, so that the nested character classes behave the same as the top-most character class. Then, we would get the same semantics for both expressions.
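One way to check whether a given Ruby build treats the two forms the same is to probe both patterns with a handful of characters. This is a rough sketch: the helper name disagreements and the choice of probe characters are illustrative assumptions, and the output depends on the Ruby version in use.

```ruby
TOP_LEVEL = /[^a-c]/i
NESTED    = /[[^a-c]]/i

# Return the probe characters on which the two patterns give different answers.
def disagreements(probes)
  probes.reject { |ch| TOP_LEVEL.match?(ch) == NESTED.match?(ch) }
end

probes = ("a".."e").to_a + ("A".."E").to_a
diff = disagreements(probes)

if diff.empty?
  puts "nested and top-level classes agree on all probes"
else
  puts "patterns disagree on: #{diff.join(' ')}"
end
```

If the nested class were case-folded before negation, as proposed, the list of disagreements would be empty.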

#1 [ruby-core:104437] Updated by jirkamarsik (Jirka Marsik) about 5 years ago

#2 [ruby-core:104439] Updated by duerst (Martin Dürst) about 5 years ago

#3 [ruby-core:104440] Updated by jirkamarsik (Jirka Marsik) about 5 years ago
