Bug #18010

Character class with single character gets case-folded with following string

Added by jirkamarsik (Jirka Marsik) 3 months ago.

Status: Open
Priority: Normal
Assignee: -
Target version: -
ruby -v: ruby 3.0.1p64 (2021-04-05 revision 0fb782ee38) [x86_64-linux]
[ruby-core:104423]

Description

irb(main):001:0> /ff/i.match("\ufb00")
=> #<MatchData "ff">
irb(main):002:0> /[f]f/i.match("\ufb00")
=> #<MatchData "ff">
irb(main):003:0> /f[f]/i.match("\ufb00")
=> nil
irb(main):004:0> /[f][f]/i.match("\ufb00")
=> nil
irb(main):005:0> /(?:f)f/i.match("\ufb00")
=> nil
irb(main):006:0> /f(?:f)/i.match("\ufb00")
=> nil
irb(main):007:0> /(?:f)(?:f)/i.match("\ufb00")
=> nil

In the examples above, singleton character classes ([...]) and even non-capturing groups ((?:...)) break up string literals, so each resulting substring is matched separately and the sequence "ff" is no longer case-folded as a unit against U+FB00. The one exception is a singleton character class that precedes a string: as an optimization, it is joined with the string that follows it. This optimization changes the semantics of the Regexp, since /[f]f/i matches "\ufb00" while /f[f]/i and /[f][f]/i do not.
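
The irb session above can be replayed as a standalone script, which makes it easier to compare Ruby versions. This is only a sketch: the names PATTERNS, LIGATURE and ligature_matches are hypothetical helpers introduced here, not part of the report, and the printed results depend on the interpreter version (the results quoted above are from ruby 3.0.1p64).

```ruby
# Patterns from the report. Ignoring the case-folding issue, each one is
# just a different way of writing /ff/i.
PATTERNS = {
  "ff"         => /ff/i,
  "[f]f"       => /[f]f/i,
  "f[f]"       => /f[f]/i,
  "[f][f]"     => /[f][f]/i,
  "(?:f)f"     => /(?:f)f/i,
  "f(?:f)"     => /f(?:f)/i,
  "(?:f)(?:f)" => /(?:f)(?:f)/i,
}.freeze

LIGATURE = "\ufb00" # U+FB00 LATIN SMALL LIGATURE FF

# Returns a hash mapping each pattern source to true/false, depending on
# whether the pattern matches the given subject string.
def ligature_matches(subject = LIGATURE)
  PATTERNS.transform_values { |re| !re.match(subject).nil? }
end

ligature_matches.each do |source, matched|
  puts format("%-14s %s", "/#{source}/i", matched ? "match" : "no match")
end
```

If the seven lines do not all agree, the interpreter exhibits the inconsistency described in this report.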


Related issues

Related to Ruby master - Bug #17989: Case insensitive Regexps do not handle characters with overlapping case foldings (Open)
#1

Updated by jeremyevans0 (Jeremy Evans) 3 months ago

  • Related to Bug #17989: Case insensitive Regexps do not handle characters with overlapping case foldings added
