Bug #10097

closed

Case-insensitive Regexp matching for Windows-1252 not working for ŠšŽžŒœÿŸ

Added by duerst (Martin Dürst) almost 10 years ago. Updated over 8 years ago.

Status: Closed
Assignee: -
Target version: -
ruby -v: 1.9.3p545
[ruby-core:64049]

Description

By chance I had a look at enc/iso_8859_1.c and found

ENC_REPLICATE("Windows-1252", "ISO-8859-1")

on line 288. But this does not work for case folding:

# http://en.wikipedia.org/wiki/Windows-1252
s1 = "\u0160".encode 'windows-1252' # 'Š'
r1 = Regexp.new("\u0161".encode('windows-1252'), Regexp::IGNORECASE) # /š/i
s1 =~ r1
   # => nil
s2 = "\u0178".encode 'windows-1252' # 'Ÿ'
r2 = Regexp.new("\u00FF".encode('windows-1252'), Regexp::IGNORECASE) # /ÿ/i
s2 =~ r2
   # => nil
s3 = "\u00C0".encode 'windows-1252' # 'À'
r3 = Regexp.new("\u00E0".encode('windows-1252'), Regexp::IGNORECASE) # /à/i
s3 =~ r3
   # => 0

So case-insensitive matching works when both characters of a pair are in ISO-8859-1 (Àà), but not when one (ÿŸ, where Ÿ is not in ISO-8859-1) or both (ŠšŽžŒœ) are outside ISO-8859-1.
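To check all affected pairs at once rather than one by one, something like the following sketch could be used. The helper names `case_insensitive_match?` and `check_pairs` and the `PAIRS` table are made up for illustration and are not part of Ruby. No expected output is given, because the result depends on the Ruby version used.

```ruby
# Upper/lower pairs from the report, written as Unicode escapes.
# The first four pairs involve characters that Windows-1252 has but
# ISO-8859-1 lacks; the last pair is a control that exists in both.
PAIRS = {
  "\u0160" => "\u0161", # Š / š
  "\u017D" => "\u017E", # Ž / ž
  "\u0152" => "\u0153", # Œ / œ
  "\u0178" => "\u00FF", # Ÿ / ÿ (only ÿ is in ISO-8859-1)
  "\u00C0" => "\u00E0"  # À / à (control pair)
}.freeze

# Hypothetical helper: true if /lower/i matches upper once both are
# transcoded to the given encoding.
def case_insensitive_match?(upper, lower, encoding = 'windows-1252')
  subject = upper.encode(encoding)
  pattern = Regexp.new(lower.encode(encoding), Regexp::IGNORECASE)
  !(subject =~ pattern).nil?
end

# Hypothetical helper: maps each pair (as a two-character UTF-8 string)
# to whether the case-insensitive match succeeded.
def check_pairs(pairs = PAIRS)
  Hash[pairs.map { |upper, lower| [upper + lower, case_insensitive_match?(upper, lower)] }]
end

check_pairs.each do |pair, ok|
  puts "#{pair}: #{ok ? 'match' : 'NO MATCH'}"
end
```

On an affected version, the first four pairs should report NO MATCH, while the control pair Àà matches.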
