Bug #3181

Possible regexp regression in 1.9.1-p378

Added by gettalong (Thomas Leitner) over 9 years ago. Updated over 8 years ago.

Status:
Rejected
Priority:
Normal
Assignee:
-
ruby -v:
ruby 1.9.1p378 (2010-01-10 revision 26273) [i386-darwin10.2.0]
[ruby-core:29658]

Description

=begin
Hi,

there seems to be some sort of regression in 1.9.1-p378 regarding regular expressions and I18N support.

Example program:

# -*- coding: utf-8 -*-

puts "Should match"
md = /\w/u.match("üäß")
p md.to_a

Example output:

jruby 1.5.0.RC1 (ruby 1.8.7 patchlevel 249) (2010-04-14 0b08bc7) (Java HotSpot(TM) 64-Bit Server VM 1.6.0_15) [x86_64-java]
Should match at index 0
["\303\274"]
rubinius 1.0.0-rc4 (1.8.7 release 2010-03-31 JI) [x86_64-apple-darwin10.2.0]
Should match at index 0
["\303\274"]
ruby 1.8.6 (2010-02-05 patchlevel 399) [i686-darwin10.2.0]
Should match at index 0
["\303\274"]
ruby 1.8.7 (2010-01-10 patchlevel 249) [i686-darwin10.2.0]
Should match at index 0
["\303\274"]
ruby 1.9.1p243 (2009-07-16 revision 24175) [i386-darwin10.2.0]
Should match at index 0
["ü"]
ruby 1.9.1p378 (2010-01-10 revision 26273) [i386-darwin10.2.0]
Should match at index 0
[]
ruby 1.9.2dev (2009-07-18 trunk 24186) [i386-darwin10.2.0]
Should match at index 0
["ü"]

Best regards,
Thomas
=end


Related issues

Has duplicate Backport191 - Bug #3202: potential regression? \w in regex doesn't match umlauts anymore. (Rejected, 04/27/2010)

History

#1

Updated by naruse (Yui NARUSE) over 9 years ago

  • Category set to core
  • Status changed from Open to Rejected

=begin
This is intended: \s, \d, and \w match only ASCII characters as of 1.9.1p378 and in 1.9.2.
(ruby 1.9.2dev (2009-07-18 trunk 24186) is too old; current trunk is ASCII-only.)
A lot of code that uses \s and \d didn't work on 1.9, so the previous behavior was judged to be a bug.

Anyway, I'm interested in real usage of \w in a UTF-8 context. Can you show a real example?
=end

#2

Updated by rogerdpack (Roger Pack) about 9 years ago

=begin

> Anyway, I'm interested in real usage of \w in a UTF-8 context. Can you show a real example?

Here are some related questions/uses, I believe:

http://stackoverflow.com/questions/3576232/how-to-match-unicode-words-with-ruby-1-9
http://www.ruby-forum.com/topic/208777

Though it's too late to change it now :)
=end
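
For anyone landing here with the same problem: since \w is ASCII-only from 1.9.1p378 onward, Unicode word characters have to be matched another way. The following is a minimal sketch, assuming Ruby 1.9.2 or later and a UTF-8 source file; the variable names are only illustrative and are not part of the original report. It compares \w with the Unicode-aware \p{Word}, [[:word:]], and \p{L} classes on the string from the description.

```ruby
# -*- coding: utf-8 -*-

text = "üäß"

# \w only covers ASCII word characters, so nothing in "üäß" matches.
ascii_match = /\w/.match(text)

# Unicode-aware alternatives that do match non-ASCII letters:
unicode_match = /\p{Word}/.match(text)   # Unicode "word" property
bracket_match = /[[:word:]]/.match(text) # bracket expression, Unicode-aware in 1.9+
letter_match  = /\p{L}/.match(text)      # any Unicode letter (no digits or underscore)

p ascii_match          # nil
p unicode_match.to_a   # ["ü"]
p bracket_match.to_a   # ["ü"]
p letter_match.to_a    # ["ü"]
```

Which class fits depends on what should count as a "word" character: \p{L} is letters only, while \p{Word} and [[:word:]] also accept digits and the underscore.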
