Bug #3202

potential regression? \w in regex doesn't match umlauts anymore.

Added by antifuchs (Andreas Fuchs) over 9 years ago. Updated over 8 years ago.

Status: Rejected
Priority: Normal
Assignee: -
ruby -v: ruby 1.9.1p378 (2010-01-10 revision 26273) [i386-darwin10.2.0]
[ruby-core:29792]

Description

=begin
I'm trying to match umlauts using \w in regular expressions. In 1.9.1-p243, this works:

$ cat bar.rb
# encoding: utf-8
puts "ä".encoding
puts /\w/u.encoding
puts ("ä" =~ /\w/u).inspect
$ ruby bar.rb
UTF-8
UTF-8
0
$ ruby --version
ruby 1.9.1p243 (2009-07-16 revision 24175) [i386-darwin10.2.0]

With p378, it doesn't match the a with diaeresis anymore:

$ ruby bar.rb
UTF-8
UTF-8
nil
$ ruby --version
ruby 1.9.1p378 (2010-01-10 revision 26273) [i386-darwin10.2.0]

I'm seeing the same result in 1.9.2dev (2010-04-26 trunk 27503).

This is OS X 10.6, with the following locale settings:
$ locale
LANG="C"
LC_COLLATE="C"
LC_CTYPE="de_AT.UTF-8"
LC_MESSAGES="C"
LC_MONETARY="C"
LC_NUMERIC="C"
LC_TIME="C"
LC_ALL=

Changing LC_CTYPE, LANG, or LC_ALL has no effect on the p378 result.

This unexpected difference in behavior leads me to believe that something changed for the worse between these two releases.
=end


Related issues

Is duplicate of Backport191 - Bug #3181: Possible regexp regression in 1.9.1-p378 (Rejected, 04/20/2010)

History

#1

Updated by mame (Yusuke Endoh) over 9 years ago

  • Status changed from Open to Rejected

=begin
Hi,

This is an intended spec change.
See http://redmine.ruby-lang.org/issues/show/3181.

Thanks,

--
Yusuke Endoh mame@tsg.ne.jp
=end
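
For readers who hit the same result: the linked issue is not reproduced here, so the following is only a sketch. It assumes the spec change in question is that \w is restricted to ASCII word characters, which is how later Ruby documentation describes it ([a-zA-Z0-9_]). Under that assumption, the Unicode property \p{Word} or the POSIX bracket [[:alpha:]] can be used to match umlauts instead. The variable names are illustrative only.

```ruby
# encoding: utf-8
# Sketch under the assumption stated above; behaviour as documented
# for later Ruby releases, not verified against 1.9.1-p378 itself.
s = "ä"

ascii_word   = (s =~ /\w/)          # \w is documented as [a-zA-Z0-9_]
unicode_word = (s =~ /\p{Word}/)    # Unicode "word" character property
posix_alpha  = (s =~ /[[:alpha:]]/) # POSIX bracket, Unicode-aware alphabetic class

puts ascii_word.inspect
puts unicode_word.inspect
puts posix_alpha.inspect
```

On a current Ruby, the first line prints nil (matching the p378 behavior reported above), while the other two print 0.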
