Bug #7501
\w in a regular expression doesn't match international characters
Status:
Rejected
Assignee:
-
Target version:
-
ruby -v:
ruby 1.9.3p0 (2011-10-30 revision 33570) [i686-linux]
Backport:
Description
When using regexp matching, \w doesn't match characters which are not in the English alphabet.
For example, the characters "žščřďťňaáéíóůúý" should all be matched by \w but aren't.
This program demonstrates the bug:
```ruby
# encoding: utf-8
match = /\w+/.match("abcdefghijklmnopqrstuvwxyz")
puts match.to_s
match = /\w+/.match("áéíóůúýžščřďťň") # some Czech characters
puts match.to_s
match = /\w+/.match("üäö") # some German characters
puts match.to_s
```
Expected output:
```
abcdefghijklmnopqrstuvwxyz
áéíóůúýžščřďťň
üäö
```
Actual output:
```
abcdefghijklmnopqrstuvwxyz
```
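The claim in the description can also be checked one character at a time. This is a minimal sketch. It assumes that \w is equivalent to the ASCII class [a-zA-Z0-9_], as the Ruby Regexp documentation describes it. Under that assumption, only the plain "a" in the reporter's list should match.

```ruby
# encoding: utf-8
# Test each character from the report individually against \w.
chars = "žščřďťňaáéíóůúý".chars

matched   = chars.select { |c| c =~ /\w/ }  # characters \w accepts
unmatched = chars.reject { |c| c =~ /\w/ }  # characters \w rejects

puts "matched by \\w:     #{matched.join}"
puts "not matched by \\w: #{unmatched.join}"
```

The two empty lines in the actual output come from the same cause. When nothing matches, /\w+/.match returns nil, and nil.to_s is an empty string.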
Updated by Anonymous over 11 years ago
/[[:alpha:]]+/ should behave as you expect
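The suggestion can be tried on the reporter's own inputs. This minimal sketch runs each string through both \w+ and [[:alpha:]]+. On UTF-8 strings, [[:alpha:]] follows Unicode's notion of a letter rather than ASCII, so the Czech and German strings should be matched in full.

```ruby
# encoding: utf-8
# The three inputs from the bug report.
inputs = ["abcdefghijklmnopqrstuvwxyz", "áéíóůúýžščřďťň", "üäö"]

# For each input, record what \w+ and [[:alpha:]]+ match.
# MatchData#to_s gives the matched text; nil.to_s gives "" when nothing matched.
results = inputs.map do |s|
  [/\w+/.match(s).to_s, /[[:alpha:]]+/.match(s).to_s]
end

results.each do |word, alpha|
  word = "(no match)" if word.empty?
  puts "\\w+ => #{word}    [[:alpha:]]+ => #{alpha}"
end
```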
Updated by shyouhei (Shyouhei Urabe) over 11 years ago
- Status changed from Open to Rejected
If I remember correctly, this is an intentional design. As the Unicode standard evolves, the definition of which characters are word characters changes from time to time, and it is hard for us to keep up with that.
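One caveat applies to the [[:alpha:]] workaround. It covers letters only, so it is not a drop-in replacement for \w, which also accepts digits and the underscore. The Ruby Regexp documentation describes \p{Word} (and the equivalent [[:word:]]) as matching characters in the Unicode categories Letter, Mark, Number and Connector_Punctuation.

This minimal sketch compares the three patterns on a made-up token. The token "kůň_2" is hypothetical and chosen only for illustration. Exactly which characters \p{Word} covers depends on the Unicode version bundled with the Ruby in use, which is the moving target mentioned above.

```ruby
# encoding: utf-8
# Hypothetical identifier-like token: Czech letters, an underscore and a digit.
token = "kůň_2"

ascii_word   = token[/\w+/]          # ASCII-only \w stops at the first non-ASCII letter
alpha_only   = token[/[[:alpha:]]+/] # letters only, so it stops at "_"
unicode_word = token[/\p{Word}+/]    # Letter, Mark, Number, Connector_Punctuation

puts ascii_word    # prints k
puts alpha_only    # prints kůň
puts unicode_word  # prints kůň_2
```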