\w in a regular expression doesn't match international characters
When using regexp matching, \w doesn't match characters which are not in the English alphabet.
For example, the characters "žščřďťňaáéíóůúý" should all be matched by \w but aren't.
This program demonstrates the bug:
match = /\w+/.match( "abcdefghijklmnopqrstuvwxyz" ) # matches the whole string
match = /\w+/.match( "áéíóůúýžščřďťň" ) # some Czech characters; returns nil
match = /\w+/.match( "üäö" ) # some German characters; returns nil
#2 [ruby-core:50537] Updated by Shyouhei Urabe almost 4 years ago
- Status changed from Open to Rejected
If I remember correctly, this is by design. As the Unicode standard evolves, the definition of what is and is not a word character changes from time to time, and it is hard for us to keep up with that.
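
For readers who hit the same behaviour, here is a minimal sketch of how the difference can be observed and worked around. It assumes Ruby 1.9 or later and UTF-8 source and strings. The helper name first_match is hypothetical and only for illustration. The Unicode property \p{Word} and the POSIX bracket [[:alpha:]] are Unicode-aware in Ruby's regexp engine, while \w stays limited to [a-zA-Z0-9_].

```ruby
# Hypothetical helper: returns the first substring of text matched by
# pattern, or nil when nothing matches.
def first_match(pattern, text)
  m = pattern.match(text)
  m && m[0]
end

czech  = "áéíóůúýžščřďťň"
german = "üäö"

p first_match(/\w+/, czech)            # => nil, \w only covers [a-zA-Z0-9_]
p first_match(/\p{Word}+/, czech)      # the whole Czech string matches
p first_match(/[[:alpha:]]+/, german)  # the whole German string matches
```

Which class fits depends on what should count as a word character: \p{Word} also accepts digits and underscore-like connector punctuation, while [[:alpha:]] accepts alphabetic characters only.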