Bug #2822
Russian characters are missing from word characters types in Regexp
| Status: | Closed | Start date: | 02/27/2010 |
|---|---|---|---|
| Priority: | Normal | Due date: | |
| Assignee: | - | % Done: | 0% |
| Category: | core | | |
| Target version: | 1.9.2 | | |
| ruby -v: | ruby 1.9.2dev (2010-02-27 trunk 26772) [i686-linux] | | |
Description
"Hello".match(/[\w]*/) => #<MatchData "Hello">
"Привет".match(/[\w]*/) => #<MatchData "">
"Привет".match(/[А-Яа-яЁё\w]*/) => #<MatchData "Привет">
The non-word character type \W behaves similarly.
History
Updated by Eregon (Benoit Daloze) about 2 years ago
$ ri Regexp
/\w/ - A word character ([a-zA-Z0-9_])
/[[:word:]]/ - A character in one of the following Unicode
general categories Letter, Mark, Number,
Connector_Punctuation
/\p{Word}/ - A member of one of the following Unicode general
category Letter, Mark, Number, Connector_Punctuation
> "aér".match /\w+/
=> #<MatchData "a">
> "aér".match /[[:word:]]+/
=> #<MatchData "aér">
> "aér".match /\p{Word}+/
=> #<MatchData "aér">
The documentation of Regexp is awesome in Ruby 1.9, have a look ;)
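For the Cyrillic string from the report, the same three character classes can be compared side by side. This is only a minimal sketch: `leading_run` is a hypothetical helper, not part of Ruby or of the original report, and the source file is assumed to be UTF-8 encoded.

```ruby
# encoding: utf-8

# Hypothetical helper (not from the report): returns the run of characters
# at the start of +text+ that match the single-character pattern +char_class+.
# Returns "" when the first character does not match.
def leading_run(text, char_class)
  text[/\A#{char_class}*/]
end

p leading_run("Hello",  /\w/)          # => "Hello"
p leading_run("Привет", /\w/)          # => ""  (\w is [a-zA-Z0-9_] only)
p leading_run("Привет", /[[:word:]]/)  # => "Привет"
p leading_run("Привет", /\p{Word}/)    # => "Привет"
p leading_run("Привет", /\W/)          # => "Привет"  (every letter counts as non-word)
```

So when non-ASCII letters should count as word characters, `[[:word:]]` or `\p{Word}` is the class to reach for instead of `\w`.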
Updated by naruse (Yui NARUSE) about 2 years ago
- Status changed from Open to Closed