Bug #2822

Russian characters are missing from the word character types in Regexp

Added by stas (Stas Senotrusov) about 2 years ago. Updated about 1 year ago.

[ruby-core:28354]
Status: Closed
Start date: 02/27/2010
Priority: Normal
Due date:
Assignee: -
% Done: 0%
Category: core
Target version: 1.9.2
ruby -v: ruby 1.9.2dev (2010-02-27 trunk 26772) [i686-linux]

Description

"Hello".match(/[\w]*/)
=> #<MatchData "Hello">

"Привет".match(/[\w]*/)
=> #<MatchData "">

"Привет".match(/[А-Яа-яЁё\w]*/)
=> #<MatchData "Привет">

The non-word character type \W behaves similarly.
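A minimal sketch reproducing the report on a current Ruby (the `# encoding: utf-8` magic comment makes it work on 1.9 too). The string `"Привет"` and the hand-written range come from the description above; the `"ї"` and `\p{Cyrillic}` lines are an added illustration of why an explicit Russian range is a fragile workaround, not part of the original report.

```ruby
# encoding: utf-8
s = "Привет"

# \w is ASCII-only ([a-zA-Z0-9_]), so the greedy match at position 0 is empty.
ascii_word   = s[/\w*/]           # => ""
# The Unicode-aware word classes match the whole Cyrillic word.
prop_word    = s[/\p{Word}*/]     # => "Привет"
posix_word   = s[/[[:word:]]*/]   # => "Привет"
# \W is the complement of ASCII \w, so every Cyrillic letter is a "non-word" char.
non_word_idx = s =~ /\W/          # => 0

# The range from the report covers the Russian alphabet only:
# Ukrainian "ї" (U+0457) lies outside А-Я, а-я (U+0410..U+044F) and Ёё.
russian_only = "ї"[/[А-Яа-яЁё\w]/]  # => nil
script_match = "ї"[/\p{Cyrillic}/]  # => "ї"
```

Because `\w` and `\W` are defined over ASCII only, every Cyrillic letter falls into `\W`, which matches the `ri Regexp` definition of `\w` as `[a-zA-Z0-9_]`.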

History

Updated by Eregon (Benoit Daloze) about 2 years ago

$ ri Regexp
/\w/ - A word character ([a-zA-Z0-9_])

/[[:word:]]/ - A character in one of the following Unicode
         general categories Letter, Mark, Number,
         Connector_Punctuation

/\p{Word}/ - A member of one of the following Unicode general
         categories Letter, Mark, Number, Connector_Punctuation
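To see how that category list maps onto individual characters, here is a small sketch. `classify` and `CATEGORIES` are hypothetical names introduced for illustration; `\p{L}`, `\p{M}`, `\p{N}` and `\p{Pc}` are the short property names for the general categories Letter, Mark, Number and Connector_Punctuation. The sample characters are assumptions chosen to cover each case.

```ruby
# encoding: utf-8
# Hypothetical lookup: the four general categories named in `ri Regexp`.
CATEGORIES = {
  "Letter"                => /\p{L}/,
  "Mark"                  => /\p{M}/,
  "Number"                => /\p{N}/,
  "Connector_Punctuation" => /\p{Pc}/
}

# Hypothetical helper: for a single character, list which of those
# categories it belongs to and whether \w and \p{Word} match it.
def classify(ch)
  {
    categories: CATEGORIES.select { |_, re| re =~ ch }.keys,
    ascii_w:    !!(ch =~ /\A\w\z/),
    p_word:     !!(ch =~ /\A\p{Word}\z/)
  }
end
```

For example, `classify("П")` reports `["Letter"]` with `ascii_w` false and `p_word` true, and the Arabic-Indic digit `"٣"` is a Number that `\p{Word}` accepts but `\w` rejects, while `"-"` (Dash_Punctuation) is in none of the four categories.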


> "aér".match /\w+/
=> #<MatchData "a">
> "aér".match /[[:word:]]+/
=> #<MatchData "aér">
> "aér".match /\p{Word}+/
=> #<MatchData "aér">
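In practice the difference shows up most when splitting text into words. A minimal sketch, assuming a UTF-8 string; the sample sentence is made up for illustration:

```ruby
# encoding: utf-8
text = "Привет, мир! Hello_world 42"

# ASCII-only \w silently drops the Cyrillic words.
ascii_tokens   = text.scan(/\w+/)          # => ["Hello_world", "42"]
# \p{Word} and [[:word:]] keep them, along with digits and "_".
unicode_tokens = text.scan(/\p{Word}+/)    # => ["Привет", "мир", "Hello_world", "42"]
posix_tokens   = text.scan(/[[:word:]]+/)  # same result as \p{Word}
```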

The Regexp documentation in Ruby 1.9 is awesome; have a look ;)

Updated by naruse (Yui NARUSE) about 2 years ago

  • Status changed from Open to Closed
