Bug #3838
Closed
regexp for Unicode property under Windows
Description
=begin
• Ruby 1.9.2-p0 RubyInstaller (md5: 21bf42f7ec4b8a831c947d656509cddb), stable version
Regexps that use a Unicode property, such as /\p{Lu}/ or /\p{Han}/, raise an error:
irb(main):002:0> /\p{Han}/
SyntaxError: (irb):2: invalid character property name {Han}: /\p{Han}/
from C:/Ruby192/bin/irb:12:in `<main>'
irb(main):003:0> /\p{Lu}/
SyntaxError: (irb):3: invalid character property name {Lu}: /\p{Lu}/
from C:/Ruby192/bin/irb:12:in `<main>'
irb(main):004:0>
whereas a POSIX-style property works fine: /\p{Alpha}/
irb(main):001:0> /\p{Alpha}/
=> /\p{Alpha}/
=end
Updated by naruse (Yui NARUSE) over 14 years ago
- Category set to M17N
- Status changed from Open to Assigned
- Assignee set to naruse (Yui NARUSE)
Updated by naruse (Yui NARUSE) over 14 years ago
=begin
\p{Lu} and \p{Han} are Unicode properties, which are meant for Unicode regexps.
When the locale is not UTF-8, a regexp literal entered in irb takes the locale's encoding.
That is, the regexp literal's encoding is not UTF-8 (for example, it is Windows-1252 on an English version of Windows).
You can avoid this problem by explicitly specifying UTF-8 as the encoding with the /u modifier, like this:
% echo $LANG
C
% ~/local/ruby/bin/irb
irb(main):001:0> /\p{Lu}/
SyntaxError: (irb):1: invalid character property name {Lu}: /\p{Lu}/
from /home/naruse/local/ruby/bin/irb:12:in `<main>'
irb(main):002:0> /\p{Lu}/u
=> /\p{Lu}/
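
The same behaviour can be sketched in a script, without changing the locale, by compiling the pattern from a string whose encoding is set explicitly. This is only an illustration: `compile_in` is a hypothetical helper, not part of Ruby, and `Regexp.new` reports the failure as a RegexpError rather than the SyntaxError that a literal produces in irb.

```ruby
# Hypothetical helper (not part of Ruby): compile +pattern+ as if it had been
# typed in a source whose encoding is +encoding_name+. Returns the Regexp, or
# the RegexpError itself instead of raising it.
def compile_in(pattern, encoding_name)
  Regexp.new(pattern.dup.force_encoding(encoding_name))
rescue RegexpError => e
  e
end

unicode = compile_in('\p{Lu}', 'UTF-8')            # Unicode property, Unicode encoding
legacy  = compile_in('\p{Lu}', 'Windows-1252')     # Unicode property, legacy encoding
posix   = compile_in('\p{Alpha}', 'Windows-1252')  # POSIX-style name, legacy encoding
forced  = /\p{Lu}/u                                # /u fixes the literal's encoding to UTF-8

p unicode.encoding          # encoding the pattern was compiled under
p legacy.class              # a RegexpError is expected here
p posix.encoding
p forced.encoding
p forced.match?("\u00C9")   # U+00C9 is an uppercase letter
```

Inspecting `Regexp#encoding` in this way is also a quick check of which encoding irb assigned to a literal before deciding whether /u is needed.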
=end
Updated by naruse (Yui NARUSE) over 14 years ago
- Status changed from Assigned to Rejected