Bug #3838

regexp for unicode property under windows

Added by ntys (ding ding) over 9 years ago. Updated about 9 years ago.

Status: Rejected
Priority: Normal
Target version:
ruby -v: ruby 1.9.2p0 (2010-08-18) [i386-mingw32]
Backport:
[ruby-core:32419]

Description

=begin
• Ruby 1.9.2-p0 RubyInstaller (md5: 21bf42f7ec4b8a831c947d656509cddb), stable version

Regexps that use Unicode properties, such as /\p{Han}/ and /\p{Lu}/, raise an error:

irb(main):002:0> /\p{Han}/
SyntaxError: (irb):2: invalid character property name {Han}: /\p{Han}/
from C:/Ruby192/bin/irb:12:in `<main>'
irb(main):003:0> /\p{Lu}/
SyntaxError: (irb):3: invalid character property name {Lu}: /\p{Lu}/
from C:/Ruby192/bin/irb:12:in `<main>'
irb(main):004:0>

while a POSIX-style property works: /\p{Alpha}/

irb(main):001:0> /\p{Alpha}/
=> /\p{Alpha}/
=end

#1

Updated by naruse (Yui NARUSE) over 9 years ago

  • Category set to M17N
  • Status changed from Open to Assigned
  • Assignee set to naruse (Yui NARUSE)


#2

Updated by naruse (Yui NARUSE) over 9 years ago

=begin
\p{Lu} and \p{Han} are Unicode properties, which are only available in Unicode regexps.

When the locale is not UTF-8, irb gives a regexp literal the locale's encoding.
The regexp literal's encoding is therefore not UTF-8 (Windows-1252, for example, on the English version of Windows), so the Unicode property names are not recognized.

You can avoid this problem by explicitly specifying UTF-8 as the encoding with the /u modifier:

% echo $LANG
C
% ~/local/ruby/bin/irb
irb(main):001:0> /\p{Lu}/
SyntaxError: (irb):1: invalid character property name {Lu}: /\p{Lu}/
from /home/naruse/local/ruby/bin/irb:12:in `<main>'
irb(main):002:0> /\p{Lu}/u
=> /\p{Lu}/
=end
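
The behaviour described above can probably be reproduced outside irb as well. The following is a minimal sketch, not taken from the original report: it assumes that a pattern string tagged as Windows-1252 is a fair stand-in for a literal typed under a non-UTF-8 locale, and the helper name regexp_compile_error is hypothetical.

```ruby
# Hypothetical helper: returns the RegexpError raised while compiling
# `source`, or nil if the pattern compiles.
def regexp_compile_error(source)
  Regexp.new(source)
  nil
rescue RegexpError => e
  e
end

# Assumption: tagging the pattern source as Windows-1252 imitates a regexp
# literal read by irb under a Windows-1252 locale.
legacy_source = '\p{Lu}'.dup.force_encoding('Windows-1252')
utf8_source   = '\p{Lu}'.dup.force_encoding('UTF-8')

# Unicode property names are expected to be rejected for the legacy encoding
# and accepted for UTF-8.
p regexp_compile_error(legacy_source)
p regexp_compile_error(utf8_source)

# The /u modifier fixes the literal's encoding to UTF-8.
unicode_upper = /\p{Lu}/u
p unicode_upper.encoding
p unicode_upper =~ 'aBc'
```

POSIX-style names such as \p{Alpha} do not depend on Unicode tables, which would explain why /\p{Alpha}/ compiled in the original report while /\p{Lu}/ did not.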

#3

Updated by naruse (Yui NARUSE) over 9 years ago

  • Status changed from Assigned to Rejected

