Bug #11859

Updated by matsui (Kimihito Matsui) almost 9 years ago

U+FF21 (Ａ, FULLWIDTH LATIN CAPITAL LETTER A) and U+00C0 (À, LATIN CAPITAL LETTER A WITH GRAVE) are both `Uppercase_Letter`, so `\p{Upper}` should match them and return 0 in the following cases, but it returns 1.

~~~
ruby -e 'puts "\uFF21A".encode("EUC-JP") =~ Regexp.compile("\\\p{Upper}".encode("EUC-JP"))' # => 1
ruby -e 'puts "\u00C0A".encode("EUC-JP") =~ Regexp.compile("\\\p{Upper}".encode("EUC-JP"))' # => 1
~~~

The same happens with lowercase matching.

~~~
ruby -e 'puts "\uFF41a".encode("EUC-JP") =~ Regexp.compile("\\\p{Lower}".encode("EUC-JP"))' # => 1
~~~

In a Unicode encoding, the match is found at index 0 as expected.

~~~
ruby -e 'puts "\uFF21A" =~ Regexp.compile("\\\p{Upper}")'    # => 0
~~~
It looks like `\p{Upper}` and `\p{Lower}` in EUC-JP regexes are limited to ASCII characters.
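
For convenience, the one-liners above can be gathered into a single script that prints the UTF-8 and EUC-JP results side by side. This is only a sketch: the helper name `first_match_index` and the constants `CASES` and `RESULTS` are made up for illustration and are not part of Ruby. On an affected build, the EUC-JP column would presumably show 1 where the UTF-8 column shows 0.

```ruby
# Hypothetical helper (not part of Ruby): index of the first \p{property}
# match in text after transcoding both text and pattern to encoding.
def first_match_index(text, property, encoding)
  subject = text.encode(encoding)
  pattern = Regexp.compile("\\p{#{property}}".encode(encoding))
  subject =~ pattern
end

# Each case is a non-ASCII letter followed by its ASCII counterpart,
# taken from the report above.
CASES = [
  ["\uFF21A", "Upper"], # FULLWIDTH LATIN CAPITAL LETTER A
  ["\u00C0A", "Upper"], # LATIN CAPITAL LETTER A WITH GRAVE
  ["\uFF41a", "Lower"], # FULLWIDTH LATIN SMALL LETTER A
].freeze

RESULTS = CASES.map do |text, property|
  [
    property,
    text.codepoints.first,
    first_match_index(text, property, "UTF-8"),
    first_match_index(text, property, "EUC-JP"),
  ]
end

RESULTS.each do |property, codepoint, utf8, eucjp|
  puts format("\\p{%s} U+%04X  UTF-8: %p  EUC-JP: %p", property, codepoint, utf8, eucjp)
end
```

An index of 0 means the non-ASCII letter itself matched; 1 means the match fell through to the trailing ASCII letter.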
