Bug #11859
Regexp matching with \p{Upper} and \p{Lower} for EUC-JP doesn’t work.
Status:
Rejected
Assignee:
-
Target version:
-
ruby -v:
ruby 2.2.2p95 (2015-04-13 revision 50295) [x86_64-darwin14]
Description
U+FF21 (Ａ, FULLWIDTH LATIN CAPITAL LETTER A) and U+00C0 (À, LATIN CAPITAL LETTER A WITH GRAVE) are both Uppercase_Letter,
so the match should succeed at index 0 in the following cases, but it returns 1.
ruby -e 'puts "\uFF21A".encode("EUC-JP") =~ Regexp.compile("\\\p{Upper}".encode("EUC-JP"))' # => 1
ruby -e 'puts "\u00C0A".encode("EUC-JP") =~ Regexp.compile("\\\p{Upper}".encode("EUC-JP"))' # => 1
The same happens with lowercase matching.
ruby -e 'puts "\uFF41a".encode("EUC-JP") =~ Regexp.compile("\\\p{Lower}".encode("EUC-JP"))' # => 1
In Unicode encoding it works as follows.
ruby -e 'puts "\uFF21A" =~ Regexp.compile("\\\p{Upper}")' # => 0
It looks like \p{Upper} and \p{Lower} in EUC-JP regexes are limited to ASCII characters.
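For anyone who wants to compare the two encodings side by side, here is a minimal sketch as a script rather than one-liners. `first_match_index` is a hypothetical helper written for this illustration, not part of Ruby. The EUC-JP result noted in the comment is only what was observed on ruby 2.2.2 above and may differ on other versions.

```ruby
# Hypothetical helper for illustration only: converts both the subject
# string and the pattern source to `encoding`, then returns the index of
# the first match (or nil if nothing matches).
def first_match_index(str, pattern, encoding)
  regexp = Regexp.compile(pattern.encode(encoding))
  str.encode(encoding) =~ regexp
end

# FULLWIDTH LATIN CAPITAL LETTER A (U+FF21) followed by ASCII "A"
text = "\uFF21A"

euc_index  = first_match_index(text, '\p{Upper}', "EUC-JP") # observed as 1 on ruby 2.2.2
utf8_index = first_match_index(text, '\p{Upper}', "UTF-8")

puts "EUC-JP: #{euc_index.inspect}"
puts "UTF-8:  #{utf8_index.inspect}"
```

If the EUC-JP index comes out larger than the UTF-8 one, the fullwidth letter was skipped and only the ASCII "A" matched, which is the behavior described in this report.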