Bug #12577
closed
Is '$' punctuation or not? Inconsistency between us-ascii and UTF-8
Description
US-ASCII thinks '$' is punctuation. UTF-8 thinks it's not.
This means that the following two scripts:
# encoding: us-ascii
puts '$' =~ /\p{Punct}/ ? 'match' : 'no match'
and
# encoding: utf-8
puts '$' =~ /\p{Punct}/ ? 'match' : 'no match'
produce different results: the first prints 'match' and the second prints 'no match'. It also means that the output from the single line script
puts '$' =~ /\p{Punct}/ ? 'match' : 'no match'
changed when we changed the default script encoding from US-ASCII to UTF-8.
This may be okay as it is, but I'm reporting it here to check what others think.
Updated by nobu (Nobuyoshi Nakada) over 8 years ago
- Description updated (diff)
Updated by naruse (Yui NARUSE) over 8 years ago
- Status changed from Open to Rejected
It's because their specs differ, as follows:
POSIX
punct
Define characters to be classified as punctuation characters.
In the POSIX locale, neither the <space> nor any characters in classes alpha, digit, or cntrl shall be included. In a locale definition file, no character specified for the keywords upper, lower, alpha, digit, cntrl, xdigit, or as the <space> shall be specified.
http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap07.html#tag_07
Unicode
\p{gc=Punctuation} \p{gc=Symbol} -- \p{alpha}
http://unicode.org/reports/tr18/#punct
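
To make the Unicode side concrete, here is a minimal sketch (not taken from the Ruby source) that checks '$' against the general categories directly. '$' is a currency symbol (general category Sc), so it falls under \p{gc=Symbol} rather than \p{gc=Punctuation}; the formula quoted above only includes it because it adds Symbol. The constant names and the classify helper are hypothetical, introduced only for illustration.

```ruby
# The /u flag pins each regexp to UTF-8, so Unicode property rules apply
# regardless of the script encoding.
GC_PUNCTUATION = /\p{P}/u # general category Punctuation
GC_SYMBOL      = /\p{S}/u # general category Symbol

# Hypothetical name: the formula quoted above, written as a Ruby character
# class: (Punctuation or Symbol) minus alphabetic characters.
POSIX_COMPAT_PUNCT = /[\p{P}\p{S}&&[^\p{Alpha}]]/u

# Illustrative helper: report which of the three classes a character is in.
def classify(char)
  {
    gc_punctuation: GC_PUNCTUATION.match?(char),
    gc_symbol: GC_SYMBOL.match?(char),
    posix_compat_punct: POSIX_COMPAT_PUNCT.match?(char)
  }
end

puts classify('$').inspect
puts classify('!').inspect
```

Under these definitions '$' is a Symbol but not Punctuation, while '!' is Punctuation; both belong to the combined class. Whether \p{Punct} itself follows the narrow or the combined definition for UTF-8 strings depends on the Ruby version, so the sketch deliberately avoids it.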