Bug #12577

Is '$' punctuation or not? Inconsistency between us-ascii and UTF-8

Added by duerst (Martin Dürst) 9 months ago. Updated 8 months ago.

Status: Rejected
Priority: Normal
Assignee: -
Target version: -
ruby -v: ruby 2.4.0dev (2016-07-09 trunk 55618) [x86_64-cygwin]
[ruby-core:76328]

Description

US-ASCII thinks '$' is punctuation. UTF-8 thinks it's not.

This means that the following two scripts:

# encoding: us-ascii
puts '$' =~ /\p{Punct}/ ? 'match' : 'no match'

and

# encoding: utf-8
puts '$' =~ /\p{Punct}/ ? 'match' : 'no match'

produce different results. It also means that the output from the single line script

puts '$' =~ /\p{Punct}/ ? 'match' : 'no match'

changed when we changed the default script encoding from US-ASCII to UTF-8.
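The comparison can also be run from a single file. The following is a minimal sketch, assuming that a pattern containing `\p{...}` takes the encoding of the string it is built from, just as a literal takes the script encoding. The helper names `punct_regexp` and `punct_result` are hypothetical. The UTF-8 result described in this report is from 2.4.0dev and may differ on other Ruby versions, so no expected output is shown.

```ruby
# Build the same pattern under a given encoding. The pattern's encoding
# (not the subject string's) selects which definition of "punct" applies.
def punct_regexp(encoding)
  Regexp.new('\p{Punct}'.dup.force_encoding(encoding))
end

# Hypothetical helper: report whether `char` matches \p{Punct}
# when the pattern is compiled under `encoding`.
def punct_result(char, encoding)
  char =~ punct_regexp(encoding) ? 'match' : 'no match'
end

%w[US-ASCII UTF-8].each do |encoding|
  puts "#{encoding}: #{punct_result('$', encoding)}"
end
```

Building the pattern with `Regexp.new` stands in for the `# encoding:` magic comment, which can only appear once per file.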

This may be okay as it is, but I'm reporting it here to check what others think.

History

#1 [ruby-core:76331] Updated by nobu (Nobuyoshi Nakada) 9 months ago

  • Description updated (diff)

#2 [ruby-core:76432] Updated by naruse (Yui NARUSE) 8 months ago

  • Status changed from Open to Rejected

The difference follows from the respective specifications:

POSIX

punct
Define characters to be classified as punctuation characters.
In the POSIX locale, neither the <space> nor any characters in classes alpha, digit, or cntrl shall be included.

In a locale definition file, no character specified for the keywords upper, lower, alpha, digit, cntrl, xdigit, or as the <space> shall be specified.
http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap07.html#tag_07
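
As an illustration of the POSIX rule, the printable ASCII characters that a US-ASCII pattern classifies as punctuation can be listed. This is a sketch under the assumption that Ruby's US-ASCII character classification follows the POSIX-locale definition quoted above. The variable names are illustrative only.

```ruby
# Compile \p{Punct} as a US-ASCII pattern.
ascii_punct = Regexp.new('\p{Punct}'.dup.force_encoding('US-ASCII'))

# Printable ASCII range, space through tilde, as one-character strings.
printable = (0x20..0x7e).map(&:chr)

# Keep only the characters the US-ASCII pattern treats as punctuation.
punct_chars = printable.select { |c| ascii_punct.match?(c) }

puts punct_chars.join
```

Under this definition the space, letters, and digits are excluded, while symbols such as '$' are included.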

Unicode

\p{gc=Punctuation} \p{gc=Symbol} -- \p{alpha}
http://unicode.org/reports/tr18/#punct
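
The expression quoted from UTS #18 can be approximated with a Ruby character class. This is a sketch that reads the expression as "general category Punctuation or Symbol, minus alphabetic characters". The variable name `posix_compat_punct` is hypothetical, and the pattern is not what `\p{Punct}` itself expands to.

```ruby
# (gc=Punctuation or gc=Symbol) intersected with "not alphabetic".
posix_compat_punct = /[\p{P}\p{S}&&[^\p{Alpha}]]/

['$', '!', '+', 'a'].each do |char|
  result = posix_compat_punct.match?(char) ? 'match' : 'no match'
  puts "#{char.inspect}: #{result}"
end
```

In Unicode, '$' has general category Sc (currency symbol), so it is covered by `\p{S}` but not by `\p{P}`. That is why a definition based only on gc=Punctuation does not match it.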
