Bug #19417
closed
Regexp \p{Word} and [[:word:]] do not match Unicode Other_Number character
Description
According to the documentation for Regexp, \p{Word} and [[:word:]] both match a character in one of the following Unicode general categories: Letter, Mark, Number, Connector_Punctuation. However, neither matches U+00B2, which is in the Other_Number category (a subcategory of Number).
# U+00B2 (SUPERSCRIPT TWO) has general category Other_Number.
# The backslash is doubled so the label prints as \p{Word} rather than p{Word}.
puts "Ruby version: %s" % RUBY_VERSION
puts "\\p{Word} matches? %s" % /\p{Word}/u.match?("\u00B2")
puts "[[:word:]] matches? %s" % /[[:word:]]/u.match?("\u00B2")
puts "Is a Number character? %s" % /\p{Number}/u.match?("\u00B2")
puts "Is an Other_Number character? %s" % /\p{Other_Number}/u.match?("\u00B2")
Expected output:
Ruby version: 3.2.0
\p{Word} matches? true
[[:word:]] matches? true
Is a Number character? true
Is an Other_Number character? true
Actual output:
Ruby version: 3.2.0
\p{Word} matches? false
[[:word:]] matches? false
Is a Number character? true
Is an Other_Number character? true
I notice that the upstream Onigmo library doc defines the [[:word:]] class as "Letter | Mark | Decimal_Number | Connector_Punctuation", meaning that it only matches certain number characters (which would exclude U+00B2). I am not sure how \p{Word} is defined, though. Perhaps the documentation needs to be changed?
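To make the gap concrete, here is a minimal sketch comparing \p{Word} with a character class assembled from the four categories the Regexp documentation lists. DOCUMENTED_WORD is a hypothetical name introduced only for this illustration; it is not part of Ruby or Onigmo, and it is not a claim about how \p{Word} is implemented.

```ruby
# Hypothetical constant (not part of Ruby): the set of characters that the
# Regexp documentation says \p{Word} should match.
DOCUMENTED_WORD = /[\p{Letter}\p{Mark}\p{Number}\p{Connector_Punctuation}]/u

# Sample characters, each labelled with its Unicode name and general category.
samples = {
  "5"      => "DIGIT FIVE (Decimal_Number)",
  "\u00B2" => "SUPERSCRIPT TWO (Other_Number)",
  "_"      => "LOW LINE (Connector_Punctuation)",
}

# Print, for each sample, whether \p{Word} and the documented set match it.
samples.each do |char, label|
  word_match = /\p{Word}/u.match?(char)
  documented_match = DOCUMENTED_WORD.match?(char)
  puts "%-35s \\p{Word}: %-5s documented set: %s" % [label, word_match, documented_match]
end
```

If the documented definition were accurate, the two columns would agree for every sample. On the Ruby version reported above they should differ for U+00B2 only. A class like DOCUMENTED_WORD could also serve as a workaround for code that needs the broader match.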