Bug #19417

closed

Regexp \p{Word} and [[:word:]] do not match Unicode Other_Number character

Added by ObjectBoxPC (Philip Chung) about 1 year ago. Updated 5 months ago.

Status: Closed
Assignee: -
Target version: -
[ruby-core:112223]

Description

According to the documentation for Regexp, \p{Word} and [[:word:]] both match a character in one of the following Unicode general categories: Letter, Mark, Number, Connector_Punctuation. However, neither matches U+00B2 (SUPERSCRIPT TWO), which is in the Other_Number category, a subcategory of Number.

puts "Ruby version: %s" % RUBY_VERSION
# "\\p" keeps the backslash in the printed label; a bare "\p" in a double-quoted string prints as "p".
puts "\\p{Word} matches? %s" % /\p{Word}/u.match?("\u00B2")
puts "[[:word:]] matches? %s" % /[[:word:]]/u.match?("\u00B2")
puts "Is a Number character? %s" % /\p{Number}/u.match?("\u00B2")
puts "Is an Other_Number character? %s" % /\p{Other_Number}/u.match?("\u00B2")

Expected output:

Ruby version: 3.2.0
\p{Word} matches? true
[[:word:]] matches? true
Is a Number character? true
Is an Other_Number character? true

Actual output:

Ruby version: 3.2.0
\p{Word} matches? false
[[:word:]] matches? false
Is a Number character? true
Is an Other_Number character? true

I notice that the upstream Onigmo documentation defines the [[:word:]] class as "Letter | Mark | Decimal_Number | Connector_Punctuation", so among number characters it matches only those in Decimal_Number, which excludes U+00B2. I am not sure how \p{Word} is defined, but perhaps the Ruby documentation needs to be changed?
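
To illustrate the distinction between the Number subcategories, here is a minimal sketch. It assumes the short property names \p{Nd}, \p{Nl}, and \p{No} for Decimal_Number, Letter_Number, and Other_Number. The helper number_subcategory and the constant WORD_OR_NUMBER are hypothetical names introduced for illustration only; they are not part of Ruby or Onigmo. The combined character class is one possible workaround for code that wants every Number character treated as a word character, not a proposed fix.

```ruby
# Hypothetical workaround: a class that accepts anything \p{Word} accepts,
# plus every character in the Unicode Number category (Nd, Nl, No).
WORD_OR_NUMBER = /[\p{Word}\p{Number}]/u

SUPERSCRIPT_TWO = "\u00B2" # general category No (Other_Number)

# Hypothetical helper: reports which Number subcategory a single character
# belongs to, or nil if it is not a Number character at all.
def number_subcategory(char)
  case char
  when /\A\p{Nd}\z/u then :decimal_number
  when /\A\p{Nl}\z/u then :letter_number
  when /\A\p{No}\z/u then :other_number
  end
end

p number_subcategory("5")                     # => :decimal_number
p number_subcategory(SUPERSCRIPT_TWO)         # => :other_number
p number_subcategory("a")                     # => nil
p WORD_OR_NUMBER.match?(SUPERSCRIPT_TWO)      # => true
```

Under the Onigmo definition quoted above, only the :decimal_number case is covered by [[:word:]], which is consistent with the actual output reported for U+00B2.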
