Bug #14137: Windows / MinGW - Regexp - Character Properties - General Category - Ruby - Ruby Issue Tracking System


Added by MSP-Greg (Greg L) over 7 years ago. Updated almost 4 years ago.

Status: Closed
ruby -v: ruby 2.5.0dev (2017-11-28 trunk 60925) [x64-mingw32]
Backport: 2.3: UNKNOWN, 2.4: UNKNOWN
[ruby-core:83925]
Tags: regexp, win

Description

While testing RDoc on AppVeyor with the recently 'added' literals.kpeg file, I got several errors on Ruby versions 2.2 through trunk.

Many of the \p{} constructs listed in the Regexp documentation under 'General Category' appear to raise an "invalid character property name {**}" error.

By contrast, the constructs listed earlier in that documentation (e.g. \p{Alpha}, \p{Lower}, \p{Space}) appear to work.

I briefly looked at the regexp tests, and they don't appear to cover these properties.

Are these unavailable on Windows?


#1 [ruby-core:83949]

Updated by duerst (Martin Dürst) over 7 years ago

There is a C preprocessor flag, USE_UNICODE_PROPERTIES, that is used in, for example, enc/unicode/10.0.0/name2ctype.h. I have never actually seen this happen, but your version of Ruby may have been compiled without this flag. I don't see any reason why this should be Windows-specific; these properties are useful regardless of the OS.
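One way to test this hypothesis from Ruby itself is sketched below. The sketch assumes that a build without USE_UNICODE_PROPERTIES would reject General Category names even in a UTF-8 pattern. The method name unicode_properties_available? is hypothetical. Compiling the pattern source as UTF-8 with Regexp.new removes the calling script's source encoding as a factor.

```ruby
# Hypothetical diagnostic, assuming a build compiled without
# USE_UNICODE_PROPERTIES would reject \p{L} even in a UTF-8 pattern.
# The pattern source is transcoded to UTF-8 so the script's own source
# encoding (e.g. a Windows locale code page) does not affect the result.
def unicode_properties_available?
  Regexp.new('\p{L}'.encode(Encoding::UTF_8))
  true
rescue RegexpError
  false
end

puts "General Category properties available: #{unicode_properties_available?}"
```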


#2 [ruby-core:104353]

Updated by jeremyevans0 (Jeremy Evans) about 4 years ago

Status changed from Open to Closed

I tested this with RubyInstaller builds on Windows. The behavior appears to be related to the regexp's encoding rather than a bug, and it is the same in Ruby 2.0 and 3.0:

C:\>c:\Ruby30-x64\bin\ruby -e "p(/\p{L}/.match('a'))"
-e:1: invalid character property name {L}: /\p{L}/

C:\>c:\Ruby30-x64\bin\ruby -e "p(/\p{L}/u.match('a'))"
#<MatchData "a">

C:\>c:\Ruby30-x64\bin\ruby -Ku -e "p(/\p{L}/.match('a'))"
#<MatchData "a">

C:\>c:\Ruby200-x64\bin\ruby -e "p(/\p{L}/.match('a'))"
-e:1: invalid character property name {L}: /\p{L}/

C:\>c:\Ruby200-x64\bin\ruby -e "p(/\p{L}/u.match('a'))"
#<MatchData "a">

C:\>c:\Ruby200-x64\bin\ruby -Ku -e "p(/\p{L}/.match('a'))"
#<MatchData "a">

The documentation for this feature (https://docs.ruby-lang.org/en/master/doc/regexp_rdoc.html#label-Character+Properties) says: "A Unicode character's General Category value can also be matched". I think this implies that it should only work for Unicode regexps, not for other regexps. So I think the current behavior is expected and not a bug.
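The encoding dependence can also be reproduced without changing the console code page. The sketch below compiles the same ASCII-only pattern sources with Regexp.new after transcoding them to UTF-8 and to Windows-1252. Windows-1252 is an assumption here: it stands in for whatever non-UTF-8 locale encoding the Windows console uses, and the report does not name that encoding. As I understand it, a \p{...} escape ties the regexp to the encoding of its source string, and a non-Unicode encoding only knows the POSIX-style names such as Alpha, Lower, and Space. That would match the original report. The helper accepts? is hypothetical.

```ruby
# Pattern sources as ASCII-only strings; single quotes keep "\p" literal.
PATTERNS = ['\p{Alpha}', '\p{Lower}', '\p{Space}', '\p{L}', '\p{Lu}', '\p{Nd}'].freeze

# Windows-1252 is an assumed stand-in for a non-UTF-8 Windows locale encoding.
ENCODINGS = [Encoding::UTF_8, Encoding.find("Windows-1252")].freeze

# Hypothetical helper: true if the pattern compiles when its source string
# is in the given encoding, false if Onigmo rejects the property name.
def accepts?(source, encoding)
  Regexp.new(source.encode(encoding))
  true
rescue RegexpError
  false
end

ENCODINGS.each do |enc|
  accepted = PATTERNS.select { |pat| accepts?(pat, enc) }
  rejected = PATTERNS - accepted
  puts "#{enc}: accepted #{accepted.join(' ')}; rejected #{rejected.join(' ')}"
end
```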


#3 [ruby-core:104773]

Updated by duerst (Martin Dürst) almost 4 years ago

I agree with @jeremyevans0 (Jeremy Evans), but would like to add that

ruby -e 'p (/\p{L}/.match("a"))'

will also produce #<MatchData "a"> in any environment that uses UTF-8. That covers almost all current versions of Linux, Unix, and similar systems. It also covers Windows if you first run the command chcp 65001.
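Code that must work regardless of the console code page can follow the /u results in the transcripts above. The sketch below uses a /u regexp and transcodes the input to UTF-8 before matching. The helper name first_letter is hypothetical. Transcoding first assumes that each input string's declared encoding is correct and that it converts cleanly to UTF-8.

```ruby
# A /u literal is compiled as UTF-8 whatever the script's source encoding is,
# so General Category names such as L, Lu, and Nd are available.
LETTER = /\p{L}/u

# Hypothetical helper: returns the first Unicode letter in str, or nil.
# The input is transcoded to UTF-8 first, so a non-ASCII string in another
# encoding (e.g. Windows-1252) does not clash with the UTF-8 regexp.
# String#encode raises an EncodingError subclass if conversion fails.
def first_letter(str)
  str.encode(Encoding::UTF_8)[LETTER]
end

p first_letter("a")
p first_letter("123")
p first_letter("1\u00e9".encode("Windows-1252"))
```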
