Bug #14367: Wrong interpretation of backslash C in regexp literals - Ruby - Ruby Issue Tracking System

Added by shyouhei (Shyouhei Urabe) over 8 years ago. Updated almost 5 years ago.

Status: Closed
Assignee:
Target version:
ruby -v: ruby 2.6.0dev (2018-01-16 trunk 61875) [x86_64-darwin15]
Backport: 2.3: UNKNOWN, 2.4: UNKNOWN, 2.5: UNKNOWN
[ruby-core:84900]
Tags: regexp

Description

The following Ruby code returns nil.

% LC_ALL=C ruby -ve 'p(/\c\xFF/ =~ "\c\xFF")'
ruby 2.6.0dev (2018-01-16 trunk 61875) [x86_64-darwin15]
nil

Is this intentional?

Related issues 1 (0 open — 1 closed)

Updated by Hanmac (Hans Mackowiak) over 8 years ago
#1 [ruby-core:84904]

the problem is this:

/\c\xFF/.source == "\\c\\xFF"

which is already escaped

you might want this:

/#{"\c\xFF"}/ == /ƒ/

or use this:

Regexp.compile("\c\xFF")

PS: is it correct that I get this?

"\c\xFF" ==  "\x9F" #=> true

EDIT: this works

/\x9F/ =~ "\c\xFF" #=> 0
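
The equality above is what you would expect if a control escape in a string literal is computed as the escaped byte masked with 0x9f. A minimal sketch of that arithmetic follows. The variable names are illustrative, and the pattern is built with Regexp.new from a binary string (an assumption made here so that no regexp-literal escape handling or source encoding is involved):

```ruby
# \cX in a string literal yields X with bits 5 and 6 cleared, i.e. X & 0x9f.
literal_bytes = "\c\xFF".bytes   # bytes produced by the string literal
computed_byte = 0xFF & 0x9f      # the same value computed by hand

# Build the pattern from a binary (ASCII-8BIT) string so that only
# string-literal escape handling is exercised, not regexp-literal handling.
pattern = Regexp.new("\x9F".b)
match_index = pattern =~ "\c\xFF".b

p literal_bytes == [computed_byte]
p match_index
```

This only exercises string escapes and a regexp built at run time; it says nothing about how a /\c\xFF/ literal is parsed, which is the subject of this issue.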

Updated by shyouhei (Shyouhei Urabe) over 8 years ago
#2 [ruby-core:84905]

Hanmac (Hans Mackowiak) wrote:

the problem is this:
/\c\xFF/.source == "\\c\\xFF"

No, I believe that isn't the problem. For instance /\c\x7F/ works.

% LC_ALL=C ruby -ve 'p(/\c\x7F/ =~ "\c\x7F")'
ruby 2.0.0p648 (2015-12-16 revision 53162) [universal.x86_64-darwin15]
0

EDIT: this works
/\x9F/ =~ "\c\xFF" #=> 0

Yeah, that's why I titled this issue a "wrong interpretation of backslash C in regexp literals". This is about /...\c.../.

Updated by shyouhei (Shyouhei Urabe) almost 6 years ago
#3 [ruby-core:97994]

Can I get an answer to my question ("Is this intentional?")?

Updated by naruse (Yui NARUSE) almost 6 years ago
#4 [ruby-core:98181]

This looks like an inconsistency between how regexp literals and Ruby strings handle \c\xff:

%  LC_ALL=C ruby -ve 'p (/\c\xff/ =~ "\x1f")'
ruby 2.7.1p83 (2020-03-31 revision a0c7c23c9c) [x86_64-darwin18]
0

Updated by jeremyevans0 (Jeremy Evans) almost 5 years ago
#5 [ruby-core:103807]

The behavior appears not to be intentional. This is a bug related to the fact that Ruby uses a recursive algorithm for strings (read_escape) but not for regexps (tokadd_escape). I've submitted a pull request to have control/meta handling for regexps use the same recursive algorithm used for strings, which fixes this issue: https://github.com/ruby/ruby/pull/4495
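
To illustrate what "recursive" means here, the following is a toy model in Ruby, not the actual C code in the parser. The method names read_escape_model and parse_one are hypothetical, and the model handles only \xHH, \cX, \C-X, \M-X and plain ASCII characters. The point is that a control or meta escape first evaluates whatever follows it (which may itself be an escape) and only then applies its mask:

```ruby
# Toy model of recursive control/meta escape evaluation (hypothetical helper,
# not Ruby's parser). Input is the escape as written in source, e.g. '\c\xFF'.
def read_escape_model(src)
  byte, rest = parse_one(src)
  raise ArgumentError, "trailing input: #{rest.inspect}" unless rest.empty?
  byte
end

# Returns [byte_value, remaining_source].
def parse_one(src)
  raise ArgumentError, "empty input" if src.empty?
  # A plain character evaluates to its own code.
  return [src[0].ord, src[1..-1]] unless src.start_with?("\\")

  if (m = src.match(/\A\\x(\h{1,2})/))
    [m[1].hex, m.post_match]
  elsif (m = src.match(/\A\\(?:c|C-)/))
    # \c? is the special case for DEL.
    return [0x7f, m.post_match[1..-1]] if m.post_match.start_with?("?")
    inner, rest = parse_one(m.post_match)   # recurse into the operand
    [inner & 0x9f, rest]                    # then apply the control mask
  elsif (m = src.match(/\A\\M-/))
    inner, rest = parse_one(m.post_match)   # recurse into the operand
    [inner | 0x80, rest]                    # then set the meta bit
  else
    raise ArgumentError, "unsupported escape: #{src.inspect}"
  end
end

printf("%#04x\n", read_escape_model('\c\xFF'))
printf("%#04x\n", read_escape_model('\M-\C-a'))
```

Under this model '\c\xFF' evaluates to the same byte that the string literal "\c\xFF" contains, which is the consistency the pull request aims for in regexp literals.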

Updated by jeremyevans (Jeremy Evans) almost 5 years ago
#6

Status changed from Open to Closed

Applied in changeset git|11ae581a4a7f5d5f5ec6378872eab8f25381b1b9.

Fix handling of control/meta escapes in literal regexps

Ruby uses a recursive algorithm for handling control/meta escapes
in strings (read_escape). However, the equivalent code for regexps
(tokadd_escape) did not use a recursive algorithm. Due to this,
handling of control/meta escapes in regexps did not have the same
behavior as in strings, leading to behavior such as the following
returning nil:

/\c\xFF/ =~ "\c\xFF"

Switch the code for handling \c, \C and \M in literal regexps to
use the same code as for strings (read_escape), to keep behavior
consistent between the two.

Fixes [Bug #14367]

Updated by nobu (Nobuyoshi Nakada) almost 5 years ago
#7 [ruby-core:103814]

Agree that the previous behavior might not be intentional, but 11ae581a4a7f5d5f5ec6378872eab8f25381b1b9 also seems to break something on encodings other than US-ASCII.

$ LANG=en_US.UTF-8 ./ruby -vce '/\c\xFF/'
ruby 3.1.0dev (2021-05-13T01:55:43Z master 11ae581a4a) [x86_64-darwin19]
-e:1: invalid multibyte escape: /\x9F/
-e:1: warning: possibly useless use of a literal in void context

Updated by jeremyevans0 (Jeremy Evans) almost 5 years ago
#8 [ruby-core:103815]

nobu (Nobuyoshi Nakada) wrote in #note-7:

Agree that the previous behavior might not be intentional, but 11ae581a4a7f5d5f5ec6378872eab8f25381b1b9 also seems to break something on encodings other than US-ASCII.
$ LANG=en_US.UTF-8 ./ruby -vce '/\c\xFF/'
ruby 3.1.0dev (2021-05-13T01:55:43Z master 11ae581a4a) [x86_64-darwin19]
-e:1: invalid multibyte escape: /\x9F/
-e:1: warning: possibly useless use of a literal in void context

The previous behavior also ended up with a regexp which matches an 8-bit character, so maybe Ruby should have given the same error before? Alternatively, I can revert if that is better?

Updated by jeremyevans0 (Jeremy Evans) almost 5 years ago
#9 [ruby-core:103836]

jeremyevans0 (Jeremy Evans) wrote in #note-8:

nobu (Nobuyoshi Nakada) wrote in #note-7:
Agree that the previous behavior might not be intentional, but 11ae581a4a7f5d5f5ec6378872eab8f25381b1b9 also seems to break something on encodings other than US-ASCII.
$ LANG=en_US.UTF-8 ./ruby -vce '/\c\xFF/'
ruby 3.1.0dev (2021-05-13T01:55:43Z master 11ae581a4a) [x86_64-darwin19]
-e:1: invalid multibyte escape: /\x9F/
-e:1: warning: possibly useless use of a literal in void context
The previous behavior also ended up with a regexp which matches an 8-bit character, so maybe Ruby should have given the same error before? Alternatively, I can revert if that is better?

My previous statement was incorrect. The reason it worked before is that \c behavior in regexps was wrong and did not result in the 8-bit character it should have. If you used a character resulting in a high bit, you did get the same error:

$ LANG=en_US.UTF-8 ruby -vce '/\M-a/'
ruby 3.0.1p64 (2021-04-05 revision 0fb782ee38) [x86_64-openbsd]
-e:1: too short escaped multibyte character: /\M-a/
-e:1: warning: possibly useless use of a literal in void context

You would also get an error if you created a regexp using a string instead of using a literal regexp:

$ LANG=en_US.UTF-8 ruby -ve '/#{s="\c\xff"}/'
ruby 3.0.1p64 (2021-04-05 revision 0fb782ee38) [x86_64-openbsd]
-e:1: warning: possibly useless use of a literal in void context
-e:1:in `<main>': invalid multibyte character (ArgumentError)

So I don't think anything is broken on UTF-8 (or other encodings). Before, it should have raised an error and it didn't because the incorrect algorithm resulted in the wrong character. Now it raises an error as it should.
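
The reasoning above can be checked without relying on regexp-literal parsing at all. The sketch below assumes only that the byte in question is 0x9F (the value "\c\xFF" produces in a string) and uses hypothetical variable names. It rescues StandardError rather than a specific class, since the exact exception class and message may differ between Ruby versions and between Regexp.new and interpolation:

```ruby
# The single byte 0x9F, first as binary data, then reinterpreted as UTF-8.
byte_9f = "\x9F".b
as_utf8 = byte_9f.dup.force_encoding(Encoding::UTF_8)

# A lone 0x9F is a continuation byte, so it is not a valid UTF-8 character.
utf8_valid = as_utf8.valid_encoding?

# Building a UTF-8 regexp from it is expected to fail...
utf8_error = begin
  Regexp.new(as_utf8)
  nil
rescue StandardError => e
  e
end

# ...while a binary regexp built from the same byte is fine.
binary_regexp = Regexp.new(byte_9f)
binary_match = binary_regexp =~ byte_9f

p utf8_valid
p utf8_error.class
p binary_match
```

In other words, the error in a UTF-8 source file comes from the resulting byte not being a valid character in that encoding, not from the escape handling itself.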

Updated by mame (Yusuke Endoh) over 4 years ago
#10

Related to Bug #18449: Bug in 3.1 regexp literals with \c added
