Bug #20990: Ripper.tokenize splits `"\C-\あ"` into tokens with invalid byte sequence - Ruby - Ruby Issue Tracking System

Added by tompng (tomoya ishida) 12 months ago. Updated 12 months ago.

Status: Closed
Assignee:
Target version:
ruby -v: ruby 3.4.1 (2024-12-25 revision 48d4efcb85) +YJIT +MN [arm64-darwin22]
Backport: 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN

[ruby-core:120435]

Description

IRB crashes when code is tokenized into tokens that contain an invalid byte sequence.

Ripper.tokenize '"\C-\あ"'
#=> ["\"", "\\C-\\\xE3\x81", "\x82", "\""]
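A minimal sketch for checking this, assuming UTF-8 source. `invalid_tokens` is a hypothetical helper, not part of Ripper. The output reported above splits the three bytes of `あ` (E3 81 82) between the second and third byte, so neither piece is valid UTF-8 on its own.

```ruby
require "ripper"

# Hypothetical helper (not part of Ripper): return the tokens whose bytes
# are not valid in the token's own encoding.
def invalid_tokens(source)
  Ripper.tokenize(source).reject(&:valid_encoding?)
end

# "あ" is three bytes in UTF-8: E3 81 82.
bytes = "あ".bytes
head  = bytes[0, 2].pack("C*").force_encoding("UTF-8") # "\xE3\x81"
tail  = bytes[2, 1].pack("C*").force_encoding("UTF-8") # "\x82"

p head.valid_encoding?   # false: truncated multibyte character
p tail.valid_encoding?   # false: stray continuation byte
p(head + tail == "あ")   # true: the bytes are intact, only the split is wrong

# On an affected build this lists the broken tokens; the result depends on
# the Ruby version in use.
p invalid_tokens('"\C-\あ"')
```

A caller that has to tolerate such tokens could test `valid_encoding?` on each one before passing it to code that assumes valid strings.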

I think evaluating "\C-\あ" should raise "Invalid escape character syntax", just as "\C-あ" does:

$ ./ruby --parser=parse.y -e '"\C-あ"'
-e:1: Invalid escape character syntax
"\C-あ"

$ ./ruby --parser=parse.y -e '"\C-\あ"'
-e:1: invalid multibyte char (UTF-8)
-e:1: invalid multibyte char (UTF-8)
./ruby: compile error (SyntaxError)

History
#1 [ruby-core:120438] Updated by tompng (tomoya ishida) 12 months ago

#2 Updated by nobu (Nobuyoshi Nakada) 12 months ago
