Bug #20990

closed

Ripper.tokenize splits `"\C-\あ"` into tokens with invalid byte sequence

Added by tompng (tomoya ishida) 8 days ago. Updated 8 days ago.

Status: Closed
Assignee: -
Target version: -
ruby -v: ruby 3.4.1 (2024-12-25 revision 48d4efcb85) +YJIT +MN [arm64-darwin22]
[ruby-core:120435]

Description

IRB crashes when code is tokenized into tokens that contain an invalid byte sequence.

Ripper.tokenize '"\C-\あ"'
#=> ["\"", "\\C-\\\xE3\x81", "\x82", "\""]
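The broken tokens can be spotted with `String#valid_encoding?`. The following is only a rough sketch: `invalid_tokens` is a hypothetical helper, not part of Ripper, and the `reported` array is copied from the output above. It assumes the script is read as UTF-8, which is Ruby's default source encoding. The result of calling `Ripper.tokenize` on the running Ruby is printed without an expected value because it depends on whether that build is affected.

```ruby
require "ripper"

# Hypothetical helper (not a Ripper API): return the tokens whose bytes
# are not valid in the token's own encoding.
def invalid_tokens(tokens)
  tokens.reject(&:valid_encoding?)
end

# Tokens exactly as reported in this issue. "あ" is the three bytes
# E3 81 82 in UTF-8, and the split falls between the second and third byte.
reported = ["\"", "\\C-\\\xE3\x81", "\x82", "\""]

# The two middle tokens are flagged as invalid.
p invalid_tokens(reported)

# Joined back together, the tokens are byte-for-byte the original source,
# so only the token boundary is wrong, not the content.
p reported.join.valid_encoding?

# Check the Ruby this script runs on; the outcome depends on the version.
current = Ripper.tokenize('"\C-\あ"')
p invalid_tokens(current)
```

A check like this could be applied by a tool before it processes tokens further, so that one badly split token does not raise an encoding error later on.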

I think evaluating "\C-\あ" should fail with "Invalid escape character syntax", just like "\C-あ" does:

$ ./ruby --parser=parse.y -e '"\C-あ"'
-e:1: Invalid escape character syntax
"\C-あ"

$ ./ruby --parser=parse.y -e '"\C-\あ"'
-e:1: invalid multibyte char (UTF-8)
-e:1: invalid multibyte char (UTF-8)
./ruby: compile error (SyntaxError)