Bug #20990: Ripper.tokenize splits `"\C-\あ"` into tokens with invalid byte sequence - Ruby - Ruby Issue Tracking System

Added by tompng (tomoya ishida) 7 months ago. Updated 7 months ago.

Status: Closed
Assignee:
Target version:
ruby -v: ruby 3.4.1 (2024-12-25 revision 48d4efcb85) +YJIT +MN [arm64-darwin22]
Backport: 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN

[ruby-core:120435]

Description

IRB crashes when code is tokenized into tokens that contain invalid byte sequences.

Ripper.tokenize '"\C-\あ"'
#=> ["\"", "\\C-\\\xE3\x81", "\x82", "\""]
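A minimal sketch for spotting the problem programmatically: it filters the Ripper.tokenize result with String#valid_encoding?. The helper name invalid_tokens is hypothetical, and the multibyte character is written as an escape only so that the snippet does not depend on the source file encoding. What it prints depends on the Ruby version in use.

```ruby
require "ripper"

# Hypothetical helper: returns the tokens whose bytes are not valid
# in their own encoding (for example, one half of a split UTF-8 character).
def invalid_tokens(source)
  Ripper.tokenize(source).reject(&:valid_encoding?)
end

# U+3042 (HIRAGANA LETTER A), written as an escape to keep this snippet ASCII-only.
hiragana_a = "\u3042"

# Builds the source text of the string literal from the report:
# double quote, \C-\, the multibyte character, double quote.
src = %("\\C-\\#{hiragana_a}")

puts "tokens:  #{Ripper.tokenize(src).inspect}"
puts "invalid: #{invalid_tokens(src).inspect}"
```

On a build affected by this bug, the second line should list the tokens holding the split bytes of the character; an empty list means every token is a valid string.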

I think evaluating "\C-\あ" should fail with "Invalid escape character syntax", just as "\C-あ" does:

$ ./ruby --parser=parse.y -e '"\C-あ"'
-e:1: Invalid escape character syntax
"\C-あ"

$ ./ruby --parser=parse.y -e '"\C-\あ"'
-e:1: invalid multibyte char (UTF-8)
-e:1: invalid multibyte char (UTF-8)
./ruby: compile error (SyntaxError)
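The same comparison can be sketched from inside Ruby by evaluating each literal and capturing the SyntaxError message. The helper name syntax_error_for is hypothetical, and the exact message text depends on the Ruby version and on which parser (parse.y or Prism) is active, so no output is shown here.

```ruby
# Hypothetical helper: returns the first line of the SyntaxError message
# raised while compiling +source+, or nil if the source compiles.
def syntax_error_for(source)
  eval(source)
  nil
rescue SyntaxError => e
  e.message.lines.first.to_s.chomp
end

# U+3042 (HIRAGANA LETTER A), written as an escape to keep this snippet ASCII-only.
hiragana_a = "\u3042"

literals = [
  %("\\C-#{hiragana_a}"),   # control prefix applied directly to a multibyte character
  %("\\C-\\#{hiragana_a}")  # control prefix applied to an escaped multibyte character
]

literals.each do |literal|
  puts "#{literal} => #{syntax_error_for(literal).inspect}"
end
```

If both literals report the same message, the two cases are being rejected consistently, which is the behaviour requested above.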

#1 [ruby-core:120438]

Updated by tompng (tomoya ishida) 7 months ago

Pull request: https://github.com/ruby/ruby/pull/12484

Updated by nobu (Nobuyoshi Nakada) 7 months ago

Status changed from Open to Closed

Applied in changeset git|e4ec2128ae9c5c2a43cd599759f19db21fc0238f.

[Bug #20990] Reject escaped multibyte char with control/meta prefix
