Bug #20578: Tokenizing string literal that have newline and invalid escape is wrong - Ruby - Ruby Issue Tracking System

Added by tompng (tomoya ishida) almost 2 years ago. Updated almost 2 years ago.

Status: Closed
Assignee:
Target version:
ruby -v: ruby 3.4.0dev (2024-06-13T09:49:46Z master 8b843b0dc7) [x86_64-linux]
Backport: 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: UNKNOWN
[ruby-core:118314]

Description

When a string literal contains a newline followed by an invalid escape sequence, Ripper.tokenize returns the tokens in the wrong order.

Ripper.tokenize "\"hello\\x world"
# => ["\"", "hello\\x", " world"] # looks good
Ripper.tokenize "\"\nhello\\x world"
# => ["\"", "\n world", "hello\\x"] # order is reversed
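One way to state the problem precisely is as a round-trip check. The sketch below assumes the expected property is that the tokens, joined in order, rebuild the source text. `round_trips?` is a hypothetical helper, not part of Ripper, and the token lists are copied from the results above rather than recomputed, so the output does not depend on the Ruby version in use.

```ruby
require "ripper"

# Hypothetical helper: true when the tokens, joined in order, rebuild the source.
# By default it tokenizes the source itself; a token list can be passed in to
# examine a result recorded elsewhere.
def round_trips?(source, tokens = Ripper.tokenize(source))
  tokens.join == source
end

# Token lists copied from the report.
puts round_trips?("\"hello\\x world", ["\"", "hello\\x", " world"])   # => true
puts round_trips?("\"\nhello\\x world", ["\"", "\n world", "hello\\x"]) # => false
```

Calling `round_trips?(source)` without a token list runs the same check against whatever the local Ruby build produces.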

These invalid escapes are also tokenized incorrectly:

Ripper.tokenize("\"\n\\Cxx\"")   #=> ["\"", "\nx", "\\Cx", "\""]
Ripper.tokenize("\"\n\\Mxx\"")   #=> ["\"", "\nx", "\\Mx", "\""]
Ripper.tokenize("\"\n\\c\\cx\"") #=> ["\"", "\nx", "\\c\\c", "\""]
Ripper.tokenize("\"\n\\ux\"")    #=> ["\"", "\nx", "\""]
Ripper.tokenize("\"\n\\xx\"")    #=> ["\"", "\nx", "\\x", "\""]

These literal forms are affected as well:

Ripper.tokenize("<<A\n\n\\xyz") #=> ["<<A", "\n", "\nyz", "\\x"]
Ripper.tokenize("%(\n\\xyz)")   #=> ["%(", "\nyz", "\\x", ")"]
Ripper.tokenize("%Q(\n\\xyz)")  #=> ["%Q(", "\nyz", "\\x", ")"]
Ripper.tokenize(":\"\n\\xyz\"") #=> [":\"", "\nyz", "\\x", "\""]
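The results above fall into two groups: tokens that are all present but out of order, and tokens with text missing (the `\ux` case). A minimal sketch for telling them apart follows, assuming that comparing the characters of the joined tokens with those of the source is a good enough test. `classify` is a hypothetical helper, and the token lists are copied from the report.

```ruby
# Hypothetical helper that labels a tokenize result:
#   :ok        tokens rebuild the source exactly
#   :reordered same characters as the source, but in a different order
#   :lossy     some source text is missing from the tokens
def classify(source, tokens)
  joined = tokens.join
  return :ok if joined == source
  return :reordered if joined.chars.sort == source.chars.sort
  :lossy
end

# Source => tokens, as listed in the report.
cases = {
  "<<A\n\n\\xyz" => ["<<A", "\n", "\nyz", "\\x"],
  "%(\n\\xyz)"   => ["%(", "\nyz", "\\x", ")"],
  "\"\n\\ux\""   => ["\"", "\nx", "\""],
}

cases.each do |source, tokens|
  puts "#{source.inspect}: #{classify(source, tokens)}"
end
```

With the reported token lists, the heredoc and `%( )` cases come out as `:reordered` and the `\ux` case as `:lossy`.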

I encountered this while typing a valid string literal into IRB (█ marks the cursor, where the escape is still incomplete):

irb(main):001> "
irb(main):002> \x█

Another invalid escape sequence disappears from the tokenize result entirely:

Ripper.tokenize('"\u{123')
# => ["\""]
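To quantify how much text goes missing, the byte size of the source can be compared with the total byte size of the tokens. This is only a sketch: `dropped_bytes` is a hypothetical helper, and the token lists are taken from the report rather than recomputed.

```ruby
# Hypothetical helper: number of source bytes not covered by any token.
def dropped_bytes(source, tokens)
  source.bytesize - tokens.sum(&:bytesize)
end

# Only the opening quote survives; the 6 bytes of \u{123 are gone.
puts dropped_bytes('"\u{123', ["\""])                  # => 6

# The \ux case from earlier in the report loses the 2 bytes of \u.
puts dropped_bytes("\"\n\\ux\"", ["\"", "\nx", "\""])  # => 2
```

A result of 0 does not prove the tokens are correct, since reordered tokens also cover every byte.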

Updated by nobu (Nobuyoshi Nakada) almost 2 years ago (#1)