Bug #20938

closed

Percent String literal delimiter impacts string contents with parse.y

Added by eightbitraptor (Matt V-H) about 1 month ago. Updated about 1 month ago.

Status:
Closed
Assignee:
Target version:
-
ruby -v:
ruby 3.4.0dev (2024-11-28T09:19:02Z master 31a3e87777) +PRISM +GC [arm64-darwin24]
[ruby-core:120144]

Description

The following code defines two programs that declare string literals using % and a single-byte delimiter. I'd expect these programs to produce the same output regardless of which delimiter is used.

With Prism, both programs output the same value (the string content "1_\n"), but with parse.y the output differs depending on the delimiter.

Are the outputs supposed to be the same or different?

program1 = "%\n1_\r\n\n" # => parse.y: "1_", prism: "1_\n"
program2 = "%'1_\r\n'"   # => "1_\n"

p eval(program1)
p eval(program2)
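
To make the two inputs easier to compare, here is a small sketch that prints the raw bytes of each program. `delimiter_byte` is a hypothetical helper introduced only for this illustration; it simply reads the byte after `%`, which is enough for these two programs.

```ruby
# Hypothetical helper, not part of Ruby: returns the byte right after "%",
# which is the byte these two literals use as their delimiter.
def delimiter_byte(program)
  program.bytes[1]
end

program1 = "%\n1_\r\n\n"
program2 = "%'1_\r\n'"

p program1.bytes            # [37, 10, 49, 95, 13, 10, 10]
p program2.bytes            # [37, 39, 49, 95, 13, 10, 39]
p delimiter_byte(program1)  # 10 (LF)
p delimiter_byte(program2)  # 39 (single quote)
```

In program1 the delimiter is LF (byte 10), the same byte that ends the CRLF pair inside the literal; in program2 the delimiter is a single quote, which cannot collide with a line ending.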

[Github Issue]

Actions #1

Updated by eightbitraptor (Matt V-H) about 1 month ago

  • ruby -v set to ruby 3.4.0dev (2024-11-28T09:19:02Z master 31a3e87777) +PRISM +GC [arm64-darwin24]
Actions #2

Updated by eightbitraptor (Matt V-H) about 1 month ago

  • Description updated (diff)

Updated by nobu (Nobuyoshi Nakada) about 1 month ago

Prism seems to cut the string content out, then convert EOLs.

The conversion of EOL is lower layer than parsing, so the result of "program1" should be "1_" without "\n".

Updated by nobu (Nobuyoshi Nakada) about 1 month ago

  • Status changed from Open to Assigned
  • Assignee set to prism

Updated by tenderlovemaking (Aaron Patterson) about 1 month ago

nobu (Nobuyoshi Nakada) wrote in #note-3:

Prism seems to cut the string content out, then convert EOLs.

The conversion of EOL is lower layer than parsing, so the result of "program1" should be "1_" without "\n".

Sorry, I don't understand. Can you explain more? Why does EOL conversion impact program1 but not program2?

Updated by tenderlovemaking (Aaron Patterson) about 1 month ago

I think I understand, but I will try to explain. In program1 EOL conversion first changes it to "%\n1_\n\n", so the second \n becomes the delimiter?

Is that correct?

Thanks

Updated by nobu (Nobuyoshi Nakada) about 1 month ago

tenderlovemaking (Aaron Patterson) wrote in #note-6:

I think I understand, but I will try to explain. In program1 EOL conversion first changes it to "%\n1_\n\n", so the second \n becomes the delimiter?

Is that correct?

Correct!
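
The behavior confirmed above can be modeled with a toy sketch: convert CRLF to LF first, then read the percent literal. This is not parse.y's actual code; `normalize_eol` and `read_percent_literal` are hypothetical names, and the reader handles only the simplest case (no escapes, no nesting delimiters).

```ruby
# Toy model of "EOL conversion happens before parsing".
def normalize_eol(source)
  source.gsub("\r\n", "\n")
end

# Reads a percent literal whose delimiter is the single character after "%".
def read_percent_literal(source)
  raise ArgumentError, "not a percent literal" unless source.start_with?("%")

  delimiter = source[1]
  raise ArgumentError, "missing delimiter" if delimiter.nil?

  closing = source.index(delimiter, 2)
  raise ArgumentError, "unterminated literal" if closing.nil?

  source[2...closing]
end

program1 = "%\n1_\r\n\n"
program2 = "%'1_\r\n'"

p read_percent_literal(normalize_eol(program1)) # "1_"
p read_percent_literal(normalize_eol(program2)) # "1_\n"
```

After normalization program1 is "%\n1_\n\n", so the LF that came from the CRLF closes the literal and the content is "1_". In program2 the same LF is ordinary content because the delimiter is a single quote.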

Actions #8

Updated by eileencodes (Eileen Uchitelle) about 1 month ago

  • Status changed from Assigned to Closed

Applied in changeset git|9fe6fd86936ead769fe983feb5461ca4f192f16e.


[ruby/prism] Fix percent delimiter strings with crlfs

parse.y treats CRLF as a LF and basically "normalizes" them before
parsing. That means a string like %\nfoo\r\n is actually treated as
%\nfoo\n for the purposes of parsing. This happens on both the
opening side of the percent string as well as on the closing side. So
for example %\r\nfoo\n must be treated as %\nfoo\n.

To handle this in Prism, when we start a % string, we check if it starts
with \r\n, and then consider the terminator to actually be \n. Then
we check if there are \r\n as we lex the string and treat those as
\n, but only in the case the start was a \n.
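
The approach described in the two paragraphs above can be sketched in Ruby. This is an illustration only, not Prism's implementation (which is written in C): `lex_percent_string` is a hypothetical name, escapes and nesting delimiters are ignored, and the final CRLF-to-LF conversion of the content is an assumption added so the toy matches the results discussed in this issue.

```ruby
# Toy lexer: picks the terminator without pre-normalizing the whole source.
def lex_percent_string(source)
  raise ArgumentError, "not a percent literal" unless source.start_with?("%")

  if source[1, 2] == "\r\n"
    # An opening CRLF means the terminator is really LF.
    terminator = "\n"
    pos = 3
  else
    terminator = source[1]
    pos = 2
  end
  raise ArgumentError, "missing delimiter" if terminator.nil?

  newline_delimited = terminator == "\n"
  start = pos

  while pos < source.length
    # A CRLF only acts as a terminator when the literal was opened by a newline.
    at_crlf = newline_delimited && source[pos, 2] == "\r\n"
    if at_crlf || source[pos] == terminator
      # Assumption: convert any remaining CRLF in the content to LF.
      return source[start...pos].gsub("\r\n", "\n")
    end
    pos += 1
  end

  raise ArgumentError, "unterminated literal"
end

p lex_percent_string("%\n1_\r\n\n") # "1_"
p lex_percent_string("%'1_\r\n'")   # "1_\n"
p lex_percent_string("%\r\nfoo\n")  # "foo"
p lex_percent_string("%\nfoo\r\n")  # "foo"
```

The last two calls cover the opening-side and closing-side cases mentioned in the commit message.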

Fixes: #3230

[Bug #20938]

https://github.com/ruby/prism/commit/e573ceaad6

Co-authored-by: John Hawthorn
Co-authored-by: eileencodes
Co-authored-by: Kevin Newton
