Bug #20938
Closed
Percent String literal delimiter impacts string contents with parse.y
Description
The following code defines two programs that declare string literals using % and a single-byte delimiter. I'd expect these programs to produce the same output no matter which delimiter is used.
With Prism, both programs output the same value (the string content "1_\n"), but with parse.y the output differs depending on the delimiter.
Are the outputs supposed to be the same or different?
program1 = "%\n1_\r\n\n" # => parse.y: "1_", prism: "1_\n"
program2 = "%'1_\r\n'" # => "1_\n"
p eval(program1)
p eval(program2)
Updated by eightbitraptor (Matt V-H) about 1 month ago
- ruby -v set to ruby 3.4.0dev (2024-11-28T09:19:02Z master 31a3e87777) +PRISM +GC [arm64-darwin24]
Updated by eightbitraptor (Matt V-H) about 1 month ago
- Description updated (diff)
Updated by nobu (Nobuyoshi Nakada) about 1 month ago
Prism seems to cut the string content out, then convert EOLs.
The conversion of EOL is lower layer than parsing, so the result of "program1" should be "1_" without "\n".
Updated by nobu (Nobuyoshi Nakada) about 1 month ago
- Status changed from Open to Assigned
- Assignee set to prism
Updated by tenderlovemaking (Aaron Patterson) about 1 month ago
nobu (Nobuyoshi Nakada) wrote in #note-3:
Prism seems to cut the string content out, then convert EOLs.
The conversion of EOL is lower layer than parsing, so the result of "program1" should be "1_" without "\n".
Sorry, I don't understand. Can you explain more? Why does EOL conversion impact program1 but not program2?
Updated by tenderlovemaking (Aaron Patterson) about 1 month ago
tenderlovemaking (Aaron Patterson) wrote in #note-5:
nobu (Nobuyoshi Nakada) wrote in #note-3:
Prism seems to cut the string content out, then convert EOLs.
The conversion of EOL is lower layer than parsing, so the result of "program1" should be "1_" without "\n".
Sorry, I don't understand. Can you explain more? Why does EOL conversion impact program1 but not program2?
I think I understand, but I will try to explain. In program1, EOL conversion first changes it to "%\n1_\n\n", so the second \n becomes the delimiter?
Is that correct?
Thanks
Updated by nobu (Nobuyoshi Nakada) about 1 month ago
tenderlovemaking (Aaron Patterson) wrote in #note-6:
I think I understand, but I will try to explain. In program1, EOL conversion first changes it to "%\n1_\n\n", so the second \n becomes the delimiter?
Is that correct?
Correct!
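The exchange above can be illustrated with a small Ruby sketch. It only mimics the order of operations being discussed (convert CRLF to LF first, then look for the closing delimiter); it is not how parse.y is implemented. The helper name percent_string_content is hypothetical, and the sketch assumes a single-byte, non-bracket delimiter with no escapes or interpolation.

```ruby
# Hypothetical helper: normalize line endings first, then cut out the
# content between the opening delimiter and its next occurrence.
def percent_string_content(source)
  normalized = source.gsub("\r\n", "\n") # EOL conversion happens before parsing
  delimiter = normalized[1]              # the byte right after "%"
  closing = normalized.index(delimiter, 2)
  normalized[2...closing]
end

# program1: after conversion the source is "%\n1_\n\n", so the "\n" that
# came from "\r\n" closes the string and the content is just "1_".
p percent_string_content("%\n1_\r\n\n")

# program2: the delimiter is "'", so the converted "\n" stays in the content.
p percent_string_content("%'1_\r\n'")
```

With the delimiter set to "\n", the converted line ending is itself the terminator, which is why only program1 is affected.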
Updated by eileencodes (Eileen Uchitelle) about 1 month ago
- Status changed from Assigned to Closed
Applied in changeset git|9fe6fd86936ead769fe983feb5461ca4f192f16e.
[ruby/prism] Fix percent delimiter strings with crlfs
parse.y treats CRLF as a LF and basically "normalizes" them before
parsing. That means a string like %\nfoo\r\n is actually treated as
%\nfoo\n for the purposes of parsing. This happens on both the
opening side of the percent string as well as on the closing side. So
for example %\r\nfoo\n must be treated as %\nfoo\n.
To handle this in Prism, when we start a % string, we check if it starts
with \r\n, and then consider the terminator to actually be \n. Then
we check if there are \r\n as we lex the string and treat those as
\n, but only in the case the start was a \n.
Fixes: #3230
[Bug #20938]
https://github.com/ruby/prism/commit/e573ceaad6
Co-authored-by: John Hawthorn <jhawthorn@github.com>
Co-authored-by: eileencodes <eileencodes@gmail.com>
Co-authored-by: Kevin Newton <kddnewton@gmail.com>
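The rule described in the commit message can be sketched in Ruby as follows. This is an illustration under stated assumptions, not Prism's actual code (Prism's lexer is written in C): the method name lex_percent_string is hypothetical, and the sketch ignores bracket delimiters, escape sequences, and interpolation.

```ruby
# Hypothetical sketch of the lexing rule from the commit message.
def lex_percent_string(source)
  pos = 1 # skip the leading "%"

  # An opening "\r\n" is treated as if the delimiter were "\n".
  if source[pos, 2] == "\r\n"
    terminator = "\n"
    pos += 2
  else
    terminator = source[pos]
    pos += 1
  end

  content = +""
  while pos < source.length
    if source[pos, 2] == "\r\n"
      # When the string was opened with a newline, a CRLF closes it.
      return content if terminator == "\n"
      # Otherwise the CRLF is ordinary content, converted to LF.
      content << "\n"
      pos += 2
    elsif source[pos] == terminator
      return content
    else
      content << source[pos]
      pos += 1
    end
  end
  nil # unterminated string
end

p lex_percent_string("%\n1_\r\n\n")
p lex_percent_string("%'1_\r\n'")
p lex_percent_string("%\r\nfoo\n")
```

Under these assumptions the three calls yield "1_", "1_\n", and "foo", matching the parse.y behaviour described in the thread.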