Bug #15839: mixed encoding heredoc should be a syntax error regardless the order - Ruby - Ruby Issue Tracking System

Added by nobu (Nobuyoshi Nakada) over 6 years ago. Updated over 6 years ago.

Status: Closed
Assignee:
Target version:
ruby -v:
Backport: 2.4: REQUIRED, 2.5: DONE, 2.6: DONE

[ruby-core:92608]

Description

This heredoc, which mixes \x and \u escapes on separate lines, isn't a syntax error,

#encoding: cp932

p <<-STR
\xe9\x9d
\u1234
STR

whereas the same content in a double-quoted string literal is.

#encoding: cp932
"
\xe9\x9d
\u1234
"

Files

mixed-encoding-heredoc-fix.patch (5.28 KB), jeremyevans0 (Jeremy Evans), 05/15/2019 05:06 AM
mixed-encoding-heredoc-reverse-order-fix.patch (1.58 KB), jeremyevans0 (Jeremy Evans), 05/15/2019 03:15 PM
mixed-encoding-heredoc-fix-v2.patch (5.64 KB), jeremyevans0 (Jeremy Evans), 05/15/2019 11:24 PM

Updated by jeremyevans0 (Jeremy Evans) over 6 years ago
#1 [ruby-core:92650]

File mixed-encoding-heredoc-fix.patch added

Heredocs are parsed line-by-line, and mixed encoding is
already detected if it is on the same line:

#encoding: cp932

p <<-STR
\xe9\x9d\u1234
STR
# UTF-8 mixed within Windows-31J source
# \xe9\x9d\u1234
# syntax error, unexpected end-of-input, expecting tSTRING_CONTENT or tSTRING_DBEG or tSTRING_DVAR or tSTRING_END

In order to handle mixed content on separate lines, we need to
keep track of the temporary encoding of the string, which was
previously done via a local variable in tokadd_string. The
attached patch adds a second rb_encoding ** argument to
tokadd_string for keeping track of the temporary encoding,
so that here_document can store the value between lines.
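
To make the state-tracking idea concrete, here is a toy model in Ruby. It is not the code in parse.y: scan_line, scan_resetting, scan_carrying, and MixedEncodingError are hypothetical names invented for this sketch, and the rule "a non-ASCII \x implies the source encoding, a non-ASCII \u implies UTF-8" is a simplification. It only shows why the temporary encoding has to outlive a single line.

```ruby
# Toy model only: a hypothetical scanner, not the logic in parse.y.
class MixedEncodingError < StandardError; end

ESCAPE = /\\u\h{4}|\\x\h{2}/

# Scans one line for \uXXXX and \xNN escapes. temp_enc is the encoding
# implied by earlier escapes (nil while undecided). Returns the updated
# value so the caller can hand it to the next line.
def scan_line(line, source_enc, temp_enc)
  line.scan(ESCAPE) do |escape|
    next if escape[2..-1].hex < 0x80 # ASCII fits either encoding
    implied = escape.start_with?("\\u") ? "UTF-8" : source_enc
    if temp_enc && temp_enc != implied
      raise MixedEncodingError, "#{implied} escape after #{temp_enc} content"
    end
    temp_enc = implied
  end
  temp_enc
end

# State reset on every line: escapes on different lines never meet.
def scan_resetting(lines, source_enc)
  lines.each { |line| scan_line(line, source_enc, nil) }
  :accepted
end

# State carried from line to line, as the patch description suggests.
def scan_carrying(lines, source_enc)
  temp_enc = nil
  lines.each { |line| temp_enc = scan_line(line, source_enc, temp_enc) }
  :accepted
end

# Turns the toy error into a symbol so the results are easy to print.
def outcome
  yield
rescue MixedEncodingError
  :rejected
end

body = ['\xe9\x9d', '\u1234']
p(outcome { scan_resetting(body, "Windows-31J") })        # :accepted
p(outcome { scan_carrying(body, "Windows-31J") })         # :rejected
p(outcome { scan_carrying(body.reverse, "Windows-31J") }) # :rejected
```

In this model the carried state catches both orders, which is only a property of the simplified rule above and says nothing about how the real \x handling behaves.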

Updated by nobu (Nobuyoshi Nakada) over 6 years ago
#2 [ruby-core:92656]

Thank you, but it doesn't work for the reverse order, \u followed by \x.

Updated by jeremyevans0 (Jeremy Evans) over 6 years ago
#3 [ruby-core:92664]

File mixed-encoding-heredoc-reverse-order-fix.patch added

nobu (Nobuyoshi Nakada) wrote:

Thank you, but it doesn't work for the reverse order, \u followed by \x.

That is because the \x escape does not do the same type of encoding voodoo that the \u escape does. Not sure if we want to change that, or if we do, how exactly it would work.

Attached is a patch with a less invasive approach that will still raise the syntax error. It should be applied on top of the previous patch. It checks that the string generated by the heredoc has a valid encoding, after the heredoc has been fully parsed.
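
The kind of after-the-fact validity check described above can be illustrated at the Ruby level with String#valid_encoding?. This is only a sketch of the idea, not the patch, which performs its check in C inside the parser. It assumes that the \u escape causes the finished string to be tagged UTF-8; the variable names are made up for the example.

```ruby
# Bytes a heredoc body of "\xe9\x9d", newline, "\u1234", newline would hold.
bytes = [0xE9, 0x9D, 0x0A].pack("C*") + "\u1234\n".b

# Assumption: the \u escape causes the whole literal to be tagged UTF-8.
as_utf8 = bytes.dup.force_encoding(Encoding::UTF_8)

# The \x bytes alone, tagged with the source encoding from the magic comment.
as_cp932 = [0xE9, 0x9D].pack("C*").force_encoding(Encoding::Windows_31J)

p as_utf8.valid_encoding?  # false: E9 9D followed by a newline is a truncated UTF-8 sequence
p as_cp932.valid_encoding? # true: E9 9D is a well-formed two-byte Windows-31J sequence
```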

Updated by jeremyevans0 (Jeremy Evans) over 6 years ago
#4 [ruby-core:92673]

File mixed-encoding-heredoc-fix-v2.patch added

After additional analysis, I found that I only needed to add one line to my initial patch to fix it to work with both \u before \x and \u after \x. With the attached patch (which supersedes the previous patches):

$ ruby -e '#encoding: cp932
p((<<-STR))
\u1234
\xe9\x9d
STR
'
-e:4: UTF-8 mixed within Windows-31J source
\xe9\x9d

$ ruby -e '#encoding: cp932
p((<<-STR))
\xe9\x9d
\u1234
STR
'
-e:4: UTF-8 mixed within Windows-31J source
\u1234
-e:2: syntax error, unexpected end-of-input, expecting literal content or terminator or tSTRING_DBEG or tSTRING_DVAR
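
The two shell checks above can also be run from a single script. The sketch below assumes that eval honours the #encoding magic comment on the first line of the evaluated string; mixed_heredoc_rejected? is a hypothetical helper name. No expected output is shown because the result depends on whether the interpreter running it includes the fix.

```ruby
# Returns true when eval rejects a cp932 heredoc whose two body lines are
# the given escape sequences, false when the heredoc is accepted.
def mixed_heredoc_rejected?(first, second)
  src = "#encoding: cp932\n<<-STR\n#{first}\n#{second}\nSTR\n"
  eval(src)
  false
rescue SyntaxError
  true
end

# Single quotes keep the backslashes literal so the parser sees the escapes.
xe = '\xe9\x9d'
u  = '\u1234'

puts "\\x then \\u rejected: #{mixed_heredoc_rejected?(xe, u)}"
puts "\\u then \\x rejected: #{mixed_heredoc_rejected?(u, xe)}"
```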

Updated by nobu (Nobuyoshi Nakada) over 6 years ago
#5 [ruby-core:92727]

Would you commit that patch by yourself?

Updated by jeremyevans0 (Jeremy Evans) over 6 years ago
#6 [ruby-core:92728]

nobu (Nobuyoshi Nakada) wrote:

Would you commit that patch by yourself?

Assuming matz approves a commit bit for me at the next developer meeting, I would be happy to.

Updated by jeremyevans (Jeremy Evans) over 6 years ago
#7

Status changed from Open to Closed

Applied in changeset git|c05eaa93258ddc01e685b6cc3a0da82998a2af48.

Fix mixed encoding in heredoc

Heredocs are parsed line-by-line, so we need to keep track of the
temporary encoding of the string. Previously, a heredoc would
only detect mixed encoding errors if they were on the same line,
this changes things so they will be caught on different lines.

Fixes [Bug #15839]

Updated by nagachika (Tomoyuki Chikanaga) over 6 years ago
#8 [ruby-core:94096]

Backport changed from 2.4: REQUIRED, 2.5: REQUIRED, 2.6: REQUIRED to 2.4: REQUIRED, 2.5: REQUIRED, 2.6: DONE

ruby_2_6 r67724 merged revision(s) 6375c68f8851e1e0fee8a95afba91c4555097127,c05eaa93258ddc01e685b6cc3a0da82998a2af48.

Updated by usa (Usaku NAKAMURA) over 6 years ago
#9 [ruby-core:94572]

Backport changed from 2.4: REQUIRED, 2.5: REQUIRED, 2.6: DONE to 2.4: REQUIRED, 2.5: DONE, 2.6: DONE

ruby_2_5 r67763 merged revision(s) 6375c68f8851e1e0fee8a95afba91c4555097127,c05eaa93258ddc01e685b6cc3a0da82998a2af48.
