Bug #15839

Mixed-encoding heredoc should be a syntax error regardless of the order

Added by nobu (Nobuyoshi Nakada) almost 2 years ago. Updated over 1 year ago.

Status: Closed
Priority: Normal
Assignee: -
Target version: -
[ruby-core:92608]

Description

This heredoc isn't a syntax error,

#encoding: cp932

p <<-STR
\xe9\x9d
\u1234
STR

whereas this is.

#encoding: cp932
"
\xe9\x9d
\u1234
"

Files

mixed-encoding-heredoc-fix.patch (5.28 KB), jeremyevans0 (Jeremy Evans), 05/15/2019 05:06 AM
mixed-encoding-heredoc-reverse-order-fix.patch (1.58 KB), jeremyevans0 (Jeremy Evans), 05/15/2019 03:15 PM
mixed-encoding-heredoc-fix-v2.patch (5.64 KB), jeremyevans0 (Jeremy Evans), 05/15/2019 11:24 PM

Updated by jeremyevans0 (Jeremy Evans) almost 2 years ago

Heredocs are parsed line-by-line, and mixed encoding is
already detected if it is on the same line:

#encoding: cp932

p <<-STR
\xe9\x9d\u1234
STR
# UTF-8 mixed within Windows-31J source
# \xe9\x9d\u1234
# syntax error, unexpected end-of-input, expecting tSTRING_CONTENT or tSTRING_DBEG or tSTRING_DVAR or tSTRING_END

In order to handle mixed content on separate lines, we need to
keep track of the temporary encoding of the string, which was
previously done via a local variable in tokadd_string. The
attached patch adds a second rb_encoding ** argument to
tokadd_string for keeping track of the temporary encoding,
so that here_document can store the value between lines.
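
To illustrate why the state has to outlive a single line, here is a rough Ruby toy model of line-by-line scanning. It is not the parser code: scan_line, scan_heredoc, MixedEncodingError and the carry_state flag are hypothetical names made up for this sketch, and the model treats \x and \u symmetrically, which the real parser does not.

```ruby
# Toy model of line-by-line heredoc scanning. This is NOT parse.y;
# every name here is hypothetical.
class MixedEncodingError < StandardError; end

# Scan one body line. `enc` is the temporary encoding carried in from
# earlier lines (nil means undecided). Returns the encoding to carry forward.
def scan_line(line, enc)
  line.scan(/\\(u\h{4}|x\h{2})/) do |(esc)|
    next if esc[1..-1].hex < 0x80 # ASCII-range escapes decide nothing
    seen = esc.start_with?("u") ? :utf8 : :source
    if enc && enc != seen
      raise MixedEncodingError, "UTF-8 mixed within source encoding"
    end
    enc = seen
  end
  enc
end

# carry_state: false forgets the state after every line (the old behavior
# in this model); carry_state: true keeps it across lines.
def scan_heredoc(lines, carry_state:)
  enc = nil
  lines.each do |line|
    enc = scan_line(line, carry_state ? enc : nil)
  end
  :ok
end

body = ['\xe9\x9d', '\u1234'] # single quotes keep the backslashes literal

scan_heredoc(body, carry_state: false) # => :ok, the mix goes unnoticed

begin
  scan_heredoc(body, carry_state: true)
rescue MixedEncodingError => e
  puts e.message # the mix is caught once the state survives the line break
end
```

With the state reset per line, only a mix on one line such as \xe9\x9d\u1234 is caught, which matches the behavior described above.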

Updated by nobu (Nobuyoshi Nakada) almost 2 years ago

Thank you, but it doesn't work for the reverse order, \u followed by \x.

Updated by jeremyevans0 (Jeremy Evans) almost 2 years ago

nobu (Nobuyoshi Nakada) wrote:

Thank you, but it doesn't work for the reverse order, \u followed by \x.

That is because the \x escape does not do the same type of encoding voodoo that the \u escape does. Not sure if we want to change that, or if we do, how exactly it would work.

Attached is a patch with a less invasive approach that will still raise the syntax error. It should be applied on top of the previous patch. It checks that the string generated by the heredoc has a valid encoding, after the heredoc has been fully parsed.
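
As a rough illustration of the idea of validating after the fact, the following Ruby sketch joins the bytes that the two escapes would contribute and asks whether the result is valid. It assumes the finished string ends up tagged as UTF-8 because of the \u escape; that is an assumption for this sketch, not something taken from the patch. valid_as_utf8? is a hypothetical helper.

```ruby
# Bytes contributed by each escape in the heredoc body.
x_bytes = [0xE9, 0x9D].pack("C*") # what \xe9\x9d adds (raw bytes)
u_bytes = "\u1234".b              # what \u1234 adds (UTF-8 bytes E1 88 B4)

# Hypothetical helper: join the chunks as raw bytes, then tag the result as
# UTF-8 (assumed here) and ask Ruby whether the byte sequence is valid.
def valid_as_utf8?(*chunks)
  chunks.map(&:b).join.force_encoding(Encoding::UTF_8).valid_encoding?
end

p valid_as_utf8?(u_bytes)          # \u alone
p valid_as_utf8?(x_bytes, u_bytes) # \x followed by \u
p valid_as_utf8?(u_bytes, x_bytes) # \u followed by \x
```

Under that assumption, only the first call reports a valid string. A whole-string check of this kind does not depend on which escape came first.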

Updated by jeremyevans0 (Jeremy Evans) almost 2 years ago

After additional analysis, I found that I only needed to add one line to my initial patch to fix it to work with both \u before \x and \u after \x. With the attached patch (which supersedes the previous patches):

$ ruby -e '#encoding: cp932
p((<<-STR))
\u1234
\xe9\x9d
STR
'
-e:4: UTF-8 mixed within Windows-31J source
\xe9\x9d

$ ruby -e '#encoding: cp932
p((<<-STR))
\xe9\x9d
\u1234
STR
'
-e:4: UTF-8 mixed within Windows-31J source
\u1234
-e:2: syntax error, unexpected end-of-input, expecting literal content or terminator or tSTRING_DBEG or tSTRING_DVAR

Updated by nobu (Nobuyoshi Nakada) almost 2 years ago

Would you commit that patch by yourself?

Updated by jeremyevans0 (Jeremy Evans) almost 2 years ago

nobu (Nobuyoshi Nakada) wrote:

Would you commit that patch by yourself?

Assuming matz approves a commit bit for me at the next developer meeting, I would be happy to.

Updated by jeremyevans (Jeremy Evans) almost 2 years ago

  • Status changed from Open to Closed

Applied in changeset git|c05eaa93258ddc01e685b6cc3a0da82998a2af48.


Fix mixed encoding in heredoc

Heredocs are parsed line-by-line, so we need to keep track of the
temporary encoding of the string. Previously, a heredoc would
only detect mixed encoding errors if they were on the same line;
with this change, they are also caught when they are on different lines.

Fixes [Bug #15839]

Updated by nagachika (Tomoyuki Chikanaga) over 1 year ago

  • Backport changed from 2.4: REQUIRED, 2.5: REQUIRED, 2.6: REQUIRED to 2.4: REQUIRED, 2.5: REQUIRED, 2.6: DONE

ruby_2_6 r67724 merged revision(s) 6375c68f8851e1e0fee8a95afba91c4555097127,c05eaa93258ddc01e685b6cc3a0da82998a2af48.

Updated by usa (Usaku NAKAMURA) over 1 year ago

  • Backport changed from 2.4: REQUIRED, 2.5: REQUIRED, 2.6: DONE to 2.4: REQUIRED, 2.5: DONE, 2.6: DONE

ruby_2_5 r67763 merged revision(s) 6375c68f8851e1e0fee8a95afba91c4555097127,c05eaa93258ddc01e685b6cc3a0da82998a2af48.