Bug #15839
Closed
mixed encoding heredoc should be a syntax error regardless of the order
Added by nobu (Nobuyoshi Nakada) over 6 years ago. Updated about 6 years ago.
Description
This heredoc isn't a syntax error,
#encoding: cp932
p <<-STR
\xe9\x9d
\u1234
STR
whereas this is.
#encoding: cp932
"
\xe9\x9d
\u1234
"
Files
| mixed-encoding-heredoc-fix.patch (5.28 KB) | jeremyevans0 (Jeremy Evans), 05/15/2019 05:06 AM |
| mixed-encoding-heredoc-reverse-order-fix.patch (1.58 KB) | jeremyevans0 (Jeremy Evans), 05/15/2019 03:15 PM |
| mixed-encoding-heredoc-fix-v2.patch (5.64 KB) | jeremyevans0 (Jeremy Evans), 05/15/2019 11:24 PM |
Updated by jeremyevans0 (Jeremy Evans) over 6 years ago
#1
[ruby-core:92650]
Heredocs are parsed line-by-line, and mixed encoding is
already detected if it is on the same line:
#encoding: cp932
p <<-STR
\xe9\x9d\u1234
STR
# UTF-8 mixed within Windows-31J source
# \xe9\x9d\u1234
# syntax error, unexpected end-of-input, expecting tSTRING_CONTENT or tSTRING_DBEG or tSTRING_DVAR or tSTRING_END
In order to handle mixed content on separate lines, we need to
keep track of the temporary encoding of the string, which was
previously done via a local variable in tokadd_string. The
attached patch adds a second rb_encoding ** argument to
tokadd_string for keeping track of the temporary encoding,
so that here_document can store the value between lines.
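To compare the heredoc and quoted-string cases from the description on a particular Ruby build, a small harness along these lines may help. It is only a sketch: `syntax_ok?`, `HEREDOC_CASE` and `QUOTED_CASE` are names made up for this example, and it assumes the `ruby` binary reported by `RbConfig.ruby` is the one you want to test.

```ruby
require "open3"
require "rbconfig"
require "tempfile"

# Hypothetical helper: returns true when `ruby -c` accepts the source.
def syntax_ok?(source)
  Tempfile.create(["mixed-encoding", ".rb"]) do |file|
    file.write(source)
    file.flush
    # `ruby -c` only checks syntax; it exits non-zero on a syntax error.
    _output, status = Open3.capture2e(RbConfig.ruby, "-c", file.path)
    status.success?
  end
end

# The two snippets from the description. The quoted 'RUBY' delimiter keeps
# the backslash escapes literal so they reach the file unchanged.
HEREDOC_CASE = <<~'RUBY'
  #encoding: cp932
  p <<-STR
  \xe9\x9d
  \u1234
  STR
RUBY

QUOTED_CASE = <<~'RUBY'
  #encoding: cp932
  "
  \xe9\x9d
  \u1234
  "
RUBY
```

Going by this thread, `syntax_ok?(QUOTED_CASE)` should be false, while `syntax_ok?(HEREDOC_CASE)` should be true on a Ruby without the fix and false on one with it. The result depends on the interpreter, so no output is claimed here.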
Updated by nobu (Nobuyoshi Nakada) over 6 years ago
#2
[ruby-core:92656]
Thank you, but it doesn't work for the reverse order, \u followed by \x.
Updated by jeremyevans0 (Jeremy Evans) over 6 years ago
#3
[ruby-core:92664]
- File mixed-encoding-heredoc-reverse-order-fix.patch added
nobu (Nobuyoshi Nakada) wrote:
Thank you, but it doesn't work for the reverse order, \u followed by \x.
That is because the \x escape does not do the same type of encoding voodoo that the \u escape does. Not sure if we want to change that, or if we do, how exactly it would work.
Attached is a patch with a less invasive approach that will still raise the syntax error. It should be applied on top of the previous patch. It checks that the string generated by the heredoc has a valid encoding, after the heredoc has been fully parsed.
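The idea of validating the finished string can be illustrated at the Ruby level with `String#valid_encoding?`. This is not the patch itself (which works inside the parser); `well_formed?`, `X_PART` and `U_PART` are names invented for this sketch, and it assumes the heredoc ends up tagged as UTF-8 once a `\u` escape has been seen.

```ruby
# Hypothetical helper: true when the bytes are well-formed in the encoding.
def well_formed?(bytes, encoding)
  bytes.dup.force_encoding(encoding).valid_encoding?
end

X_PART = "\xE9\x9D".b # raw bytes produced by the \xe9\x9d escapes
U_PART = "\u1234".b   # UTF-8 bytes of U+1234 (E1 88 B4)

# Either order yields a byte sequence that is not well-formed UTF-8,
# so a validity check on the final string catches both.
x_then_u_valid = well_formed?(X_PART + U_PART, Encoding::UTF_8)
u_then_x_valid = well_formed?(U_PART + X_PART, Encoding::UTF_8)
```

Both `x_then_u_valid` and `u_then_x_valid` are false: `E9 9D` is a truncated three-byte UTF-8 sequence wherever it appears.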
Updated by jeremyevans0 (Jeremy Evans) over 6 years ago
#4
[ruby-core:92673]
After additional analysis, I found that I only needed to add one line to my initial patch to fix it to work with both \u before \x and \u after \x. With the attached patch (which supersedes the previous patches):
$ ruby -e '#encoding: cp932
p((<<-STR))
\u1234
\xe9\x9d
STR
'
-e:4: UTF-8 mixed within Windows-31J source
\xe9\x9d
$ ruby -e '#encoding: cp932
p((<<-STR))
\xe9\x9d
\u1234
STR
'
-e:4: UTF-8 mixed within Windows-31J source
\u1234
-e:2: syntax error, unexpected end-of-input, expecting literal content or terminator or tSTRING_DBEG or tSTRING_DVAR
Updated by nobu (Nobuyoshi Nakada) over 6 years ago
#5
[ruby-core:92727]
Would you commit that patch by yourself?
Updated by jeremyevans0 (Jeremy Evans) over 6 years ago
#6
[ruby-core:92728]
nobu (Nobuyoshi Nakada) wrote:
Would you commit that patch by yourself?
Assuming matz approves a commit bit for me at the next developer meeting, I would be happy to.
Updated by jeremyevans (Jeremy Evans) over 6 years ago
#7
- Status changed from Open to Closed
Applied in changeset git|c05eaa93258ddc01e685b6cc3a0da82998a2af48.
Fix mixed encoding in heredoc
Heredocs are parsed line-by-line, so we need to keep track of the
temporary encoding of the string. Previously, a heredoc would
only detect mixed encoding errors if they were on the same line,
this changes things so they will be caught on different lines.
Fixes [Bug #15839]
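The effect of carrying the temporary encoding across lines can be sketched with a toy scanner in Ruby. This is a model of the idea only, not the C code in the parser: `scan_line`, `scan_heredoc`, `ESCAPE` and `MixedEncodingError` are hypothetical names, and the model treats `\x` and `\u` symmetrically, which is a simplification of what the thread describes.

```ruby
# Toy model, not the parser's C code: every name here is hypothetical.
class MixedEncodingError < StandardError; end

ESCAPE = /\\u(\h{4})|\\x(\h{1,2})/

# Scans one heredoc line and returns the updated temporary encoding:
# nil (only ASCII so far), :utf8 (a non-ASCII \u escape was seen) or
# :source (a \x escape with the high bit set was seen).
def scan_line(line, temp_enc, source_name)
  line.scan(ESCAPE) do |u_hex, x_hex|
    seen = u_hex ? :utf8 : :source
    next if (u_hex || x_hex).hex < 0x80 # ASCII escapes fit either encoding
    if temp_enc && temp_enc != seen
      raise MixedEncodingError, "UTF-8 mixed within #{source_name} source"
    end
    temp_enc = seen
  end
  temp_enc
end

# carry: false models the old behaviour (state forgotten at each line);
# carry: true models keeping the temporary encoding between lines.
def scan_heredoc(lines, source_name, carry:)
  temp_enc = nil
  lines.each do |line|
    temp_enc = nil unless carry
    temp_enc = scan_line(line, temp_enc, source_name)
  end
  :ok
rescue MixedEncodingError => e
  e.message
end
```

With `carry: false`, `scan_heredoc(['\xe9\x9d', '\u1234'], "Windows-31J", carry: false)` returns `:ok` because each line is judged alone; with `carry: true` the same input returns the "UTF-8 mixed within Windows-31J source" message, in either order.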
Updated by nagachika (Tomoyuki Chikanaga) about 6 years ago
#8
[ruby-core:94096]
- Backport changed from 2.4: REQUIRED, 2.5: REQUIRED, 2.6: REQUIRED to 2.4: REQUIRED, 2.5: REQUIRED, 2.6: DONE
ruby_2_6 r67724 merged revision(s) 6375c68f8851e1e0fee8a95afba91c4555097127,c05eaa93258ddc01e685b6cc3a0da82998a2af48.
Updated by usa (Usaku NAKAMURA) about 6 years ago
#9
[ruby-core:94572]
- Backport changed from 2.4: REQUIRED, 2.5: REQUIRED, 2.6: DONE to 2.4: REQUIRED, 2.5: DONE, 2.6: DONE
ruby_2_5 r67763 merged revision(s) 6375c68f8851e1e0fee8a95afba91c4555097127,c05eaa93258ddc01e685b6cc3a0da82998a2af48.