Bug #21528: SyntaxError#message may have broken encoding with multibyte source under Prism - Ruby - Ruby Issue Tracking System


Added by alpaca-tc (Hiroyuki Ishii) 5 months ago. Updated 4 months ago.

Status: Closed
Assignee: prism
Target version:
ruby -v:
Backport: 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN

[ruby-core:122899]

Description

Since the introduction of Prism, when parsing Ruby source code that contains multibyte characters, SyntaxError#message can sometimes have invalid encoding.
Here is a reproducible example:

begin
  RubyVM::InstructionSequence.compile(<<~CODE, nil, nil, 1)
    if a
    # 0000000000000ああああああ
    #
  CODE
rescue SyntaxError => e
  $e = e
  puts e.message # the string contains a multibyte character that is cut off mid-sequence (a lone \xE3)
  # <compiled>:3: syntax errors found
  #   1 | if a
  # > 2 | # 0000000000000あああああ\xE3 ...
  #     | ^ expected an `end` to close the conditional clause
  # > 3 | #
  #     |  ^ unexpected end-of-input, assuming it is closing the parent top level context

  puts e.message.valid_encoding? #=> expected true, but got false
end

This appears to be caused by the truncation step in Prism's error message generation, which does not take multibyte character boundaries into account.
See: The truncation logic around prism_compile.c L10696-L10709
I'm not sure how to correctly fix it due to lack of knowledge about safe byte truncation.
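The failure mode can be reproduced in plain Ruby without involving the parser. The following is only a minimal sketch of the general problem, not Prism's actual truncation code; `naive_truncate` is a hypothetical helper and the byte limit of 31 is chosen just so that the cut lands inside a character.

```ruby
# Hypothetical helper, not Prism's code: cut a string at a fixed byte count.
def naive_truncate(str, max_bytes)
  str.byteslice(0, max_bytes)
end

line = "# " + ("0" * 13) + ("あ" * 6)  # 15 ASCII bytes + 6 * 3 bytes = 33 bytes
cut  = naive_truncate(line, 31)        # 15 + 5 * 3 + 1: ends on a lone 0xE3

p cut.valid_encoding?  # false
p cut.bytes.last       # 227 (0xE3)
```

Each "あ" is encoded as three bytes in UTF-8 (E3 81 82), so any byte limit that is not aligned to a character boundary leaves a dangling lead byte like the `\xE3` in the message above.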

I discovered this issue through irb, which attempts to display source code even when it contains syntax errors. Because irb uses SyntaxError#message, it raised an ArgumentError: invalid byte sequence in UTF-8. See: https://github.com/ruby/irb/blob/f60dfa8549f746f69e9a6d160604a7a4974ffac1/lib/irb/ruby-lex.rb#L255-L256

If this is considered an irb issue, I already have a patch for IRB that handles it.
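For reference, a caller that only needs to display the message can guard against this on its own side. The sketch below is an assumption about what such a guard could look like, using `String#scrub`; it is not the IRB patch mentioned above, and `displayable_message` is a hypothetical name.

```ruby
# Hypothetical caller-side guard: return the error message with any invalid
# byte sequences replaced, so later string operations cannot raise.
def displayable_message(error)
  error.message.scrub("?")
end

broken = "ああ".byteslice(0, 4)   # "あ" followed by a lone 0xE3
err    = SyntaxError.new(broken)

p err.message.valid_encoding?     # false
p displayable_message(err)        # "あ?"
```

This only hides the symptom in the caller; the message produced by the parser is still invalid for every other consumer.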

Updated by byroot (Jean Boussier) 5 months ago, #1 [ruby-core:122907]

> I'm not sure how to correctly fix it due to lack of knowledge about safe byte truncation.

If that helps, I did something similar in ruby/json: https://github.com/ruby/json/blob/3090a63a956c30e6d30d93fc9667deccd5e31327/ext/json/ext/parser/parser.c#L456-L462 / https://github.com/ruby/json/commit/e144793b7226c2df75c414749d6f87ab7fcf4dce

It's not perfect as it doesn't consider grapheme clusters, but at least it ensures the included snippet is valid UTF-8.
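The general idea of boundary-safe truncation can be sketched in Ruby as follows. This is an illustration under stated assumptions, not the code from ruby/json or from the eventual Prism fix: `truncate_utf8` is a hypothetical helper, and it assumes the input is valid UTF-8. It relies on the fact that UTF-8 continuation bytes always have the bit pattern `10xxxxxx`.

```ruby
# Hypothetical helper: truncate +str+ (assumed to be valid UTF-8) to at most
# +max_bytes+ bytes without splitting a multibyte character.
def truncate_utf8(str, max_bytes)
  return str if str.bytesize <= max_bytes

  cut = max_bytes
  # If the first excluded byte is a continuation byte (0b10xxxxxx), the cut
  # falls inside a character, so step back to that character's lead byte.
  cut -= 1 while cut > 0 && (str.getbyte(cut) & 0xC0) == 0x80
  str.byteslice(0, cut)
end

line = "# " + ("0" * 13) + ("あ" * 6)  # 33 bytes
safe = truncate_utf8(line, 31)

p safe.valid_encoding?  # true
p safe.bytesize         # 30
```

As noted above, this works on code points only, so a grapheme cluster made of several code points can still be split; the result is nevertheless always valid UTF-8.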

Updated by Earlopain (Earlopain _) 5 months ago, #2 [ruby-core:122908]

Something like this, perhaps: https://github.com/ruby/ruby/pull/14094. It also doesn't consider grapheme clusters.

I believe truncation from the left is irrelevant here, since the method is only supposed to be called with valid UTF-8 (this is guarded at the two relevant call sites of pm_parse_errors_format).

Updated by ko1 (Koichi Sasada) 5 months ago, #3 [ruby-core:123000]

Assignee set to prism

Updated by kddnewton (Kevin Newton) 4 months ago, #4 [ruby-core:123239]

Status changed from Open to Closed

Fixed by https://github.com/ruby/ruby/commit/d781d69a06e7d4eef3334e44a25b02d05bad1e2d
