Bug #10705

JSON::ParserError#message is wrong encoding (ASCII-8BIT)

Added by josh.cheek (Josh Cheek) almost 5 years ago. Updated over 4 years ago.

Status: Closed
Priority: Normal
Assignee: -
Target version: -
ruby -v: ruby 2.3.0dev (2015-01-06 trunk 49159) [x86_64-darwin13]
[ruby-core:67386]

Description

JSON::ParserError#message has the wrong encoding (ASCII-8BIT). I would expect the message to use whatever the internal encoding is (UTF-8 in my case), perhaps with the offending string inspected in the error message so that all characters are valid in that encoding.

Here is an example of where it becomes an issue:

# encoding: utf-8
require 'json'  # => true

json = JSON.dump("√")                                          # => "\"√\""
begin
  result = JSON.parse(json)
  puts "PARSED: #{result.inspect}"
rescue JSON::ParserError => e
  `ruby -v`                                                    # => "ruby 2.3.0dev (2015-01-06 trunk 49159) [x86_64-darwin13]\n"
  json.encoding                                                # => #<Encoding:UTF-8>
  e.message.encoding                                           # => #<Encoding:ASCII-8BIT>
  e.message                                                    # => "757: unexpected token at '\"\xE2\x88\x9A\"'"
  puts "Could not parse #{json.inspect} because #{e.message}"  # ~> Encoding::CompatibilityError: incompatible character encodings: UTF-8 and ASCII-8BIT
end

# ~> Encoding::CompatibilityError
# ~> incompatible character encodings: UTF-8 and ASCII-8BIT
# ~>
# ~> f9.rb:13:in `rescue in <main>'
# ~> f9.rb:5:in `<main>'

If the parsed string doesn't contain a multibyte Unicode character, the message is still tagged ASCII-8BIT, but the problem fixes itself when the message comes in contact with another string, since all of its bytes are within the ASCII range.
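The difference between the two cases can be reproduced without the json library at all. The following is a minimal sketch, assuming the error messages are simulated by hand-built ASCII-8BIT strings (the variable names are illustrative and do not come from the report):

```ruby
# Simulated error messages tagged ASCII-8BIT, as in the report.
# (dup avoids mutating a string literal in place.)
ascii_only = "unexpected token at 'abc'".dup.force_encoding(Encoding::ASCII_8BIT)
multibyte  = "unexpected token at '\xE2\x88\x9A'".dup.force_encoding(Encoding::ASCII_8BIT)

# A UTF-8 string that contains a non-ASCII character.
prefix = "Could not parse √ because "

# Case 1: the binary string holds only ASCII bytes, so Ruby treats it as
# compatible and the result takes the UTF-8 encoding.
combined = prefix + ascii_only
combined.encoding  # => #<Encoding:UTF-8>

# Case 2: the binary string holds bytes above 0x7F, so the two encodings
# cannot be reconciled and concatenation raises.
error_class = nil
begin
  prefix + multibyte
rescue Encoding::CompatibilityError => e
  error_class = e.class
end
error_class  # => Encoding::CompatibilityError
```

This is why the bug only surfaces when the input being parsed contains non-ASCII characters.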

Documented the actual use case and debugging here.

(side thought: should I open another bug since it generates invalid JSON?)

Associated revisions

Revision b38c0b79
Added by nobu (Nobuyoshi Nakada) over 4 years ago

parser.rl: rb_enc_raise

  • ext/json/parser/parser.rl: raise with messages in UTF-8 encoding. [ruby-core:67386] [Bug #10705]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@50342 b2dd03c8-39d4-4d8f-98ff-823fe69b080e

Revision 50342
Added by nobu (Nobuyoshi Nakada) over 4 years ago

parser.rl: rb_enc_raise

  • ext/json/parser/parser.rl: raise with messages in UTF-8 encoding. [ruby-core:67386] [Bug #10705]

Revision 4fe4f465
Added by nagachika (Tomoyuki Chikanaga) over 4 years ago

merge revision(s) 50339,50340,50342,50343: [Backport #10705]

    parser.rl: use StringValue

    * ext/json/parser/parser.rl (cParser_initialize): use StringValue
      instead of direct rb_convert_type and remove duplicate
      conversion.
    * ext/json/parser/parser.rl: raise with messages in UTF-8
      encoding. [ruby-core:67386] [Bug #10705]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/branches/ruby_2_2@51571 b2dd03c8-39d4-4d8f-98ff-823fe69b080e

Revision 51571
Added by nagachika (Tomoyuki Chikanaga) over 4 years ago

merge revision(s) 50339,50340,50342,50343: [Backport #10705]

parser.rl: use StringValue

* ext/json/parser/parser.rl (cParser_initialize): use StringValue
  instead of direct rb_convert_type and remove duplicate
  conversion.
* ext/json/parser/parser.rl: raise with messages in UTF-8
  encoding. [ruby-core:67386] [Bug #10705]

History

#1

Updated by nobu (Nobuyoshi Nakada) over 4 years ago

  • Status changed from Open to Closed
  • % Done changed from 0 to 100

Applied in changeset r50342.

parser.rl: rb_enc_raise

  • ext/json/parser/parser.rl: raise with messages in UTF-8 encoding. [ruby-core:67386] [Bug #10705]

#2

Updated by usa (Usaku NAKAMURA) over 4 years ago

  • Backport changed from 2.0.0: UNKNOWN, 2.1: UNKNOWN, 2.2: UNKNOWN to 2.0.0: WONTFIX, 2.1: REQUIRED, 2.2: REQUIRED

#3

Updated by usa (Usaku NAKAMURA) over 4 years ago

  • Backport changed from 2.0.0: WONTFIX, 2.1: REQUIRED, 2.2: REQUIRED to 2.0.0: WONTFIX, 2.1: WONTFIX, 2.2: REQUIRED

memo:
I could write a patch for ruby_2_1, but could not generate the .c file from the .rl file.

#4

Updated by nobu (Nobuyoshi Nakada) over 4 years ago

You can generate ext/json/parser/parser.c by running make srcs-ext, if you have the ragel command.

#5

Updated by nagachika (Tomoyuki Chikanaga) over 4 years ago

  • Backport changed from 2.0.0: WONTFIX, 2.1: WONTFIX, 2.2: REQUIRED to 2.0.0: WONTFIX, 2.1: WONTFIX, 2.2: DONE

r50339, r50340, r50342 and r50343 were backported into ruby_2_2 branch at r51571.
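For versions that did not receive the backport (2.0.0 and 2.1 are marked WONTFIX above), a caller-side workaround might look like the sketch below. It assumes the message bytes are in fact UTF-8, which holds when the parsed input was UTF-8; utf8_message is a hypothetical helper and not part of the json library.

```ruby
require 'json'

# Hypothetical helper (not part of the json library): returns the exception
# message re-tagged as UTF-8 when it arrives as ASCII-8BIT.
# Assumption: the underlying bytes really are UTF-8.
def utf8_message(error)
  message = error.message.dup
  message.force_encoding(Encoding::UTF_8) if message.encoding == Encoding::ASCII_8BIT
  message
end

json = "[√"  # malformed on purpose so that JSON.parse raises
report = nil
begin
  JSON.parse(json)
rescue JSON::ParserError => e
  # Interpolating the re-tagged message no longer mixes UTF-8 with ASCII-8BIT.
  report = "Could not parse #{json.inspect} because #{utf8_message(e)}"
end
puts report
```

On a Ruby that already contains the fix, the message is not ASCII-8BIT, so the helper returns it unchanged.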
