Bug #13217

closed

JSON.parse() chokes on the UTF-8 character EM SPACE (U+2003, e2 80 83)

Added by 7stud (7 stud) over 7 years ago. Updated over 7 years ago.

Status:
Rejected
Assignee:
-
Target version:
-
ruby -v:
2.4.0, 2.3.0, 2.2.1, 1.9.3-p551
[ruby-core:79552]

Description

Steps to reproduce

No error here:

require 'json'

json = %Q{
  ["a", "b"]
}

obj = JSON.parse(json)

But there is a Unicode space character called EM SPACE (U+2003, encoded in UTF-8 as e2 80 83), which looks like a regular ASCII space, and it causes a parse error:

require 'json'

json = %Q{
  ["a",\u2003"b"]
}

obj = JSON.parse(json)

Here's the error:

/Users/7stud/.rvm/rubies/ruby-2.4.0/lib/ruby/2.4.0/json/common.rb:156:in `parse': 409: unexpected token at ' "b"] (JSON::ParserError)
'
	from /Users/7stud/.rvm/rubies/ruby-2.4.0/lib/ruby/2.4.0/json/common.rb:156:in `parse'
	from 1.rb:7:in `<main>'

Expected behavior

Because UTF-8 characters are supposed to be valid in JSON, I expected the EM SPACE not to cause a parse error. jsonlint.com reports the JSON with the EM SPACE as valid, yet Ruby can't parse it.

Actual behavior

I get this error:

/Users/7stud/.rvm/rubies/ruby-2.4.0/lib/ruby/2.4.0/json/common.rb:156:in `parse': 409: unexpected token at ' "b"] (JSON::ParserError)
'
	from /Users/7stud/.rvm/rubies/ruby-2.4.0/lib/ruby/2.4.0/json/common.rb:156:in `parse'
	from 1.rb:7:in `<main>'

Ruby version:

~/ruby_programs$ ruby --version
ruby 2.4.0p0 (2016-12-24 revision 57164) [x86_64-darwin14]

However, I get the same error in all of the following rubies:

ruby-1.9.3-p551 [ x86_64 ]
ruby-2.2.1 [ x86_64 ]
ruby-2.3.0 [ x86_64 ]
ruby-2.4.0 [ x86_64 ]

Updated by duerst (Martin Dürst) over 7 years ago

  • Status changed from Open to Rejected

RFC 7159 defines what's allowed as spaces between data. Please see the 'ws' production at https://tools.ietf.org/html/rfc7159#section-2, which lists only the following four:

  ws = *(
              %x20 /              ; Space
              %x09 /              ; Horizontal tab
              %x0A /              ; Line feed or New line
              %x0D )              ; Carriage return

You can use other (Unicode) spaces within strings, but not between data. That's quite standard for formats such as JSON, XML, and HTML; nothing surprising there.
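
The distinction above can be checked directly. The following is a minimal sketch, assuming only the standard json library that ships with Ruby; the variable names (allowed_ws, inside, rejected) are illustrative and not part of the original report.

```ruby
require 'json'

# The four whitespace characters permitted by the RFC 7159 'ws' production.
allowed_ws = [" ", "\t", "\n", "\r"]

allowed_ws.each do |ws|
  # Each of these may separate the two array elements without error.
  JSON.parse(%Q{["a",#{ws}"b"]})
end

# EM SPACE inside a string value is ordinary string content.
inside = JSON.parse(%Q{["a\u2003b"]})

# EM SPACE between values is not JSON whitespace, so the parser rejects it.
rejected = begin
  JSON.parse(%Q{["a",\u2003"b"]})
  false
rescue JSON::ParserError
  true
end
```

Here inside holds a one-element array whose string contains the EM SPACE, while rejected records that the second call raised JSON::ParserError, matching the error in the report.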

Updated by 7stud (7 stud) over 7 years ago

Martin Dürst wrote:

RFC 7159 defines what's allowed as spaces between data. Please see the 'ws' production at https://tools.ietf.org/html/rfc7159#section-2, which lists only the following four:

  ws = *(
              %x20 /              ; Space
              %x09 /              ; Horizontal tab
              %x0A /              ; Line feed or New line
              %x0D )              ; Carriage return

You can use other (Unicode) spaces within strings, but not between data. That's quite standard for formats such as JSON, XML, and HTML; nothing surprising there.

Okay, thanks for the JSON lesson!
