Bug #13217
closed
JSON.parse() chokes on the UTF-8 character EM SPACE (U+2003, e2 80 83)
Description
Steps to reproduce
No error here:
require 'json'
json = %Q{
["a", "b"]
}
obj = JSON.parse(json)
But there is a Unicode space character called EM SPACE (U+2003, encoded in UTF-8 as e2 80 83), which looks like a regular ASCII space, and it causes a parse error:
require 'json'
json = %Q{
["a",\u2003"b"]
}
obj = JSON.parse(json)
Here's the error:
/Users/7stud/.rvm/rubies/ruby-2.4.0/lib/ruby/2.4.0/json/common.rb:156:in `parse': 409: unexpected token at ' "b"] (JSON::ParserError)
'
from /Users/7stud/.rvm/rubies/ruby-2.4.0/lib/ruby/2.4.0/json/common.rb:156:in `parse'
from 1.rb:7:in `<main>'
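Because the EM SPACE is visually indistinguishable from an ASCII space, it can help to make it visible before parsing. The following is only a rough sketch: `ALLOWED_JSON_WS` and `suspicious_spaces` are hypothetical names, not part of the json library, and the helper scans the whole text, so it also reports Unicode spaces that sit inside JSON strings.

```ruby
# Hypothetical helper, not part of the json library.
# The four characters JSON accepts as whitespace between tokens.
ALLOWED_JSON_WS = [" ", "\t", "\n", "\r"].freeze

# Returns one hash per character that Unicode classifies as whitespace
# but that is not in ALLOWED_JSON_WS. Assumes +text+ is UTF-8 encoded.
def suspicious_spaces(text)
  found = []
  text.each_char.with_index do |ch, index|
    next unless ch =~ /\p{Space}/          # any Unicode whitespace
    next if ALLOWED_JSON_WS.include?(ch)   # skip the four JSON allows
    found << {
      index: index,
      codepoint: format("U+%04X", ch.ord),
      bytes: ch.bytes.map { |b| format("%02x", b) }.join(" ")
    }
  end
  found
end

json = %Q{\n["a",\u2003"b"]\n}
suspicious_spaces(json).each { |hit| puts hit.inspect }
```

For the failing input above, this reports a single hit at character index 6 with code point U+2003 and bytes e2 80 83.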
Expected behavior
Because UTF-8 characters are supposed to be valid in JSON, I expected the EM SPACE not to cause a parse error. jsonlint.com reports the JSON with the EM SPACE to be valid, yet Ruby can't parse it.
Actual behavior
I get this error:
/Users/7stud/.rvm/rubies/ruby-2.4.0/lib/ruby/2.4.0/json/common.rb:156:in `parse': 409: unexpected token at ' "b"] (JSON::ParserError)
'
from /Users/7stud/.rvm/rubies/ruby-2.4.0/lib/ruby/2.4.0/json/common.rb:156:in `parse'
from 1.rb:7:in `<main>'
Ruby version:
~/ruby_programs$ ruby --version
ruby 2.4.0p0 (2016-12-24 revision 57164) [x86_64-darwin14]
However, I get the same error in all of the following rubies:
ruby-1.9.3-p551 [ x86_64 ]
ruby-2.2.1 [ x86_64 ]
ruby-2.3.0 [ x86_64 ]
ruby-2.4.0 [ x86_64 ]
Updated by duerst (Martin Dürst) over 7 years ago
- Status changed from Open to Rejected
RFC 7159 defines what's allowed as spaces between data. Please see the 'ws' production at https://tools.ietf.org/html/rfc7159#section-2, which lists only the following four:
ws = *(
%x20 / ; Space
%x09 / ; Horizontal tab
%x0A / ; Line feed or New line
%x0D ) ; Carriage return
You can use other (Unicode) spaces within strings, but not between data. That's quite standard for formats such as JSON, XML, HTML, ...; nothing surprising there.
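The distinction can be checked directly with a short script. This is a minimal sketch, assuming the standard json library behaves as described in this report; the variable names are illustrative only.

```ruby
require 'json'

# The four characters the 'ws' production allows between tokens.
allowed = {
  "space"           => " ",
  "horizontal tab"  => "\t",
  "line feed"       => "\n",
  "carriage return" => "\r"
}

# Each of them is accepted between the two array elements.
allowed.each do |name, ws|
  parsed = JSON.parse(%Q{["a",#{ws}"b"]})
  puts "#{name}: #{parsed.inspect}"
end

# EM SPACE between tokens is rejected...
begin
  JSON.parse(%Q{["a",\u2003"b"]})
  puts "between tokens: parsed"
rescue JSON::ParserError => e
  puts "between tokens: #{e.class}"
end

# ...but inside a string it is ordinary string content.
inside = JSON.parse(%Q{["a\u2003b"]})
puts inside.first.length   # 3 characters: "a", EM SPACE, "b"
```

If the offending character comes from copied or hand-edited text, replacing it with one of the four allowed characters at the source is the fix that matches the RFC.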
Updated by 7stud (7 stud) over 7 years ago
Martin Dürst wrote:
RFC 7159 defines what's allowed as spaces between data. Please see the 'ws' production at https://tools.ietf.org/html/rfc7159#section-2, which lists only the following four:
ws = *(
%x20 / ; Space
%x09 / ; Horizontal tab
%x0A / ; Line feed or New line
%x0D ) ; Carriage return
You can use other (Unicode) spaces within strings, but not between data. That's quite standard for formats such as JSON, XML, HTML, ...; nothing surprising there.
Okay, thanks for the json lesson!