Bug #13292
Invalid encodings in UTF-32
Status:
Closed
Assignee:
-
Target version:
-
ruby -v:
ruby 2.4.0p0 (2016-12-24 revision 57164) [x86_64-linux]
Backport:
Description
Ruby is very strict about valid UTF-8 encodings, which is great.
Strings that encode surrogates or code points above U+10FFFF are not valid.
However, in UTF-32, it is possible to encode such values, and Ruby treats them as valid:
Example 1 (too large value)
a = [0, 0, 17, 0].pack("C*").force_encoding("UTF-32LE") # => "\u{110000}"
a.valid_encoding? # => true
Example 2 (surrogate)
b = [0, 216, 0, 0].pack("C*").force_encoding("UTF-32LE") # => "\uD800"
b.valid_encoding? # => true
String#valid_encoding? should return false for both of these strings.
For reference: http://unicode.org/versions/Unicode9.0.0/UnicodeStandard-9.0.pdf (page 71)
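To make the expected result concrete, here is a minimal sketch of the check being asked for, written in plain Ruby. The helper name utf32le_scalar_values? is hypothetical and not part of Ruby; it only illustrates the rule that a UTF-32 code unit must be a Unicode scalar value (at most U+10FFFF and not in the surrogate range U+D800..U+DFFF), and is not how Ruby implements valid_encoding? internally.

```ruby
# Hypothetical helper (not a Ruby built-in): returns true only if every
# 4-byte unit of a UTF-32LE byte string is a Unicode scalar value.
def utf32le_scalar_values?(str)
  bytes = str.b                                # binary copy, ignores the encoding tag
  return false unless (bytes.bytesize % 4).zero?

  # "V*" decodes consecutive 32-bit unsigned little-endian integers.
  bytes.unpack("V*").all? do |cp|
    cp <= 0x10FFFF && !(0xD800..0xDFFF).cover?(cp)
  end
end

a = [0, 0, 17, 0].pack("C*").force_encoding("UTF-32LE")   # 0x110000, too large
b = [0, 216, 0, 0].pack("C*").force_encoding("UTF-32LE")  # 0xD800, surrogate
c = "A".encode("UTF-32LE")                                # ordinary character

utf32le_scalar_values?(a) # => false
utf32le_scalar_values?(b) # => false
utf32le_scalar_values?(c) # => true

# For comparison, the UTF-8 byte sequence for the same surrogate is rejected:
"\xED\xA0\x80".force_encoding("UTF-8").valid_encoding? # => false
```

With this rule applied, both examples from the description would be reported as invalid, matching the UTF-8 behaviour.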