Bug #13292

Invalid encodings in UTF-32

Added by rbjl (Jan Lelis) about 7 years ago. Updated about 7 years ago.

Status: Closed
Assignee: -
Target version: -
ruby -v: ruby 2.4.0p0 (2016-12-24 revision 57164) [x86_64-linux]
[ruby-core:79966]

Description

Ruby is very strict about valid UTF-8 encodings, which is great.

Strings that encode surrogate code points (U+D800..U+DFFF) or code points above U+10FFFF are not valid.
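For comparison, here is a small sketch of the UTF-8 behaviour described above. The byte sequences are built by hand (they are what a naive encoder would produce for U+D800 and U+110000), and the variable names are only illustrative:

```ruby
# U+D800 (a surrogate) written as a 3-byte UTF-8-style sequence: ED A0 80
surrogate = [0xED, 0xA0, 0x80].pack("C*").force_encoding("UTF-8")

# U+110000 (one past the Unicode maximum) as a 4-byte sequence: F4 90 80 80
too_large = [0xF4, 0x90, 0x80, 0x80].pack("C*").force_encoding("UTF-8")

surrogate.valid_encoding? # => false
too_large.valid_encoding? # => false
```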

However, in UTF-32, it is possible to encode such values, and Ruby treats them as valid:

Example 1 (too large value)

a = [0, 0, 17, 0].pack("C*").force_encoding("UTF-32LE") # => "\u{110000}"
a.valid_encoding? # => true

Example 2 (surrogate)

b = [0, 216, 0, 0].pack("C*").force_encoding("UTF-32LE") # => "\uD800"
b.valid_encoding? # => true

The behaviour should be changed so that String#valid_encoding? reports false in both cases.
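Until that happens, a stricter check can be approximated in user code. The following is a minimal sketch, not Ruby's implementation: `utf32le_scalar_values?` is a hypothetical helper name, and it assumes the string holds UTF-32LE data without a byte order mark.

```ruby
# Hypothetical helper (not part of Ruby's API): returns true only if every
# 32-bit little-endian unit in the string is a Unicode scalar value.
def utf32le_scalar_values?(str)
  bytes = str.b                          # binary copy, so the encoding is ignored
  return false unless (bytes.bytesize % 4).zero?

  bytes.unpack("V*").all? do |cp|        # "V" = 32-bit unsigned, little-endian
    cp <= 0x10FFFF && !(0xD800..0xDFFF).cover?(cp)
  end
end

too_large = [0, 0, 17, 0].pack("C*").force_encoding("UTF-32LE")
surrogate = [0, 216, 0, 0].pack("C*").force_encoding("UTF-32LE")

utf32le_scalar_values?(too_large)              # => false
utf32le_scalar_values?(surrogate)              # => false
utf32le_scalar_values?("A".encode("UTF-32LE")) # => true
```

The check mirrors the two conditions in this report: a value is rejected if it lies above U+10FFFF or inside the surrogate range.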

For reference: http://unicode.org/versions/Unicode9.0.0/UnicodeStandard-9.0.pdf (page 71)
