Bug #2636

closed

Incorrect UTF-16 string length

Added by scritch (Vincent Isambart) about 14 years ago. Updated almost 13 years ago.

Status:
Closed
Assignee:
-
Target version:
ruby -v:
ruby 1.9.2dev (2010-01-22 trunk 26370) [x86_64-darwin10.2.0]
Backport:
[ruby-core:27748]

Description

=begin
str = "\xDC\x0B\xD8\x40".force_encoding(Encoding::UTF_16BE)
str.length #=> 3

This string is made by swapping the two 16-bit words (the surrogate pair) of a UTF-16 character outside the BMP.
The length should be 2, not 3, because the string consists of two (unpaired) surrogates.

The strangest part is that the length agrees with how the string is displayed by #inspect ("\xDC\u0BD8\x40"), but not with what #[] does. If the length is 3, then why does str[2] return nil?
=end
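
For reference, here is a minimal sketch of how the byte sequence in the report can be derived and why two code units are expected. It assumes the original character was U+2000B, which is worked out from the bytes above and is not stated in the report; the variable names are illustrative only. It counts 16-bit code units from the byte size rather than relying on String#length, whose result is the subject of this report.

```ruby
# Assumption: the swapped pair in the report comes from U+2000B (not in the BMP).
char = "\u{2000B}"

# UTF-16BE encodes it as a surrogate pair: high unit first, then low unit.
units = char.encode(Encoding::UTF_16BE).unpack("n*")
high, low = units

# Swapping the two units gives the byte sequence used in the report.
swapped = [low, high].pack("n*").force_encoding(Encoding::UTF_16BE)

# Number of 16-bit code units, computed without String#length.
code_units = swapped.bytesize / 2

puts units.map { |u| format("0x%04X", u) }.join(" ")       # 0xD840 0xDC0B
puts swapped.bytes.map { |b| format("%02X", b) }.join(" ") # DC 0B D8 40
puts code_units                                            # 2
```

Each unpaired surrogate occupies exactly one 16-bit unit, so the four bytes hold two units, which is the basis for the expected length of 2.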
