Bug #7090: UTF-16LE String#<< append 0x0 for certain codepoints - Ruby - Ruby Issue Tracking System

Bug #7090

UTF-16LE String#<< append 0x0 for certain codepoints

Added by stefan (Stefan Lang) over 12 years ago. Updated over 12 years ago.

Status: Closed
Assignee:
Target version:
ruby -v: ruby 1.9.3p194 (2012-04-20) [x86_64-linux]
Backport:

[ruby-core:47751]

Description

IMO, the behaviour with the UTF-8 string is correct.

$ ri193 'String#<<'
= String#<<

(from ruby core)

str << integer       -> str
str.concat(integer)  -> str
str << obj           -> str
str.concat(obj)      -> str

Append---Concatenates the given object to str. If the object is a
Integer, it is considered as a codepoint, and is converted to a character
before concatenation.

a = "hello "
a << "world"   #=> "hello world"
a.concat(33)   #=> "hello world!"

AFAIK, a Ruby 1.9 string can be viewed as either 1) a sequence of raw bytes,
or 2) a sequence of codepoints.

Except perhaps for regexes, Ruby has no higher-level concept of a "character"
than a codepoint, so I don't know what "and is converted to a character
before concatenation" means.

If we take the sequence-of-codepoints view, then "str << integer" simply
appends a codepoint.

If we take the sequence-of-bytes view, then "str << integer" converts
the codepoint into the sequence of bytes that encodes it in str.encoding
and appends that sequence of bytes.
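Both views can be checked directly on a Ruby where this bug is fixed. The following is a minimal sketch, not part of the original report. It uses String.new(encoding:) (available since Ruby 2.3) instead of the 1.9.3 force_encoding idiom, and appends the codepoints 0x20 and 0x300 that the irb session in this thread uses:

```ruby
# Build a UTF-16LE string by appending integer codepoints with String#<<.
s = String.new(encoding: Encoding::UTF_16LE)
s << 0x20   # U+0020 SPACE
s << 0x300  # U+0300 COMBINING GRAVE ACCENT

# View 1: a sequence of codepoints.
p s.codepoints  # => [32, 768]

# View 2: a sequence of bytes in s.encoding (little-endian 16-bit code units).
p s.bytes       # => [32, 0, 0, 3]
```

Both outputs describe the same string. Appending codepoint 0x300 adds one codepoint, which in UTF-16LE is the two bytes 0x00 0x03; the first of those bytes is NUL.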

#1 [ruby-core:47753]

Updated by stefan (Stefan Lang) over 12 years ago

UTF-16BE

irb(main):003:0> s = "".force_encoding('utf-16be')
=> ""
irb(main):004:0> s << 0x20
=> "\u0000"
irb(main):005:0> s << 0x300
=> "\u0000\u0300"

#2 [ruby-core:47754]

Updated by stefan (Stefan Lang) over 12 years ago

With an older Ruby version, ruby 1.9.3p0 (2011-10-30 revision 33570) [x86_64-linux], the string correctly contains the codepoints 0x20 and 0x300 for UTF-8, UTF-16LE and UTF-16BE.
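The check described here could be sketched as follows on a current Ruby. This is an illustration, not code from the report. It appends 0x20 and 0x300 to empty strings in all three encodings and records the resulting codepoints and raw bytes:

```ruby
# Append codepoints 0x20 and 0x300 to an empty string in each encoding and
# record the resulting codepoints and raw bytes.
results = [Encoding::UTF_8, Encoding::UTF_16LE, Encoding::UTF_16BE].map do |enc|
  s = String.new(encoding: enc)
  s << 0x20 << 0x300  # String#<< returns the receiver, so calls can be chained
  [enc.name, s.codepoints, s.bytes]
end

results.each do |name, cps, bytes|
  hex = bytes.map { |b| format("%02X", b) }.join(" ")
  puts "#{name}: codepoints=#{cps.inspect} bytes=#{hex}"
end
```

On a correct implementation every encoding reports the codepoints [32, 768] and only the byte sequences differ. The UTF-16 encodings contain NUL bytes (0x00) for these codepoints, while UTF-8 (20 CC 80) does not.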

Updated by naruse (Yui NARUSE) over 12 years ago

Status changed from Open to Closed
% Done changed from 0 to 100

This issue was solved with changeset r37058.
Stefan, thank you for reporting this issue.
Your contribution to Ruby is greatly appreciated.
May Ruby be with you.

string.c (rb_str_concat): use memcpy to copy a string which contains
NUL characters. [ruby-core:47751] [Bug #7090]
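The commit message only says that memcpy is now used to copy a string containing NUL characters. The following hypothetical Ruby model assumes that the old code copied the encoded codepoint with a routine that, like C's strncpy, stops at the first NUL byte and zero-fills the remaining space. Under that assumption it reproduces the UTF-16BE output from comment #1 (0x20, bytes 00 20, turns into U+0000) and the "append 0x0" symptom from the title for UTF-16LE (0x300, bytes 00 03, turns into U+0000), while UTF-8 is unaffected. STRNCPY_LIKE, MEMCPY_LIKE, append_codepoint and build are illustrative names, not Ruby internals.

```ruby
# HYPOTHETICAL model, not Ruby's actual C code. Assumption: before r37058 the
# encoded bytes were copied by a routine that, like C's strncpy, stops at the
# first NUL byte and zero-fills the rest of the destination.
STRNCPY_LIKE = lambda do |src, len|
  copied = src.take_while { |b| b != 0 }.first(len)
  copied + [0] * (len - copied.size)
end

# memcpy copies exactly len bytes, NUL bytes included.
MEMCPY_LIKE = ->(src, len) { src.first(len) }

# Append codepoint cp to str by encoding it in str.encoding and copying the
# resulting bytes with the given copy routine.
def append_codepoint(str, cp, copy)
  encoded = cp.chr(str.encoding).bytes
  (str.bytes + copy.call(encoded, encoded.size)).pack("C*").force_encoding(str.encoding)
end

# Build a string from the codepoints 0x20 and 0x300, as in the irb session above.
def build(enc, copy)
  [0x20, 0x300].reduce(String.new(encoding: enc)) { |s, cp| append_codepoint(s, cp, copy) }
end

[Encoding::UTF_8, Encoding::UTF_16LE, Encoding::UTF_16BE].each do |enc|
  { "strncpy-like" => STRNCPY_LIKE, "memcpy-like" => MEMCPY_LIKE }.each do |label, copy|
    cps = build(enc, copy).codepoints.map { |c| format("U+%04X", c) }
    puts "#{enc.name} (#{label}): #{cps.join(' ')}"
  end
end
```

In this model the NUL-stopping copy corrupts exactly those codepoints whose encoded form starts with a 0x00 byte, which is why UTF-16LE and UTF-16BE fail on different codepoints and UTF-8 never does.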
