Bug #21102
closed
Unexpected encoding when concatenating ASCII string with ASCII compatible string with non ASCII encoding
Description
The problem was noticed in code that boils down to:
# encoding: UTF-8
str = "something"
p str.encoding # => #<Encoding:UTF-8>
p [nil, str].join.encoding # => #<Encoding:US-ASCII>
As nil.to_s is an empty string with US-ASCII encoding and "something" is an ASCII-compatible string, the result is a string with US-ASCII encoding.
An even simpler example is p (nil.to_s + "something").encoding. What is confusing is that the resulting encoding depends on the order of the operands and on the compatibility of the string encodings:
# encoding: UTF-8
str1 = "something" # ASCII compatible
str2 = "söméthíng" # not ASCII compatible
p (nil.to_s + str1).encoding # => #<Encoding:US-ASCII>
p (nil.to_s + str2).encoding # => #<Encoding:UTF-8>
p (str1 + nil.to_s).encoding # => #<Encoding:UTF-8>
p (str2 + nil.to_s).encoding # => #<Encoding:UTF-8>
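The four cases above can be checked against Encoding.compatible?, which is documented to return the encoding a concatenation of two strings would have. The following is a minimal sketch, not part of the original report; the variable names (empty, ascii_only, non_ascii, cases) are chosen here for illustration, and it assumes nil.to_s is US-ASCII as stated above.

```ruby
# encoding: UTF-8
# Illustrative sketch: compare the encoding predicted by Encoding.compatible?
# with the encoding String#+ actually produces for each operand order.
empty      = nil.to_s     # empty string (US-ASCII, per the report above)
ascii_only = "something"  # UTF-8 literal containing only ASCII characters
non_ascii  = "söméthíng"  # UTF-8 literal containing multibyte characters

cases = [
  [empty, ascii_only],
  [empty, non_ascii],
  [ascii_only, empty],
  [non_ascii, empty],
]

cases.each do |left, right|
  predicted = Encoding.compatible?(left, right)
  actual    = (left + right).encoding
  puts "#{left.inspect} + #{right.inspect}: predicted #{predicted}, actual #{actual}"
end
```

Only the case with an empty US-ASCII left operand and an ASCII-only right operand ends up as US-ASCII; in the other three cases the result is UTF-8.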
I would expect it to behave akin to summing integers and floats or rationals:
p 1 + 1.0 # => 2.0
p 1.0 + 1 # => 2.0
p 1 + 1r # => (2/1)
p 1r + 1 # => (2/1)
So it is at least surprising to me.
#18579 is probably the most closely related issue, but #14975 and #20594 are also relevant.
Updated by naruse (Yui NARUSE) 10 days ago
- Status changed from Open to Rejected
This behavior is for the case when a string is used as a buffer.
In that case the first string is the buffer, and following strings are inputs. Therefore the encoding of the buffer should be respected as far as it can.
If you have a real problem that prevents development, please give feedback again.
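The buffer reasoning in this comment can be illustrated with a short sketch. It is only one reading of the explanation, not code from the maintainers; the names utf8_buffer and ascii_buffer are hypothetical. It assumes the caller controls how the buffer is created, so the buffer's encoding is chosen explicitly with String.new(encoding: ...).

```ruby
# encoding: UTF-8
# Illustrative sketch: the first string acts as the buffer, and its encoding
# is kept for as long as the appended input allows.

# Case 1: a buffer created as UTF-8 stays UTF-8, even when the first
# input is the empty US-ASCII string returned by nil.to_s.
utf8_buffer = String.new(encoding: Encoding::UTF_8)
[nil, "something"].each { |part| utf8_buffer << part.to_s }
p utf8_buffer.encoding # => #<Encoding:UTF-8>

# Case 2: a US-ASCII buffer keeps its encoding while the input is ASCII only,
# and switches to UTF-8 once non-ASCII input no longer fits in US-ASCII.
ascii_buffer = String.new(encoding: Encoding::US_ASCII)
ascii_buffer << "something"
after_ascii_input = ascii_buffer.encoding
ascii_buffer << "söméthíng"
after_non_ascii_input = ascii_buffer.encoding
p after_ascii_input     # => #<Encoding:US-ASCII>
p after_non_ascii_input # => #<Encoding:UTF-8>
```

Under this reading, code that needs a UTF-8 result regardless of input order can start from an explicitly UTF-8 buffer instead of relying on the encoding of the first operand.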