Bug #8289

[].join.encoding # => US-ASCII (I expect also UTF-8)

Added by peter_v (Peter Vandenabeele) about 4 years ago. Updated 6 months ago.

Status: Rejected
Priority: Normal
Assignee: -
Target version: -
ruby -v: ruby 2.1.0dev (2013-04-19) [x86_64-darwin12.3.0]
Backport:
[ruby-core:54445]

Description

May be related to http://bugs.ruby-lang.org/issues/5379

$ date
Thu Apr 18 23:56:54 CEST 2013

$ rvm get stable

$ rvm install ruby-head
... long compile process ...

$ rvm use ruby-head
Using /Users/peter_v/.rvm/gems/ruby-head

$ ruby -v
ruby 2.1.0dev (2013-04-19) [x86_64-darwin12.3.0]

$ cat empty_array_join_returns_ASCII_encoding.rb
puts ["abc"].join.encoding
puts [].join.encoding
puts [].join.encode("utf-8").encoding

Actual result:

$ ruby -v empty_array_join_returns_ASCII_encoding.rb
ruby 2.1.0dev (2013-04-19) [x86_64-darwin12.3.0]
UTF-8
US-ASCII
UTF-8

Expected result:

$ ruby -v empty_array_join_returns_ASCII_encoding.rb
ruby 2.1.0dev (2013-04-19) [x86_64-darwin12.3.0]
UTF-8
UTF-8 # This is edited for expected result (not the actual result)
UTF-8

I would expect that in Ruby 2.0, with UTF-8 as the default encoding,
the encoding of the string returned by joining an array (of strings in
the default encoding) is always UTF-8, independent of the size of the array.

The current behaviour breaks my tests for an output encoding of
UTF-8 in the case the array is empty.

My workaround is array.join().encode("utf-8"), which works but is ugly.
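
One way to keep that workaround out of the calling code is a small helper. This is only a sketch: join_utf8 is a hypothetical name, not part of Ruby, and it assumes the array elements are already UTF-8 strings, so only the empty result needs its encoding adjusted.

```ruby
# Hypothetical helper (not part of Ruby): joins an array and makes sure
# an empty result is tagged UTF-8 rather than US-ASCII.
# Assumption: the elements themselves are already UTF-8 strings.
def join_utf8(array, separator = nil)
  joined = array.join(separator)
  # An empty string has no bytes to convert, so relabelling is enough.
  joined.empty? ? joined.force_encoding(Encoding::UTF_8) : joined
end

puts join_utf8([]).encoding          # UTF-8
puts join_utf8(["abc"]).encoding
puts join_utf8(%w[a b], ",")         # a,b
```

Non-empty results are returned untouched, so the helper does not relabel strings that might legitimately be in another encoding.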

History

#1 [ruby-core:54446] Updated by naruse (Yui NARUSE) about 4 years ago

  • Status changed from Open to Rejected

It is intended.
A string that is always generated as an ASCII-only string has US-ASCII encoding.
It should not cause any meaningful side effects.
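
A minimal sketch of what "no meaningful side effects" could mean in practice, assuming a Ruby with UTF-8 as the default source encoding: the empty result of [].join still combines and compares with UTF-8 strings without errors. The variable names are illustrative only.

```ruby
empty = [].join          # the empty string discussed in this report
utf8  = "h\u00e9llo"     # the \u escape yields a UTF-8 string

# Concatenation in either order produces a UTF-8 string.
left  = empty + utf8
right = utf8 + empty
puts left.encoding       # UTF-8
puts right.encoding      # UTF-8

# The two strings are reported as compatible, and the empty string
# compares equal to an empty literal regardless of its encoding tag.
puts Encoding.compatible?(empty, utf8)
puts empty == ""         # true
```

The difference only shows up when code inspects String#encoding directly, as the test in the description does.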

#2 [ruby-core:78918] Updated by khalil_fazal (Khalil Fazal) 6 months ago

A workaround for my own projects:

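class Array
  alias_method :old_join, :join

  # A workaround for https://bugs.ruby-lang.org/issues/8289
  # '' takes the source file encoding (UTF-8 by default), so prepending it
  # makes the empty-array result come back as UTF-8 instead of US-ASCII.
  def join(separator = $,)
    '' + old_join(separator)
  end
end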

puts ["abc"].join.encoding
puts [].join.encoding
puts [].join.encode("utf-8").encoding

Actual result:

UTF-8
UTF-8
UTF-8

as expected.

I'm posting in this old bug report for future readers.
I do not expect this change to be merged into ruby master.

#3 [ruby-core:78930] Updated by naruse (Yui NARUSE) 6 months ago

  • Backport deleted (1.9.3: UNKNOWN, 2.0.0: UNKNOWN)

Khalil Fazal wrote:

puts [].join.encode("utf-8").encoding

puts [].join.force_encoding("utf-8").encoding is the correct form, for both semantic and performance reasons.
String#force_encoding just overwrites the encoding of the string, whereas String#encode performs an encoding conversion.
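
The distinction can be illustrated with a short sketch. The single Latin-1 byte used here is an assumption chosen for illustration and does not come from the report; for an empty string there are no bytes at all, which is why relabelling is sufficient.

```ruby
# One byte 0xE9, tagged as ISO-8859-1 (where it means "é").
latin1 = [0xE9].pack("C").force_encoding(Encoding::ISO_8859_1)

# String#encode transcodes: the bytes change to the UTF-8 form of "é".
converted = latin1.encode(Encoding::UTF_8)
p converted.bytes            # [195, 169]

# String#force_encoding only changes the tag: the byte stays as it was,
# and here it is not valid UTF-8 on its own.
relabeled = latin1.dup.force_encoding(Encoding::UTF_8)
p relabeled.bytes            # [233]
p relabeled.valid_encoding?  # false

# For the empty join result, relabelling is safe and modifies the
# receiver in place instead of building a converted copy.
empty  = [].join
tagged = empty.force_encoding(Encoding::UTF_8)
p tagged.encoding            # #<Encoding:UTF-8>
p tagged.equal?(empty)       # true
```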
