Bug #8289
closed
[].join.encoding # => US-ASCII (I expect also UTF-8)
Description
May be related to http://bugs.ruby-lang.org/issues/5379
$ date
Thu Apr 18 23:56:54 CEST 2013
$ rvm get stable
$ rvm install ruby-head
... long compile process ...
$ rvm use ruby-head
Using /Users/peter_v/.rvm/gems/ruby-head
$ ruby -v
ruby 2.1.0dev (2013-04-19) [x86_64-darwin12.3.0]
$ cat empty_array_join_returns_ASCII_encoding.rb
puts ["abc"].join.encoding
puts [].join.encoding
puts [].join.encode("utf-8").encoding
Actual result:
$ ruby -v empty_array_join_returns_ASCII_encoding.rb
ruby 2.1.0dev (2013-04-19) [x86_64-darwin12.3.0]
UTF-8
US-ASCII
UTF-8
Expected result:
$ ruby -v empty_array_join_returns_ASCII_encoding.rb
ruby 2.1.0dev (2013-04-19) [x86_64-darwin12.3.0]
UTF-8
UTF-8  # This is edited for expected result (not the actual result)
UTF-8
I would expect that in Ruby 2.0, with UTF-8 as the default encoding,
the encoding returned by joining an array (of strings in the default encoding)
is always UTF-8, independent of the size of the array.
The current behaviour breaks my tests for an output encoding of
UTF-8 when the array is empty.
My workaround is array.join().encode("utf-8"), which works but is ugly.
        
          
Updated by naruse (Yui NARUSE) over 12 years ago
- Status changed from Open to Rejected
It is intended.
A string that is always generated as an ASCII-only string has US-ASCII encoding.
It should not cause any meaningful side effects.
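A minimal sketch of why the US-ASCII label is usually harmless, assuming a UTF-8 source file. The helper name `concat_encoding` and the sample string are made up for illustration and are not from this ticket. The sketch checks that an empty join result still combines with, and compares equal to, UTF-8 strings:

```ruby
# Hypothetical helper for illustration only: concatenates two strings
# and reports the encoding of the result.
def concat_encoding(left, right)
  (left + right).encoding
end

empty    = [].join   # reported as US-ASCII in this ticket
accented = "héllo"   # UTF-8 literal containing a non-ASCII character

# The empty string is compatible with UTF-8 data in either position.
puts concat_encoding(accented, empty)  # UTF-8
puts concat_encoding(empty, accented)  # UTF-8

# Equality with an empty UTF-8 literal does not depend on the encoding label.
puts empty == ""                       # true
```

The difference only shows up when code inspects `String#encoding` directly, as the tests described in the report do.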
        
          
Updated by khalil_fazal (Khalil Fazal) almost 9 years ago
A workaround for my own projects:
class Array
  alias_method :old_join, :join
  # A work around for https://bugs.ruby-lang.org/issues/8289
  def join(separator = $,)
    '' + old_join(separator)
  end
end
puts ["abc"].join.encoding
puts [].join.encoding
puts [].join.encode("utf-8").encoding
Actual result:
UTF-8
UTF-8
UTF-8
as expected.
I'm posting in this old bug report for future readers.
I do not expect this change to be merged into ruby master.
        
          
Updated by naruse (Yui NARUSE) almost 9 years ago
- Backport deleted (1.9.3: UNKNOWN, 2.0.0: UNKNOWN)
Khalil Fazal wrote:
puts [].join.encode("utf-8").encoding
puts [].join.force_encoding("utf-8").encoding is correct, for both semantic and performance reasons.
String#force_encoding just overwrites the encoding of the string, whereas String#encode performs an encoding conversion.
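The distinction can be sketched as follows. The helper `join_as_utf8` and the Latin-1 sample bytes are assumptions made for illustration, not part of this ticket or of Ruby itself. `force_encoding` only relabels the bytes, which is all an ASCII-only join result needs, while `encode` transcodes them:

```ruby
# Hypothetical helper for illustration only: joins the array and relabels
# the result as UTF-8 without transcoding any bytes.
def join_as_utf8(array, separator = nil)
  array.join(separator).force_encoding(Encoding::UTF_8)
end

puts join_as_utf8([]).encoding           # UTF-8
puts join_as_utf8(["abc", "def"], "-")   # abc-def

# Contrast on non-ASCII data: "café" stored as ISO-8859-1 (4 bytes).
latin1 = [0x63, 0x61, 0x66, 0xE9].pack("C*").force_encoding(Encoding::ISO_8859_1)

# encode converts the bytes, so the UTF-8 result has 5 bytes.
puts latin1.encode(Encoding::UTF_8).bytesize                     # 5

# force_encoding keeps the 4 bytes and only changes the label,
# which leaves an invalid UTF-8 sequence here.
puts latin1.dup.force_encoding(Encoding::UTF_8).valid_encoding?  # false
```

Relabelling is only safe when the bytes are already valid in the target encoding, which holds for the empty or ASCII-only strings discussed here.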