Bug #8342

IO.readlines ignores Encoding.default_internal if Encoding.default_external is ASCII-8BIT

Added by Leo Cassarani over 2 years ago. Updated about 2 years ago.

[ruby-core:54656]
Status:Closed
Priority:Normal
Assignee:Yui NARUSE
ruby -v:1.9.3
Backport:1.9.3: UNKNOWN, 2.0.0: UNKNOWN

Description

Under normal circumstances, IO.readlines will transcode from Encoding.default_external to Encoding.default_internal:

File.open('hi', 'w') { |f| f.puts "hello\n" }
Encoding.default_external = Encoding::US_ASCII
Encoding.default_internal = Encoding::UTF_8
puts IO.readlines('hi').first.encoding
#=> UTF-8

However, when Encoding.default_external is set to ASCII-8BIT, IO.readlines will always use ASCII-8BIT, regardless of what Encoding.default_internal is set to:

File.open('hi', 'w') { |f| f.puts "hello\n" }
Encoding.default_external = Encoding::ASCII_8BIT
Encoding.default_internal = Encoding::UTF_8
puts IO.readlines('hi').first.encoding
#=> ASCII-8BIT

Using IO#gets instead of IO.readlines will produce the same behaviour.
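
The IO#gets case can be checked side by side with IO.readlines. The following is only a sketch: `first_line_encodings` is a hypothetical helper name (not part of Ruby), and it temporarily changes the process-wide encoding defaults, restoring them afterwards.

```ruby
require 'tempfile'

# Hypothetical helper: reads the first line of `path` via IO.readlines and
# via IO#gets while the global defaults are ASCII-8BIT (external) and
# UTF-8 (internal), then restores the previous defaults.
def first_line_encodings(path)
  old_ext = Encoding.default_external
  old_int = Encoding.default_internal
  Encoding.default_external = Encoding::ASCII_8BIT
  Encoding.default_internal = Encoding::UTF_8

  via_readlines = IO.readlines(path).first.encoding
  via_gets = File.open(path) { |f| f.gets.encoding }
  [via_readlines, via_gets]
ensure
  Encoding.default_external = old_ext
  Encoding.default_internal = old_int
end

Tempfile.create('hi') do |f|
  f.puts "hello"
  f.flush
  encodings = first_line_encodings(f.path)
  # Both reads report the binary encoding, not UTF-8.
  puts encodings.all? { |e| e == Encoding::ASCII_8BIT }
end
```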

Associated revisions

Revision 40610
Added by Yui NARUSE about 2 years ago

  • io.c (rb_io_ext_int_to_encs): ignore internal encoding if external encoding is ASCII-8BIT. [Bug #8342]

History

#1 Updated by Nobuyoshi Nakada over 2 years ago

  • Category set to M17N
  • Status changed from Open to Assigned
  • Assignee set to Yui NARUSE
  • Target version set to 2.1.0

Seems intended behavior to me.

#2 Updated by Yui NARUSE over 2 years ago

  • Status changed from Assigned to Rejected

If the external encoding is ASCII-8BIT, the input content is treated as binary.
Binary data is excluded from text encoding conversion, so its encoding is kept as ASCII-8BIT even if default_internal is set.
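
If data read as binary is actually text, the caller can relabel it explicitly. A minimal sketch, assuming the bytes are known to be UTF-8; `reinterpret_as_utf8` is a hypothetical helper name, not part of Ruby:

```ruby
# Hypothetical helper: relabels a binary (ASCII-8BIT) string as UTF-8
# without changing its bytes, and rejects input that is not valid UTF-8.
def reinterpret_as_utf8(bytes)
  text = bytes.dup.force_encoding(Encoding::UTF_8)
  raise ArgumentError, "input is not valid UTF-8" unless text.valid_encoding?
  text
end

raw = "h\xC3\xA9llo\n".b          # String#b returns an ASCII-8BIT copy
text = reinterpret_as_utf8(raw)
puts raw.encoding == Encoding::ASCII_8BIT
puts text.encoding == Encoding::UTF_8
```

Note that String#force_encoding only changes the encoding label. String#encode is the call that converts bytes from one encoding to another.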

#3 Updated by Leo Cassarani about 2 years ago

Thanks naruse. However, this seems inconsistent with the way encodings are handled for individual IO instances. For example:

io = File.open('hi', :encoding => "ascii-8bit:utf-16")
puts io.gets.encoding

=> UTF-16

This happens even if Encoding.default_external is set to ASCII-8BIT before opening the file.
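
For comparison, this is how a per-IO "external:internal" pair behaves when the external encoding is a text encoding rather than ASCII-8BIT. It is a sketch only: the file name and the helper `read_first_line` are made up for illustration, and the UTF-16LE/UTF-8 pair is an assumption chosen to show transcoding.

```ruby
require 'tmpdir'

# Hypothetical helper: opens `path` with an explicit "external:internal"
# encoding pair and returns the first line.
def read_first_line(path, ext_int)
  File.open(path, :encoding => ext_int) { |io| io.gets }
end

Dir.mktmpdir do |dir|
  path = File.join(dir, 'hi')
  # Write raw UTF-16LE bytes so the external encoding below matches the data.
  File.binwrite(path, "hello\n".encode(Encoding::UTF_16LE))

  line = read_first_line(path, "UTF-16LE:UTF-8")
  # The line is transcoded from the external to the internal encoding.
  puts line.encoding == Encoding::UTF_8
  puts line == "hello\n"
end
```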

#4 Updated by Yui NARUSE about 2 years ago

  • Status changed from Rejected to Assigned

leocassarani (Leo Cassarani) wrote:

Thanks naruse. However, this seems inconsistent with the way encodings are handled for individual IO instances. For example:

io = File.open('hi', :encoding => "ascii-8bit:utf-16")
puts io.gets.encoding

=> UTF-16

This happens even if Encoding.default_external is set to ASCII-8BIT before opening the file.

That side sounds buggy.

#5 Updated by Yui NARUSE about 2 years ago

  • Status changed from Assigned to Closed
  • % Done changed from 0 to 100

This issue was solved with changeset r40610.
Leo, thank you for reporting this issue.
Your contribution to Ruby is greatly appreciated.
May Ruby be with you.


  • io.c (rb_io_ext_int_to_encs): ignore internal encoding if external encoding is ASCII-8BIT. [Bug #8342]
