Feature #13124

Should #puts convert to external encoding?

Added by Eregon (Benoit Daloze) 5 months ago. Updated 4 months ago.

Status:
Open
Priority:
Normal
Assignee:
-
Target version:

[ruby-core:79055]

Description

For instance:

puts "?\x00\x42\x30".force_encoding(Encoding::UTF_16LE)
?B0

puts "?\x00\x42\x30".force_encoding(Encoding::UTF_16LE).encode("utf-8")
?あ

The first result is surprising to me. It seems to treat the String as raw bytes and just "assume" they are displayable in the external encoding.
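To make the "raw bytes" reading concrete, here is a small sketch that inspects the same string without printing it. The variable name `s` is only for illustration, and `.dup` is added so the literal is not mutated in place; the byte and codepoint values follow directly from the UTF-16LE encoding of "?" (U+003F) and "あ" (U+3042).

```ruby
# Same four bytes as in the report, tagged as UTF-16LE.
s = "?\x00\x42\x30".dup.force_encoding(Encoding::UTF_16LE)

p s.bytes            # the raw bytes that an unconverted write would emit
p s.codepoints       # the two characters those bytes stand for in UTF-16LE
p s.length           # length in characters, not bytes
p s.encode("utf-8")  # the string a reader probably expects to see
```

Written to a UTF-8 terminal unconverted, the bytes 0x3F 0x00 0x42 0x30 show up as "?", an invisible NUL, "B" and "0", which matches the `?B0` output above.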

Should #puts re-encode the String to print in Encoding.default_external or the locale encoding?

STDOUT.set_encoding(Encoding.find("locale"))

seems to do what I expect, but should that be the default?
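The difference can be observed without touching the real STDOUT. The sketch below uses a pipe as a stand-in and a hypothetical helper, `bytes_written`, that is not part of Ruby. It pins the encoding to UTF-8 instead of the locale so the result does not depend on the environment, and it assumes a platform where IO does no newline conversion by default (i.e. not Windows).

```ruby
# Hypothetical helper: write str to the write end of a pipe and return
# the bytes that actually arrive on the read end.
def bytes_written(str, external: nil)
  reader, writer = IO.pipe
  reader.binmode                             # read back raw bytes, no decoding
  writer.set_encoding(external) if external  # same call as STDOUT.set_encoding
  writer.write(str)
  writer.close
  reader.read.bytes
ensure
  writer.close if writer && !writer.closed?
  reader.close if reader && !reader.closed?
end

utf16 = "?\x00\x42\x30".dup.force_encoding(Encoding::UTF_16LE)

p bytes_written(utf16)                             # no encoding set on the IO
p bytes_written(utf16, external: Encoding::UTF_8)  # external encoding set
```

With no encoding set, the pipe receives the four UTF-16LE bytes unchanged. Once an external encoding is set, the same write delivers the UTF-8 bytes for "?あ", which is the behaviour the question asks about making the default.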

History

#1 [ruby-core:79138] Updated by naruse (Yui NARUSE) 4 months ago

In current Ruby, IO converts a given string only if the IO object has an internal_encoding set.
Therefore the behavior is according to spec.

Yes, the spec is not clear.
I am continually inspecting the use cases and implementation in order to redesign IO encodings, but it still needs further inspection...

I partially wrote that at https://bugs.ruby-lang.org/issues/7201#note-5

#2 [ruby-core:79245] Updated by Eregon (Benoit Daloze) 4 months ago

Thank you for the reply and pointer.

What do you think of having STDOUT, STDERR and STDIN internal_encoding be set by default?
It seems reasonable for those to use the locale encoding.
On the other hand, it seems useless to dump a wide-char String as raw bytes,
it can only be misinterpreted on such a stream.
(Or even more confusing like above where the input is barely related to the actual characters)

Maybe it would be worth trying that as an experiment to see what the impact on compatibility is?
