Bug #13806

StringIO encoding conversion

Added by lloeki (Loic Nageleisen) over 2 years ago. Updated over 2 years ago.

Status:
Closed
Priority:
Normal
Assignee:
-
Target version:
-
ruby -v:
ruby 2.4.1p111 (2017-03-22 revision 58053) [x86_64-darwin16]
[ruby-core:82349]

Description

StringIO's doc page says:

Pseudo I/O on String object.

Commonly used to simulate $stdio or $stderr

As it turns out, this is precisely my use case: I was writing some tests that boiled down to something like this (highly simplified):

s = StringIO.new("foo")
stuff.new(s).do_something
assert_equal "foo", s.tap(&:rewind).read

The result of which was in my case:

--- expected
+++ actual
@@ -1,2 +1,2 @@
-"foo"
+# encoding: ASCII-8BIT
+""

Indeed I had a bug so my test was supposed to fail ("foo" vs "") but what caught my eye was the encoding issue.

So I did some comparison tests, and behaviours differ significantly:

f = File.open("foo", File::CREAT | File::RDWR)
f.write("foo")                          # => 3
f.rewind                                # => 0
f.internal_encoding                     # => nil
f.external_encoding                     # => nil
f.read.encoding     # reads "foo"       # => #<Encoding:UTF-8>
f.read.encoding     # reads "" at EOF   # => #<Encoding:UTF-8>
s = StringIO.new("foo")                 # => #<StringIO:0x007f879e9e54d0>
s.internal_encoding                     # => nil
s.external_encoding                     # => #<Encoding:UTF-8>
s.read.encoding     # reads "foo"       # => #<Encoding:UTF-8>
s.read.encoding     # reads "" at EOF   # => #<Encoding:ASCII-8BIT>

There's that subtle little issue at EOF. So, what about "w+"?:

f = File.open("foo", "w+")              # => #<File:foo>
f.write("foo")                          # => 3
f.rewind                                # => 0
f.internal_encoding                     # => nil
f.external_encoding                     # => nil
f.read.encoding     # reads "foo"       # => #<Encoding:UTF-8>
f.read.encoding     # reads "" at EOF   # => #<Encoding:UTF-8>
s = StringIO.new("foo", "w+")           # => #<StringIO:0x007f879e81f268>
s.internal_encoding                     # => nil
s.external_encoding                     # => #<Encoding:UTF-8>
s.read.encoding     # reads "foo"       # => #<Encoding:ASCII-8BIT>
s.read.encoding     # reads "" at EOF   # => #<Encoding:ASCII-8BIT>

Somehow it makes StringIO always behave as binary on #read. Hmmm.

Let's try binary. IO's doc says:

"b" Binary file mode
Suppresses EOL <-> CRLF conversion on Windows. And
sets external encoding to ASCII-8BIT unless explicitly
specified.

f = File.open("foo", "w+b")             # => #<File:foo>
f.write("foo")                          # => 3
f.rewind                                # => 0
f.internal_encoding                     # => nil
f.external_encoding                     # => #<Encoding:ASCII-8BIT>
f.read.encoding     # reads "foo"       # => #<Encoding:ASCII-8BIT>
f.read.encoding     # reads "" at EOF   # => #<Encoding:ASCII-8BIT>
s = StringIO.new("foo", "w+b")          # => #<StringIO:0x007f879f0bd460>
s.internal_encoding                     # => nil
s.external_encoding                     # => #<Encoding:UTF-8>
s.read.encoding     # reads "foo"       # => #<Encoding:ASCII-8BIT>
s.read.encoding     # reads "" at EOF   # => #<Encoding:ASCII-8BIT>

Close, but no cigar: external_encoding is still incorrect, and #read couldn't care less. Let's try making things explicit:

f = File.open("foo", "w+b:ASCII-8BIT:ASCII-8BIT")
f.write("foo")                          # => 3
f.rewind                                # => 0
f.internal_encoding                     # => nil
f.external_encoding                     # => #<Encoding:UTF-8>
f.read.encoding     # reads "foo"       # => #<Encoding:UTF-8>
f.read.encoding     # reads "" at EOF   # => #<Encoding:UTF-8>
s = StringIO.new("", "w+b:ASCII-8BIT:ASCII-8BIT")
s.internal_encoding                     # => nil
s.external_encoding                     # => #<Encoding:UTF-8>
s.read.encoding     # reads "foo"       # => #<Encoding:ASCII-8BIT>
s.read.encoding     # reads "" at EOF   # => #<Encoding:ASCII-8BIT>

Nope, external_encoding still wrong. Anyway, in my case I was looking for UTF-8, so what about that?

f = File.open("foo", "w+b:UTF-8:UTF-8") # => #<File:foo>
f.write("foo")                          # => 3
f.rewind                                # => 0
f.internal_encoding                     # => nil
f.external_encoding                     # => #<Encoding:UTF-8>
f.read.encoding     # reads "foo"       # => #<Encoding:UTF-8>
f.read.encoding     # reads "" at EOF   # => #<Encoding:UTF-8>
s = StringIO.new("", "w+b:UTF-8:UTF-8") # => #<StringIO:0x007fd531cb9248>
s.internal_encoding                     # => nil
s.external_encoding                     # => #<Encoding:UTF-8>
s.read.encoding     # reads "foo"       # => #<Encoding:ASCII-8BIT>
s.read.encoding     # reads "" at EOF   # => #<Encoding:ASCII-8BIT>

StringIO keeps insisting on its binary output irrespective of the mode argument as described in the doc. Last resort, forcing text mode:

f = File.open("foo", "w+t:UTF-8:UTF-8") # => #<File:foo>
f.write("foo")                          # => 3
f.rewind                                # => 0
f.internal_encoding                     # => nil
f.external_encoding                     # => #<Encoding:UTF-8>
f.read.encoding     # reads "foo"       # => #<Encoding:UTF-8>
f.read.encoding     # reads "" at EOF   # => #<Encoding:UTF-8>
s = StringIO.new("", "w+t:UTF-8:UTF-8") # => #<StringIO:0x007f879f04fc08>
s.internal_encoding                     # => nil
s.external_encoding                     # => #<Encoding:UTF-8>
s.read.encoding     # reads "foo"       # => #<Encoding:ASCII-8BIT>
s.read.encoding     # reads "" at EOF   # => #<Encoding:ASCII-8BIT>

Same. Anyway, one last time, let's go nuts:

f = File.open("foo", "w+:UTF-16:UTF-32")  # => #<File:foo>
f.write("foo")                            # => 3
f.rewind                                  # => 0
f.internal_encoding                       # => #<Encoding:UTF-32 (dummy)>
f.external_encoding                       # => #<Encoding:UTF-16 (dummy)>
f.read.encoding     # reads "foo"         # => #<Encoding:UTF-32 (dummy)>
f.read.encoding     # reads "" at EOF     # => #<Encoding:UTF-32 (dummy)>
s = StringIO.new("", "w+:UTF-16:UTF-32")  # => #<StringIO:0x007f879f04fc08>
s.internal_encoding                       # => nil
s.external_encoding                       # => #<Encoding:UTF-8>
s.read.encoding     # reads "foo"         # => #<Encoding:ASCII-8BIT>
s.read.encoding     # reads "" at EOF     # => #<Encoding:ASCII-8BIT>

I think the result speaks for itself.
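For anyone who wants to rerun these comparisons on their own Ruby, the four probes repeated in the listings above can be wrapped in a small helper. This is only a sketch: `encoding_report` is a name made up here, not part of Ruby, and the encodings it prints will depend on the Ruby and stringio versions in use.

```ruby
require "stringio"
require "tempfile"

# Hypothetical helper (not part of Ruby): runs the same four probes used in
# the listings above against any IO-like object.
def encoding_report(io)
  report = {
    internal: io.internal_encoding,
    external: io.external_encoding,
  }
  first = io.read   # reads whatever is left from the current position
  eof   = io.read   # a second read is at EOF and returns ""
  report.merge(
    first_read: first, first_encoding: first.encoding,
    eof_read: eof, eof_encoding: eof.encoding
  )
end

# StringIO over a mutable copy of "foo", default mode.
p encoding_report(StringIO.new(+"foo"))

# The same content through a real file, for comparison.
Tempfile.create("foo") do |f|
  f.write("foo")
  f.rewind
  p encoding_report(f)
end
```

Comparing the two printed hashes side by side should show whether a given Ruby still disagrees between File and StringIO.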

In my specific case I quickly found workarounds, but this makes for brittle code and tests. Sometimes this involves faking StringIO with an actual temp file, which is, let's say, subpar.
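One possible workaround of this kind, sketched under assumptions: `read_tagged` is a hypothetical helper, not something StringIO provides, and it only retags the bytes with the stream's external encoding (no transcoding happens). The second half shows why the tag matters for tests: binary and UTF-8 strings compare equal only while the content is pure ASCII.

```ruby
require "stringio"

# Hypothetical helper (not part of StringIO): read the rest of the stream and
# retag the result with the stream's external encoding when it has one.
def read_tagged(io)
  data = io.read
  enc = io.external_encoding
  enc ? data.dup.force_encoding(enc) : data
end

s = StringIO.new(+"h\u00E9llo")
read_tagged(s)                            # consumes the content
tail = read_tagged(s)                     # "" at EOF, retagged
puts tail.encoding == s.external_encoding # true, by construction

# ASCII-only strings compare equal across encodings...
puts "foo".b == "foo"                     # true
# ...but strings with non-ASCII bytes do not.
puts "h\u00E9llo".b == "h\u00E9llo"       # false
```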

Tangentially related: StringIO is missing quite a few methods compared to IO. This either forces code to be aware of the difference, which is IMHO not good (e.g. breaking code coverage in tests), requires monkeypatching StringIO, or calls for creative (ahem) use of temp files, thus hitting the filesystem.

This seems tied to an old-ish issue: https://bugs.ruby-lang.org/issues/7964

Associated revisions

Revision 6ee82564
Added by nobu (Nobuyoshi Nakada) over 2 years ago

stringio.c: encoding at EOF

  • ext/stringio/stringio.c (strio_read): should return string with the external encoding, at EOF too. [ruby-core:82349] [Bug #13806]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@59578 b2dd03c8-39d4-4d8f-98ff-823fe69b080e

Revision 59578
Added by nobu (Nobuyoshi Nakada) over 2 years ago

stringio.c: encoding at EOF

  • ext/stringio/stringio.c (strio_read): should return string with the external encoding, at EOF too. [ruby-core:82349] [Bug #13806]

History

Updated by naruse (Yui NARUSE) over 2 years ago

s = StringIO.new("foo", "w+")           # => #<StringIO:0x007f879e81f268>
s.internal_encoding                     # => nil
s.external_encoding                     # => #<Encoding:UTF-8>
s.read.encoding     # reads "foo"       # => #<Encoding:ASCII-8BIT>
s.read.encoding     # reads "" at EOF   # => #<Encoding:ASCII-8BIT>

The 4th line's comment, 'reads "foo"', is wrong; it returns ''.
Therefore, though it's not the expected encoding, that's not so bad.

Updated by lloeki (Loic Nageleisen) over 2 years ago

naruse (Yui NARUSE) wrote:

The 4th line's comment, 'reads "foo"', is wrong; it returns ''.
Therefore, though it's not the expected encoding, that's not so bad.

You're right about the value. I guess this explains the ASCII-8BIT encoding, which is at least consistent with the other EOF reads on a StringIO. Yet such a value from read is significantly inconsistent in behaviour with both the File(..., "w+") case and StringIO.new("foo"). Even s.rewind on StringIO.new("foo", "w+") won't get the proper value from s.read.
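The empty read discussed here comes from "w+" truncating the underlying string when the StringIO is created, much as File.open(..., "w+") truncates the file on disk. A minimal sketch of that behaviour follows. The variable names are illustrative only, and the strings are built with `+"..."` so they are mutable.

```ruby
require "stringio"

# "w+" truncates the underlying string when the StringIO is created,
# so there is nothing left to read afterwards, even after a rewind.
buffer = +"foo"
truncated = StringIO.new(buffer, "w+")
p buffer            # => ""
truncated.rewind
p truncated.read    # => ""

# "r+" also opens for reading and writing, but keeps the existing content.
kept = StringIO.new(+"foo", "r+")
p kept.read         # => "foo"

# Data written after the truncation can be read back after a rewind.
truncated.rewind
truncated.write("bar")
truncated.rewind
p truncated.read    # => "bar"
```

So a test that wants to pre-seed content and still write to the stream would probably want "r+" (the default for a mutable string) rather than "w+".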

Updated by nobu (Nobuyoshi Nakada) over 2 years ago

  • Status changed from Open to Closed

Applied in changeset trunk|r59578.


stringio.c: encoding at EOF

  • ext/stringio/stringio.c (strio_read): should return string with the external encoding, at EOF too. [ruby-core:82349] [Bug #13806]
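A quick way to check whether a given Ruby includes this change is to compare the encoding of an EOF read with the stream's external encoding. This is a sketch, not an official test from the changeset: on a Ruby that includes r59578 the comparison is expected to hold, while on the 2.4.1 build reported above it would not.

```ruby
require "stringio"

s = StringIO.new(+"foo")
s.read            # consume the content
at_eof = s.read   # "" at EOF

# With the fix applied, the empty string carries the external encoding
# instead of ASCII-8BIT.
p at_eof.encoding == s.external_encoding
```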
