Project

General

Profile

Bug #15876

1.to_s.encoding != Encoding.default_internal

Added by grosser (Michael Grosser) 25 days ago. Updated 19 days ago.

Status:
Feedback
Priority:
Normal
Assignee:
-
Target version:
-
[ruby-core:92842]

Description

I ran into strange looking test output when I compared .to_s with an expected text, saying that the encoding was different, which is confusing/annoying especially to users that don't know how encodings work in ruby.
1.to_s.encoding should be the same as "".encoding

History

Updated by shevegen (Robert A. Heiler) 25 days ago

which is confusing/annoying especially to users that don't know
how encodings work in ruby.

I personally finally switched into UTF-8 (oddly enough, primarily due to emoji and
unicode-symbols that can be used for simple indications both on the commandline and
www), but I think one problem (for me) was from ruby 1.8.x to later ruby versions
that there was not that much documentation available.

Judging from your comment encoding may still pose a problem for some ruby users (or
potentially new ruby users).

Some time ago, I think, jeremy evans wrote a document about symbols, which was added
(my apologies if I misremember). If anyone feels like writing some document about
encoding in ruby, and how to deal with it ... :) (could be in wiki-style or perhaps
gist-github or some other place; I am in no way suggesting that only a single
person should do so, it could be a collaborative effort).

To the issue at hand, I just tested in irb:

1.to_s.encoding  #should be the same as "".encoding # => #<Encoding:US-ASCII>
"".encoding # => #<Encoding:UTF-8>

This is indeed a little surprising (to me). There may be valid reasons for this,
perhaps default external encoding, or something like this, but I can see why
people may be confused about it. Actually what surprises me is that .to_s on
the number leads to US-ASCII encoding by default.

I think looking back when I used an ISO-encoding, the most surprising result I
had encountered was actually in regards to regexp-engine and encodings used
there. I do not remember exactly how I found it, but I think I reported it back
then; still not entirely sure how it came, but regexes may also be an area where
users may be a little bit confused - so documentation may be of some help.

Updated by mame (Yusuke Endoh) 25 days ago

grosser (Michael Grosser), could you elaborate your problem? I cannot reproduce the warning. What warning did you see? And how?

s1 = 1.to_s
p s1.encoding #=> #<Encoding:US-ASCII>

s2 = "1"
p s2.encoding #=> #<Encoding:UTF-8>

p s1 == s2 #=> true with no warning

Updated by duerst (Martin Dürst) 25 days ago

mame (Yusuke Endoh):

What grosser (Michael Grosser) is saying is that

p s1.encoding == s2.encoding #=> false

but he expects the result to be true. But you are right that what counts is the equality of the strings, not the encodings.

Updated by mame (Yusuke Endoh) 25 days ago

grosser (Michael Grosser) said

I ran into strange looking test output when I compared .to_s with an expected text, saying that the encoding was different

I thought that some string-comparison assertions (maybe attributed to an external testing framework?) emitted a spurious warning like "the encoding was different" or something.

Updated by naruse (Yui NARUSE) 19 days ago

  • Status changed from Open to Feedback

What is the problem you are actually troubled with?

If it is just a testing problem, I feel it should just use correct assertions.
But if there's a frequent pitfall, I may reconsider it.

Updated by Hanmac (Hans Mackowiak) 19 days ago

There is Encoding.compatible? which might help to check if two strings/symbols has a common encoding

naruse (Yui NARUSE) i don't know if you are the right contact person for this, but is there a way to see if two encoding objects are compatible or can that only be checked on the string?

Updated by Eregon (Benoit Daloze) 19 days ago

Hanmac (Hans Mackowiak) wrote:

is there a way to see if two encoding objects are compatible or can that only be checked on the string?

Encoding.compatible? can take Encoding arguments too:

> Encoding.compatible?(Encoding::UTF_8, Encoding::US_ASCII) 
=> #<Encoding:UTF-8>

Also available in: Atom PDF