Backport #5287

1.9.3 - Interpolation in a string causes the string's encoding to be set to ASCII-8BIT

Added by Jon Leighton over 3 years ago. Updated over 3 years ago.

[ruby-core:39309]
Status:Closed
Priority:High
Assignee:Yui NARUSE

Description

There appears to be a bug with the encoding of interpolated strings on 1.9.3.

Here is a comparison of versions:

1.9.2

ruby-1.9.2-p290 :001 > a = ""
=> ""
ruby-1.9.2-p290 :002 > a.encoding
=> #<Encoding:UTF-8>
ruby-1.9.2-p290 :003 > "#{a}".encoding
=> #<Encoding:UTF-8>

1.9.3-head

ruby-1.9.3-head :004 > a = ""
=> ""
ruby-1.9.3-head :005 > a.encoding
=> #<Encoding:UTF-8>
ruby-1.9.3-head :006 > "#{a}".encoding
=> #<Encoding:ASCII-8BIT>

ruby-head

ruby-head :003 > a = ""
=> ""
ruby-head :004 > a.encoding
=> #<Encoding:UTF-8>
ruby-head :005 > "#{a}".encoding
=> #<Encoding:UTF-8>
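
For running the same comparison outside irb, a minimal script along these lines may be convenient. The helper name interpolation_preserves_encoding? is hypothetical and not part of the report, and the string is forced to UTF-8 explicitly so the check does not depend on the source or locale encoding.

```ruby
# Hypothetical helper (not from the report): true when interpolating
# a string yields a string with the same encoding as the original.
def interpolation_preserves_encoding?(a)
  "#{a}".encoding == a.encoding
end

# Force UTF-8 explicitly so the result does not depend on the locale.
a = "".dup.force_encoding("UTF-8")

puts RUBY_DESCRIPTION
puts "original:     " + a.encoding.to_s
puts "interpolated: " + "#{a}".encoding.to_s
puts "preserved?    " + interpolation_preserves_encoding?(a).to_s
```

Going by the transcripts above, the affected 1.9.3 build would be expected to report ASCII-8BIT for the interpolated string, while 1.9.2 and ruby-head report UTF-8.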


Related issues

Duplicates Ruby trunk - Bug #5126: Unicode character classes interpolated into regex throws exception Closed 08/01/2011

History

#1 Updated by Jon Leighton over 3 years ago

To be clear about the version tested:

$ ruby -v
ruby 1.9.3dev (2011-09-05 revision 33190) [x86_64-linux]

#2 Updated by Nobuyoshi Nakada over 3 years ago

  • Tracker changed from Bug to Backport
  • Project changed from Ruby trunk to Backport193
  • Status changed from Open to Assigned
  • Assignee set to Yui NARUSE
  • Priority changed from Normal to High

Backport r32791.

#3 Updated by Yui NARUSE over 3 years ago

  • Status changed from Assigned to Closed

Backported in r33236.

#4 Updated by Adam Prescott over 3 years ago

On Wed, Sep 7, 2011 at 12:20 AM, Adam Prescott <adam@aprescott.com> wrote:

Since "#{a}" is actually a new string, doesn't it make sense that its
encoding should be the default internal encoding? I can see "#{a}" being
used with the encoding change actually expected.

I guess "no" is the answer?

What about "foo#{a}bar"? Would that have the same encoding result as
"#{a}", or is the latter just a special case? (Either choice seems
counterintuitive to me.)

#5 Updated by Yui NARUSE over 3 years ago

Adam Prescott wrote:

On Wed, Sep 7, 2011 at 12:20 AM, Adam Prescott <adam@aprescott.com> wrote:

Since "#{a}" is actually a new string, doesn't it make sense that its
encoding should be the default internal encoding? I can see "#{a}" being
used with the encoding change actually expected.

I guess "no" is the answer?

default_internal has no effect in this situation.
"#{a}" is treated as `s = a.to_s`.
So "no" is the answer; s's encoding depends on a's encoding.

What about "foo#{a}bar"? Would that have the same encoding result as
"#{a}", or is the latter just a special case? (Either choice seems
counterintuitive to me.)

"foo#{a}bar" is treated as `s = "foo"; s.concat(a.to_s); s.concat("bar")`.
So the resulting s's encoding depends on "foo".
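
The expansion described above can be sketched in plain Ruby. This is only an illustration of the explanation in this comment, not the interpreter's actual implementation; the helper name expanded_interpolation is hypothetical, and every string is forced to UTF-8 so the outcome does not depend on the source encoding.

```ruby
# Hypothetical helper: spells out the described expansion of "foo#{a}bar"
# as an explicit sequence of String#concat calls.
def expanded_interpolation(a)
  s = "foo".dup.force_encoding(Encoding::UTF_8) # starts from the literal "foo"
  s.concat(a.to_s)                              # the interpolated value
  s.concat("bar")                               # the trailing literal
  s
end

a = "".dup.force_encoding(Encoding::UTF_8)

# With a UTF-8 value, all three forms should agree on the encoding.
puts "#{a}".encoding
puts "foo#{a}bar".encoding
puts expanded_interpolation(a).encoding
```

When the interpolated value uses a different encoding from the surrounding literal, the result also depends on Ruby's encoding compatibility rules, so this sketch deliberately covers only the UTF-8 case.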

#6 Updated by Adam Prescott over 3 years ago

On Fri, Sep 9, 2011 at 3:07 PM, Yui NARUSE <naruse@airemix.jp> wrote:

 I guess "no" is the answer?

default_internal has no effect in this situation.
"#{a}" is treated as `s = a.to_s`.
So "no" is the answer; s's encoding depends on a's encoding.

 What about "foo#{a}bar"? Would that have the same encoding result as
 "#{a}", or is the latter just a special case? (Either choice seems
 counterintuitive to me.)

"foo#{a}bar" is treated as `s = "foo"; s.concat(a.to_s); s.concat("bar")`.
So the resulting s's encoding depends on "foo".

Helpful to know, thanks.
