Bug #4340

Encoding of result string for String#gsub is not consistent

Added by drbrain (Eric Hodel) over 8 years ago. Updated over 8 years ago.

Status: Closed
Priority: Normal
Target version: -
ruby -v: ruby 1.9.3dev (2011-01-26 trunk 30659) [x86_64-darwin10.6.0]
Backport:
[ruby-core:34959]

Description

=begin
Depending upon where the replacement occurs, the encoding of the result of String#gsub is not consistent.

When the replacement happens at the beginning of the string, the encoding of the result is the encoding of the replacement string.

When the replacement happens elsewhere in the string, the encoding of the result is the encoding of the original string.

With String#sub, the encoding of the result is always the encoding of the original string.

$ cat t.rb
puts 'using gsub'
hello_world = 'Hello World!'
hello_world.force_encoding Encoding::UTF_8

everybody = 'Everybody'
everybody.force_encoding Encoding::US_ASCII

hello_everybody = hello_world.gsub(/World/, 'Everybody')

p hello_everybody
p hello_everybody.encoding

hi = 'Hi'
hi.force_encoding Encoding::US_ASCII

hi_world = hello_world.gsub(/Hello/, 'Hi')

p hi_world
p hi_world.encoding

puts 'using sub'
hello_world = 'Hello World!'
hello_world.force_encoding Encoding::UTF_8

everybody = 'Everybody'
everybody.force_encoding Encoding::US_ASCII

hello_everybody = hello_world.sub(/World/, 'Everybody')

p hello_everybody
p hello_everybody.encoding

hi = 'Hi'
hi.force_encoding Encoding::US_ASCII

hi_world = hello_world.sub(/Hello/, 'Hi')

p hi_world
p hi_world.encoding

$ ruby19 -v t.rb
ruby 1.9.3dev (2011-01-26 trunk 30659) [x86_64-darwin10.6.0]
using gsub
"Hello Everybody!"
#<Encoding:UTF-8>
"Hi World!"
#<Encoding:US-ASCII>
using sub
"Hello Everybody!"
#<Encoding:UTF-8>
"Hi World!"
#<Encoding:UTF-8>
=end
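
The report can be condensed into a single check that compares the result encoding of sub and gsub for a replacement at the start and in the middle of the string. This is only a sketch: `result_encodings` is a hypothetical helper, not part of Ruby, and unlike the script above it passes the US-ASCII replacement strings explicitly instead of literals.

```ruby
# Sketch: collect the encoding of the result of sub and gsub for each
# [pattern, replacement] pair. `result_encodings` is a hypothetical helper.
def result_encodings(source, cases)
  cases.flat_map do |pattern, replacement|
    [:sub, :gsub].map do |meth|
      result = source.public_send(meth, pattern, replacement)
      [meth, pattern.source, result.encoding]
    end
  end
end

# String#encode returns a new string, so the literals are not mutated.
hello_world = 'Hello World!'.encode(Encoding::UTF_8)
everybody   = 'Everybody'.encode(Encoding::US_ASCII)
hi          = 'Hi'.encode(Encoding::US_ASCII)

rows = result_encodings(hello_world, [[/World/, everybody], [/Hello/, hi]])

rows.each do |meth, pattern, encoding|
  puts format('%-4s /%s/ -> %s', meth, pattern, encoding)
end
```

On a Ruby that includes the fix for this bug, every row should report the receiver's encoding (UTF-8). On the affected trunk build, the gsub row for /Hello/ is the one that would differ.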


Files

string.c.gsub.encoding.patch (339 Bytes) - Set destination string encoding to source string encoding for String#gsub - drbrain (Eric Hodel), 02/02/2011 08:43 AM

Associated revisions

Revision edaf78df
Added by drbrain (Eric Hodel) over 8 years ago

Ensure result encoding is the same as input encoding for String#gsub. [Bug #4340].

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@30806 b2dd03c8-39d4-4d8f-98ff-823fe69b080e

Revision 30806
Added by drbrain (Eric Hodel) over 8 years ago

Ensure result encoding is the same as input encoding for String#gsub. [Bug #4340].

History

#1

Updated by headius (Charles Nutter) over 8 years ago

=begin
Your beginning-of-string substitutions don't use the "hi" variable in either case. It doesn't affect the result, though.

JRuby behaves differently, apparently using the pattern's encoding in gsub and the original's encoding in sub (and our pattern's encoding is wrong due to other issues).

~/projects/jruby ➔ jruby --1.9 t.rb
using gsub
"Hello Everybody!"
#<Encoding:ASCII-8BIT>
"Hi World!"
#<Encoding:ASCII-8BIT>
using sub
"Hello Everybody!"
#<Encoding:UTF-8>
"Hi World!"
#<Encoding:UTF-8>

Filed: http://jira.codehaus.org/browse/JRUBY-5437
=end

#2

Updated by drbrain (Eric Hodel) over 8 years ago

=begin
The attached patch fixes this problem. May I commit?
=end

#3

Updated by naruse (Yui NARUSE) over 8 years ago

=begin
Yes, you can; please commit it with a test.
=end

#4

Updated by meta (mathew murphy) over 8 years ago

=begin
Can I ask why regexps are not affected by
# encoding: UTF-8
declarations?

mathew

=end

#5

Updated by drbrain (Eric Hodel) over 8 years ago

  • Status changed from Open to Closed
  • Assignee set to drbrain (Eric Hodel)

=begin
Fixed by r30806 (with test)
=end

#6

Updated by meta (mathew murphy) over 8 years ago

=begin
On Fri, Feb 4, 2011 at 10:37, mathew <meta@pobox.com> wrote:

> Can I ask why regexps are not affected by
> # encoding: UTF-8
> declarations?

Nobody?

I still can't think of a reason, so what am I missing?

mathew

=end

#7

Updated by nobu (Nobuyoshi Nakada) over 8 years ago

=begin
Hi,

At Wed, 9 Feb 2011 04:41:14 +0900,
mathew wrote in [ruby-core:35154]:

> > Can I ask why regexps are not affected by
> > # encoding: UTF-8
> > declarations?
>
> Nobody?
>
> I still can't think of a reason, so what am I missing?

It does affect them.

$ ruby -e '#encoding:utf-8' -e 'p /\u3042/.encoding'
#<Encoding:UTF-8>
$ ruby -e '#encoding:cp932' -e 'p /\x81\x42/.encoding'
#<Encoding:Windows-31J>
$ ruby -e '#encoding:euc-jp' -e 'p /\xa1\xa2/.encoding'
#<Encoding:EUC-JP>

--
Nobu Nakada

=end
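
The question above does not say which regexp was being inspected. One possible explanation, offered here only as an assumption, is that the regexp contained nothing but ASCII characters: such a literal reports US-ASCII whatever the source encoding is, so the magic comment appears to have no effect. The sketch below contrasts that case with a /u-flagged literal and with a literal containing a \u escape; `regexp_encodings` is a hypothetical helper name.

```ruby
# encoding: UTF-8

# Sketch: report Regexp#encoding for three kinds of literal.
# `regexp_encodings` is a hypothetical helper, not part of Ruby.
def regexp_encodings
  {
    ascii_only: /World/.encoding,   # ASCII-only literal
    u_flag:     /World/u.encoding,  # /u fixes the encoding to UTF-8
    u_escape:   /\u3042/.encoding   # \u escape makes the regexp UTF-8
  }
end

regexp_encodings.each do |kind, encoding|
  puts "#{kind}: #{encoding}"
end
```

The \u escape is used instead of a raw non-ASCII character so that the result does not depend on how the file itself is saved.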

#8

Updated by meta (mathew murphy) over 8 years ago

=begin
On Tue, Feb 8, 2011 at 16:27, Eric Hodel <drbrain@segment7.net> wrote:

> You're asking this on a thread attached to a bug on redmine that has
> nothing to do with regular expressions.  Try making a new bug or thread.

http://redmine.ruby-lang.org/projects/ruby/issues/new reports an error:

"No tracker is associated to this project. Please check the Project settings."

mathew

=end

#9

Updated by sorah (Sorah Fukumori) over 8 years ago

=begin
Hi,

On Thu, Feb 10, 2011 at 12:27 AM, mathew <meta@pobox.com> wrote:

> http://redmine.ruby-lang.org/projects/ruby/issues/new reports an error:
>
> "No tracker is associated to this project. Please check the Project settings."

Look here:
http://redmine.ruby-lang.org/wiki/ruby/HowtoReport

Users can't create new tickets in the ruby project.

Please create the ticket in ruby1.9 or ruby1.8 instead.

Thanks,

--
Shota Fukumori a.k.a. @sora_h - http://codnote.net/

=end

#10

Updated by meta (mathew murphy) over 8 years ago

=begin
On Wed, Feb 9, 2011 at 10:08, Shota Fukumori (sora_h) <sorah@tubusu.net> wrote:

> Look here:
> http://redmine.ruby-lang.org/wiki/ruby/HowtoReport

Can a link to that be added to the "My Page" template?

mathew

=end
