Bug #4340

Encoding of result string for String#gsub is not consistent

Added by drbrain (Eric Hodel) about 13 years ago. Updated almost 13 years ago.

Status:
Closed
Target version:
-
ruby -v:
ruby 1.9.3dev (2011-01-26 trunk 30659) [x86_64-darwin10.6.0]
Backport:
[ruby-core:34959]

Description

=begin
Depending upon where the replacement occurs, the encoding of the result of String#gsub is not consistent.

When the replacement happens at the beginning of the string the encoding of the result is the encoding of the replacement string.

When the replacement happens elsewhere in the string the encoding of the result is the encoding of the original string.

With String#sub the encoding of the result is always the encoding of the original string.

$ cat t.rb
puts 'using gsub'
hello_world = 'Hello World!'
hello_world.force_encoding Encoding::UTF_8

everybody = 'Everybody'
everybody.force_encoding Encoding::US_ASCII

hello_everybody = hello_world.gsub(/World/, 'Everybody')

p hello_everybody
p hello_everybody.encoding

hi = 'Hi'
hi.force_encoding Encoding::US_ASCII

hi_world = hello_world.gsub(/Hello/, 'Hi')

p hi_world
p hi_world.encoding

puts 'using sub'
hello_world = 'Hello World!'
hello_world.force_encoding Encoding::UTF_8

everybody = 'Everybody'
everybody.force_encoding Encoding::US_ASCII

hello_everybody = hello_world.sub(/World/, 'Everybody')

p hello_everybody
p hello_everybody.encoding

hi = 'Hi'
hi.force_encoding Encoding::US_ASCII

hi_world = hello_world.sub(/Hello/, 'Hi')

p hi_world
p hi_world.encoding

$ ruby19 -v t.rb
ruby 1.9.3dev (2011-01-26 trunk 30659) [x86_64-darwin10.6.0]
using gsub
"Hello Everybody!"
#<Encoding:UTF-8>
"Hi World!"
#<Encoding:US-ASCII>
using sub
"Hello Everybody!"
#<Encoding:UTF-8>
"Hi World!"
#<Encoding:UTF-8>
=end


Files

string.c.gsub.encoding.patch (339 Bytes) - Set destination string encoding to source string encoding for String#gsub - drbrain (Eric Hodel), 02/02/2011 08:43 AM
Actions #1

Updated by headius (Charles Nutter) about 13 years ago

=begin
Your beginning-of-string substitutions don't use the "hi" variable in either case. It doesn't affect the result, though.

JRuby behaves differently, apparently using the pattern's encoding in gsub and the original's encoding in sub (and our pattern's encoding is wrong due to other issues).

~/projects/jruby ➔ jruby --1.9 t.rb
using gsub
"Hello Everybody!"
#<Encoding:ASCII-8BIT>
"Hi World!"
#<Encoding:ASCII-8BIT>
using sub
"Hello Everybody!"
#<Encoding:UTF-8>
"Hi World!"
#<Encoding:UTF-8>

Filed: http://jira.codehaus.org/browse/JRUBY-5437
=end
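As noted above, the reported script builds the `hi` and `everybody` strings but passes literals to gsub and sub. The following is a minimal sketch of a variant that passes the US-ASCII variables themselves as replacements. The `result_encodings` helper is hypothetical and not part of the original report. On a Ruby that includes the fix, all four results are expected to report the source string's encoding (UTF-8).

```ruby
# Hypothetical helper (not from the original report): returns the encoding
# names of the gsub and sub results for one pattern/replacement pair.
def result_encodings(source, pattern, replacement)
  {
    gsub: source.gsub(pattern, replacement).encoding.name,
    sub:  source.sub(pattern, replacement).encoding.name
  }
end

# .dup guards against frozen string literals on newer Rubies.
hello_world = 'Hello World!'.dup.force_encoding(Encoding::UTF_8)
everybody   = 'Everybody'.dup.force_encoding(Encoding::US_ASCII)
hi          = 'Hi'.dup.force_encoding(Encoding::US_ASCII)

# Replacement in the middle of the string.
p result_encodings(hello_world, /World/, everybody)

# Replacement at the beginning of the string, the case that differed in the report.
p result_encodings(hello_world, /Hello/, hi)
```

Comparing the two hashes side by side makes any remaining difference between gsub and sub visible at a glance.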

Actions #2

Updated by drbrain (Eric Hodel) about 13 years ago

=begin
The attached patch fixes this problem. May I commit?
=end

Actions #3

Updated by naruse (Yui NARUSE) about 13 years ago

=begin
Yes, you can; please commit it with a test.
=end

Actions #4

Updated by meta (mathew murphy) about 13 years ago

=begin
Can I ask why regexps are not affected by

encoding: UTF-8

declarations?

mathew

=end

Actions #5

Updated by drbrain (Eric Hodel) about 13 years ago

  • Status changed from Open to Closed
  • Assignee set to drbrain (Eric Hodel)

=begin
Fixed by r30806 (with test)
=end

Actions #6

Updated by meta (mathew murphy) about 13 years ago

=begin
On Fri, Feb 4, 2011 at 10:37, mathew wrote:

Can I ask why regexps are not affected by

encoding: UTF-8

declarations?

Nobody?

I still can't think of a reason, so what am I missing?

mathew

=end

Actions #7

Updated by nobu (Nobuyoshi Nakada) about 13 years ago

=begin
Hi,

At Wed, 9 Feb 2011 04:41:14 +0900,
mathew wrote in [ruby-core:35154]:

Can I ask why regexps are not affected by

encoding: UTF-8

declarations?

Nobody?

I still can't think of a reason, so what am I missing?

It does affect them.

$ ruby -e '#encoding:utf-8' -e 'p /\u3042/.encoding'
#<Encoding:UTF-8>
$ ruby -e '#encoding:cp932' -e 'p /\x81\x42/.encoding'
#<Encoding:Windows-31J>
$ ruby -e '#encoding:euc-jp' -e 'p /\xa1\xa2/.encoding'
#<Encoding:EUC-JP>

--
Nobu Nakada

=end
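The examples above all use regexps that contain non-ASCII escapes. As a rough sketch of how this can be inspected from Ruby code, the snippet below compares an ASCII-only literal with literals that carry a Unicode escape or the /u flag, using Regexp#encoding and Regexp#fixed_encoding?. The `describe` helper is hypothetical and not from the thread. Whether the ASCII-only case is what prompted the question is an assumption.

```ruby
# Hypothetical helper (not from the thread): summarises a regexp's encoding
# name and whether that encoding is fixed.
def describe(regexp)
  [regexp.encoding.name, regexp.fixed_encoding?]
end

p describe(/Hello/)   # ASCII-only literal
p describe(/\u3042/)  # literal containing a Unicode escape
p describe(/Hello/u)  # ASCII-only literal with the /u flag
```

An ASCII-only literal reports US-ASCII and is not fixed, so it can match strings in any ASCII-compatible encoding. The other two report UTF-8 with a fixed encoding.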

Actions #8

Updated by meta (mathew murphy) about 13 years ago

=begin
On Tue, Feb 8, 2011 at 16:27, Eric Hodel wrote:

You're asking this on a thread attached to a bug on redmine that has
nothing to do with regular expressions.  Try making a new bug or thread.

http://redmine.ruby-lang.org/projects/ruby/issues/new reports an error:

"No tracker is associated to this project. Please check the Project settings."

mathew

=end

Actions #9

Updated by sorah (Sorah Fukumori) about 13 years ago

=begin
Hi,

On Thu, Feb 10, 2011 at 12:27 AM, mathew wrote:

http://redmine.ruby-lang.org/projects/ruby/issues/new reports an error:

"No tracker is associated to this project. Please check the Project settings."

Look here:
http://redmine.ruby-lang.org/wiki/ruby/HowtoReport

Users can't create new tickets in the ruby project.

Please create one in ruby1.9 or ruby1.8.

Thanks,

--
Shota Fukumori a.k.a. @sora_h - http://codnote.net/

=end

Actions #10

Updated by meta (mathew murphy) about 13 years ago

=begin
On Wed, Feb 9, 2011 at 10:08, Shota Fukumori (sora_h) wrote:

Look here:
http://redmine.ruby-lang.org/wiki/ruby/HowtoReport

Can a link to that be added to the "My Page" template?

mathew

=end
