Bug #1251

closed

gsub problem

Added by pettel (Alexander Pettelkau) over 15 years ago. Updated over 13 years ago.

Status:
Rejected
Assignee:
-
Target version:
ruby -v:
ruby 1.9.1p0 (2009-01-30 revision 21907) [i386-darwin9.6.0]
Backport:
[ruby-core:22715]

Description

=begin
I wanted to replace "\" with "\\" in the string "\TEST":

s="\\TEST"
puts s # Output --> "\TEST"
s.gsub!("\\","\\\\")
puts s # Output --> "\TEST"
# but EXPECTED Output "\\TEST"
=end


Files

tr.rb (46 Bytes) - gsub doesn't replace as it should - pettel (Alexander Pettelkau), 03/07/2009 06:09 PM
Actions #1

Updated by matz (Yukihiro Matsumoto) over 15 years ago

=begin
Hi,

In message "Re: [ruby-core:22715] [Bug #1251] gsub problem"
on Sat, 7 Mar 2009 18:08:11 +0900, Alexander Pettelkau writes:

|I wanted to replace "\" with "\\" in the string "\TEST":
|
|s="\\TEST"
|puts s # Output --> "\TEST"
|s.gsub!("\\","\\\\")
|puts s # Output --> "\TEST"
| # but EXPECTED Output "\\TEST"

You specified four backslashes in double quotes, which is two
backslashes in a string. But the replacement string goes through its
own backslash escape processing (for sequences such as \1), and \\
(two backslashes) is transformed into one backslash. That means you've
substituted one backslash with one backslash.

To substitute one backslash into two, you have to do

s.gsub!("\\","\\\\\\")

or

s.gsub!(/\\/){"\\\\"}

						matz.

=end
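
The two escape passes described in the reply above can be checked with a short script. This is a minimal sketch rather than code from the thread: the constant names are made up for illustration, and the eight-backslash form is an extra variant shown next to the block form that matz gives.

```ruby
# One literal backslash followed by TEST (5 characters in total).
SAMPLE = "\\TEST"

# The literal "\\\\" holds two backslashes; gsub's own escape pass
# folds them back into one, so nothing changes.
UNCHANGED = SAMPLE.gsub("\\", "\\\\")

# The literal holds four backslashes; the escape pass leaves two.
DOUBLED = SAMPLE.gsub("\\", "\\\\\\\\")

# A block's return value is inserted as-is, with no escape pass.
VIA_BLOCK = SAMPLE.gsub("\\") { "\\\\" }

puts UNCHANGED   # \TEST
puts DOUBLED     # \\TEST
puts VIA_BLOCK   # \\TEST
```

The six-backslash form from the reply gives the same result as the eight-backslash one, because its third backslash ends the replacement string and a trailing backslash is left untouched.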

Actions #2

Updated by WoNaDo (Wolfgang Nádasi-Donner) over 15 years ago

=begin
Alexander Pettelkau wrote:

> Bug #1251: gsub problem
> http://redmine.ruby-lang.org/issues/show/1251
>
> Author: Alexander Pettelkau
> Status: Open, Priority: Normal
> Category: core, Target version: 1.9.1
> ruby -v: ruby 1.9.1p0 (2009-01-30 revision 21907) [i386-darwin9.6.0]
>
> I wanted to replace "\" with "\\" in the string "\TEST":
>
> s="\\TEST"
> puts s # Output --> "\TEST"
> s.gsub!("\\","\\\\")
> puts s # Output --> "\TEST"
> # but EXPECTED Output "\\TEST"
>
> http://redmine.ruby-lang.org

After the first step, the String contains two backslashes. This string
will be interpreted again, because there can be references to matched
groups inside (e.g. '\1'). This second interpretation sees an escaped
backslash (backslash-backslash), which results in one backslash.

I think it should be documented.

Wolfgang Nádasi-Donner

=end

Actions #3

Updated by matz (Yukihiro Matsumoto) over 15 years ago

  • Status changed from Open to Rejected


Actions #4

Updated by WoNaDo (Wolfgang Nádasi-Donner) over 15 years ago

=begin
Yukihiro Matsumoto wrote:

> To substitute one backslash into two, you have to do
>
> s.gsub!("\\","\\\\\\")
...
myprompt> irb191-p0
irb(main):001:0> puts "a\\b".gsub!("\\","\\\\\\")
a\\b
=> nil
irb(main):002:0> puts "a\\b".gsub!("\\","\\\\\\\\")
a\\b
=> nil

I was surprised by this result long ago, until I started to assume that
the second replacement works only for \k<...>, \nr, and \\, and leaves the
backslash as it is in all other combinations (even at the end of the string).

This is different from the first replacement, which always consumes a
backslash as an escape character...

myprompt> irb191-p0
irb(main):001:0> puts "\\w"
\w
=> nil

I think this behaviour should be documented somewhere, because it can
really confuse persons, which do not use complex RegExes during their
daily work.

Wolfgang Nádasi-Donner

=end
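
The behaviour assumed in the comment above (only a few sequences are special in the replacement string, everything else keeps its backslash) can be tried out as follows. This is a sketch against a current Ruby, not the 1.9.1 build from the report; the constant names are hypothetical, and single-quoted literals are used so that only gsub's own pass is in play.

```ruby
WORD = "hello"

# \1 is replaced by the first capture group.
NUMBERED = WORD.gsub(/l(l)o/, '[\1]')

# \0 is replaced by the whole match.
WHOLE = WORD.gsub(/ll/, '<\0>')

# \k<v> is replaced by the named group v.
NAMED = WORD.gsub(/(?<v>[eo])/, '{\k<v>}')

# \x is not a recognized sequence, so the backslash stays.
UNKNOWN = WORD.gsub(/l/, '\x')

# 'o\\' is "o" plus one backslash; a backslash at the end stays too.
TRAILING = WORD.gsub(/o/, 'o\\')

puts NUMBERED   # he[l]
puts WHOLE      # he<ll>o
puts NAMED      # h{e}ll{o}
puts UNKNOWN    # he\x\xo
puts TRAILING   # hello\
```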

Actions #5

Updated by pettel (Alexander Pettelkau) over 15 years ago

=begin
Thanks a lot for clearing that up so fast !

Alexander Pettelkau
=end

Actions #6

Updated by matz (Yukihiro Matsumoto) over 15 years ago

=begin
Hi,

In message "Re: [ruby-core:22719] Re: [Bug #1251] gsub problem"
on Sat, 7 Mar 2009 21:00:34 +0900, Wolfgang Nádasi-Donner writes:

|I think this behaviour should be documented somewhere, because it can
|really confuse persons, which do not use complex RegExes during their
|daily work.

Agreed. Any opinion for concrete description? Anyone?

						matz.

=end

Actions #7

Updated by WoNaDo (Wolfgang Nádasi-Donner) over 15 years ago

=begin
Yukihiro Matsumoto wrote:

> In message "Re: [ruby-core:22719] Re: [Bug #1251] gsub problem"
> on Sat, 7 Mar 2009 21:00:34 +0900, Wolfgang Nádasi-Donner writes:
> |I think this behaviour should be documented somewhere, because it can
> |really confuse persons, which do not use complex RegExes during their
> |daily work.
> Agreed. Any opinion for concrete description? Anyone?

The contents should describe the fact that the second parsing of the
replacement string will replace \\ by \, \n by the string found by
anonymous group n or by empty string if the group doesn't exist and n is
between 1 and 9, or <name> and 'name' by the named group.

But don't use my English. It may lead to more confusion.

Wolfgang Nádasi-Donner

=end
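
The rules proposed for the documentation in the comment above can be illustrated with a small helper. This is a sketch only: `replace_b` is a hypothetical name, the pattern has exactly one capture group so that \2 refers to a group that does not exist, and the results are those of a current Ruby.

```ruby
# Replace the "b" in "abc" using the given replacement string.
# The pattern has a single capture group.
def replace_b(replacement)
  "abc".gsub(/(b)/, replacement)
end

puts replace_b('<\1>')     # a<b>c   (group 1 exists)
puts replace_b('<\2>')     # a<>c    (group 2 does not exist: empty string)
puts replace_b('<\\\\>')   # a<\>c   (two backslashes reach gsub, one comes out)
```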

Actions #8

Updated by stepheneb (Stephen Bannasch) over 15 years ago

=begin
This sequence helped me understand the issue better:

a = b = "1_2_3"
=> "1_2_3"
for i in 0..b.length do print "#{b[i]} " end
49 95 50 95 51 => 0..5
b = a.gsub('_', '\\')
=> "1\\2\\3"
for i in 0..b.length do print "#{b[i]} " end
49 92 50 92 51 => 0..5
b = a.gsub('_', '\\\\')
=> "1\\2\\3"
for i in 0..b.length do print "#{b[i]} " end
49 92 50 92 51 => 0..5
b = a.gsub('_', '\\\\\\')
=> "1\\\\2\\\\3"
for i in 0..b.length do print "#{b[i]} " end
49 92 92 50 92 92 51 => 0..7

=end
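
The session above relies on `b[i]` returning a character code, as it did in Ruby 1.8. From 1.9 on, `String#[]` returns a one-character string, so a comparable check can use `String#bytes` instead. The following is a sketch with hypothetical constant names:

```ruby
SOURCE = "1_2_3"

# '\\' is one backslash; a lone trailing backslash is kept as it is.
ONE = SOURCE.gsub('_', '\\').bytes

# '\\\\' is two backslashes; gsub folds them into one.
TWO = SOURCE.gsub('_', '\\\\').bytes

# '\\\\\\' is three backslashes; the first pair folds into one and
# the trailing one is kept, giving two.
THREE = SOURCE.gsub('_', '\\\\\\').bytes

p ONE     # [49, 92, 50, 92, 51]
p TWO     # [49, 92, 50, 92, 51]
p THREE   # [49, 92, 92, 50, 92, 92, 51]
```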

Actions #9

Updated by rue (Eero Saynatkari) over 15 years ago

=begin
Excerpts from Yukihiro Matsumoto's message of Fri Mar 13 12:47:48 +0200 2009:

> Hi,
>
> In message "Re: [ruby-core:22719] Re: [Bug #1251] gsub problem"
> on Sat, 7 Mar 2009 21:00:34 +0900, Wolfgang Nádasi-Donner writes:
>
> |I think this behaviour should be documented somewhere, because it can
> |really confuse persons, which do not use complex RegExes during their
> |daily work.
>
> Agreed. Any opinion for concrete description? Anyone?

RubySpec has this to say (please add any clarifications and
missing behaviour--I am sure there are some 1.9.1 cases at
least):

ruby 1.8.7 (2008-08-11 patchlevel 72) [i686-darwin9]

String#sub with pattern, replacement

  • returns a copy of self with all occurrences of pattern replaced with replacement
  • ignores a block if supplied
  • supports \G which matches at the beginning of the string
  • supports /i for ignoring case
  • doesn't interpret regexp metacharacters if pattern is a string
  • replaces \1 sequences with the regexp's corresponding capture
  • treats \1 sequences without corresponding captures as empty strings
  • replaces \& and \0 with the complete match
  • replaces \` with everything before the current match
  • replaces \' with everything after the current match
  • replaces \\\+ with \\+
  • replaces \+ with the last paren that actually matched
  • treats \+ as an empty string if there was no captures
  • maps \\ in replacement to \
  • leaves unknown \x escapes in replacement untouched
  • leaves \ at the end of replacement untouched
  • taints the result if the original string or replacement is tainted
  • tries to convert pattern to a string using to_str
  • raises a TypeError when pattern can't be converted to a string
  • tries to convert replacement to a string using to_str
  • raises a TypeError when replacement can't be converted to a string
  • returns subclass instances when called on a subclass
  • sets $~ to MatchData of match and nil when there's none
  • replaces \1 with \1
  • replaces \\1 with \1
  • replaces \\\1 with \

String#sub with pattern and block

  • returns a copy of self with the first occurrences of pattern replaced with the block's return value
  • sets $~ for access from the block
  • restores $~ after leaving the block
  • sets $~ to MatchData of last match and nil when there's none for access from outside
  • doesn't raise a RuntimeError if the string is modified while substituting
  • doesn't interpolate special sequences like \1 for the block's return value
  • converts the block's return value to a string using to_s
  • taints the result if the original string or replacement is tainted

String#sub! with pattern, replacement

  • modifies self in place and returns self
  • taints self if replacement is tainted
  • returns nil if no modifications were made
  • raises a TypeError when self is frozen

String#sub! with pattern and block

  • modifies self in place and returns self
  • taints self if block's result is tainted
  • returns nil if no modifications were made
  • raises a RuntimeError if the string is modified while substituting
  • raises a RuntimeError when self is frozen

String#gsub with pattern and replacement

  • doesn't freak out when replacing ^
  • returns a copy of self with all occurrences of pattern replaced with replacement
  • ignores a block if supplied
  • supports \G which matches at the beginning of the remaining (non-matched) string
  • supports /i for ignoring case
  • doesn't interpret regexp metacharacters if pattern is a string
  • replaces \1 sequences with the regexp's corresponding capture
  • treats \1 sequences without corresponding captures as empty strings
  • replaces \& and \0 with the complete match
  • replaces \` with everything before the current match
  • replaces \' with everything after the current match
  • replaces \+ with the last paren that actually matched
  • treats \+ as an empty string if there was no captures
  • maps \\ in replacement to \
  • leaves unknown \x escapes in replacement untouched
  • leaves \ at the end of replacement untouched
  • taints the result if the original string or replacement is tainted
  • tries to convert pattern to a string using to_str
  • raises a TypeError when pattern can't be converted to a string
  • tries to convert replacement to a string using to_str
  • raises a TypeError when replacement can't be converted to a string
  • returns subclass instances when called on a subclass
  • sets $~ to MatchData of last match and nil when there's none

String#gsub with pattern and block

  • returns a copy of self with all occurrences of pattern replaced with the block's return value
  • sets $~ for access from the block
  • restores $~ after leaving the block
  • sets $~ to MatchData of last match and nil when there's none for access from outside
  • raises a RuntimeError if the string is modified while substituting
  • doesn't interpolate special sequences like \1 for the block's return value
  • converts the block's return value to a string using to_s
  • taints the result if the original string or replacement is tainted

String#gsub! with pattern and replacement

  • modifies self in place and returns self
  • taints self if replacement is tainted
  • returns nil if no modifications were made
  • raises a TypeError when self is frozen

String#gsub! with pattern and block

  • modifies self in place and returns self
  • taints self if block's result is tainted
  • returns nil if no modifications were made
  • raises a RuntimeError when self is frozen

Finished in 0.030081 seconds

2 files, 82 examples, 251 expectations, 0 failures, 0 errors

--
Magic is insufficiently advanced technology.

=end
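
A few of the behaviours listed in the spec output above can be tried directly. This is a sketch against a current Ruby rather than the 1.8.7 build the specs were run on, and the constant names are hypothetical.

```ruby
TEXT = "abc"

# \& inserts the complete match.
MATCHED = TEXT.sub(/b/, '<\&>')

# \` inserts everything before the match.
PRE_MATCH = TEXT.sub(/b/, '<\`>')

# \' inserts everything after the match (double quotes are used here
# only so the quote character does not end the literal).
POST_MATCH = TEXT.sub(/b/, "<\\'>")

# A block's return value is not scanned for \1 and friends.
FROM_BLOCK = TEXT.sub(/(b)/) { '\1' }

# The bang variant returns nil when nothing was replaced.
NO_CHANGE = TEXT.dup.sub!(/x/, 'y')

puts MATCHED      # a<b>c
puts PRE_MATCH    # a<a>c
puts POST_MATCH   # a<c>c
puts FROM_BLOCK   # a\1c
p NO_CHANGE       # nil
```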
