Project

General

Profile

Bug #17184

No stdlib function to perform simple string replacement

Added by sheerun (Adam Stankiewicz) 28 days ago. Updated 26 days ago.

Status:
Open
Priority:
Normal
Target version:
-
ruby -v:
ruby 2.5.5p157 (2019-03-15 revision 67260) [x86_64-darwin19]
[ruby-core:100099]

Description

I have following simple build.rb:

template = File.read('template.vim')
script = File.read('script.vim')
File.write('app.vim', template.gsub("SCRIPT", script))

And then following template.vim:

" some header
SCRIPT

Plus following script.vim:

if g:something =~ "\s\+"
  echo 'g:something is empty'
endif

I'd expect that the script above produces app.vim with following contents:

" some header
if g:something =~ "\s\+"
  echo 'g:something is empty'
endif

Unfortunately it produces following:

" some header
if g:something =~ "\s"
  echo 'g:something is empty'
endif

It's probably because gsub interprets \+ in script as back-reference.

I tried to find replacement function in ruby that just replaces one string with something else, without interpreting replacement in any way, but surprisingly I haven't found any.. Am I mistaken?

Updated by briankung (Brian Kung) 28 days ago

The hash replacement form of gsub doesn't have this problem, strangely enough:

'mislocated cat, vindicating'.gsub('cat', 'dog\+')
#=> "mislodoged dog, vindidoging"
'mislocated cat, vindicating'.gsub('cat', 'cat' => 'dog\+')
#=> "mislodog\\+ed dog\\+, vindidog\\+ing"

Updated by byroot (Jean Boussier) 28 days ago

The intended API for that is String[]=, e.g.

string["search"] = "replace"

Updated by sheerun (Adam Stankiewicz) 28 days ago

byroot (Jean Boussier) wrote in #note-2:

The intended API for that is String[]=, e.g.

Is there a non-mutable version of this?

Updated by sheerun (Adam Stankiewicz) 28 days ago

For now I'm using self-implemented function, but I wish there was something built-in:

class String
  # Simple substitution, ignores backreferences in sub
  # https://bugs.ruby-lang.org/issues/17184
  def ssub(str, sub)
    new_str = String.new(self, capacity: self.size - str.size + sub.size)
    new_str[str] = sub
    new_str
  end
end

Updated by sheerun (Adam Stankiewicz) 28 days ago

Little more advanced, but common usecase, which also doesn't have built-in function is replacing all strings in simple way (i.e. gsub replacement, not sub replacement).

Updated by Eregon (Benoit Daloze) 28 days ago

chrisseaton (Chris Seaton) noticed the same issue a while ago.

String#[]= only replaces the first occurrence (and mutates the receiver).

I think we should remove special treatment of \+, etc in the replacement string for sub/gsub(String, String).
There is no Regexp involved, so I think there is no reason to have those Regexp backreferences.
sub/gsub(String, String) only has a single "group", which is 0 and exactly the same as the replacement String, so those backreferences seem useless.

Here is the list of those special \ sequences:
https://github.com/oracle/truffleruby/blob/6bb23e236f83416fc4a000b6f3cf976947117ed7/src/main/ruby/truffleruby/core/truffle/string_operations.rb#L187-L246

Potentially incompatible, but who would use those sequences for sub/gsub(String, String)?
I think it's always unintentional, and a source of subtle bugs.

And worse than that, I can imagine it could become a security issue if the pattern String is fixed and the replacement String comes from user input.
Then the user would expect the pattern String cannot be seen in the result, but it can if the replacement String is \&.

sheerun (Adam Stankiewicz) I'd suggest to not bother with a capacity computation, that's very unidiomatic. You can use new_str = str.dup instead.

Updated by sheerun (Adam Stankiewicz) 28 days ago

String doesn't have dup method

Updated by znz (Kazuhiro NISHIYAMA) 28 days ago

You can use gsub("SCRIPT") { script }.

If you do not want the special treatment of replacement, you should use gsub with block instead of gsub(pat, replacement).

Updated by sheerun (Adam Stankiewicz) 28 days ago

Nice, it actually suits me.

I'll leave this issue open though, because I agree with what Benoit said above

Updated by Eregon (Benoit Daloze) 28 days ago

sheerun (Adam Stankiewicz) wrote in #note-7:

String doesn't have dup method

What do you mean? "foo".dup # => "foo".

Updated by Eregon (Benoit Daloze) 28 days ago

znz (Kazuhiro NISHIYAMA) wrote in #note-8:

You can use gsub("SCRIPT") { "replacement" }.

I forgot about this form for this purpose. I find it really not intuitive for replacing strings.
A block is typically useful for more custom/complicated logic, but here ironically it has more simple semantics than the replacement String argument.

Updated by Dan0042 (Daniel DeLorme) 27 days ago

I think we should remove special treatment of \+, etc in the replacement string for sub/gsub(String, String).
There is no Regexp involved, so I think there is no reason to have those Regexp backreferences.

I agree that would be the sensible behavior, but I'm not sure the incompatibility is worth it. Imagine you try "foo".gsub("o", '\+') expecting to get f\+\+ as a result. Instead you get just f, and after a lot of searching and debugging you end up with "foo".gsub("o", '\\\\+') which produces the correct result. If we were to change the behavior, the result will change to f\\+\\+. I'm sure there's a lot of ruby code out there that use a replacement string with a ton of backslashes to avoid this issue. Breaking all that code is not a good idea imho.

Updated by byroot (Jean Boussier) 27 days ago

IMHO, rather than change the behavior of gsub, introducing a much simpler String#replace(search, replace) would make more sense, and would be more discoverable.

Updated by Eregon (Benoit Daloze) 26 days ago

  • Assignee set to matz (Yukihiro Matsumoto)

Dan0042 (Daniel DeLorme) wrote in #note-12:

Instead you get just f, and after a lot of searching and debugging you end up with "foo".gsub("o", '\\\\+') which produces the correct result. If we were to change the behavior.

Are there examples of such code?
It seems hard to read and a worse version of "foo".gsub("o") { '\+' } which is more general.
I would guess it's so rare that this is definitely worth the tiny corner-case incompatibility.
And I would think it would reveal more unintended strange replacement cases than actually breaking such manually-escaped constant string cases.

byroot (Jean Boussier) wrote in #note-13:

IMHO, rather than change the behavior of gsub, introducing a much simpler String#replace(search, replace) would make more sense, and would be more discoverable.

But then we need to clarify if that replaces the first or all occurrences (and String#[]= is hardly intuitive or related by name).

sub/gsub already have optimizations for (String, String) cases (notably, they don't use a Regexp internally).
It seems sad to not simply reuse them when the intended semantics are very clear.
It seems a huge pitfall to me that sub/gsub(String, String) will not use the replacement String as-is. A new method does not address that.

matz (Yukihiro Matsumoto) Can we experiment with that change? I expect extremely few incompatibilities.
I expect it will fix many usages which do not expect any magic backreferences replacement.

Does anyone know why this behavior exists in the first place?
I guess maybe because sub/gsub(String, String) were once implemented by making a Regexp for the pattern and that scanned the replacement String for backreferences, causing this strange behavior?

Updated by byroot (Jean Boussier) 26 days ago

But then we need to clarify if that replaces the first or all occurrences

As far as I know, String replace commonly means replace all occurrences.

So all in all I don't think it's a huge problem, the least surprising IMHO would be replace all occurrence, and I find Python's optional maxreplace quite a good idea.

Updated by Eregon (Benoit Daloze) 26 days ago

There is already String#replace (consistent with {Array,Hash,Set,...}#replace), so I don't think it's a good fit:

https://docs.ruby-lang.org/en/master/String.html#method-i-replace

Updated by byroot (Jean Boussier) 26 days ago

There is already String#replace

Oh wow, I somehow missed that...

Also available in: Atom PDF