Bug #15895


String#gsub and String#sub should return original string if no substitution(s) have been made

Added by ashmaroli (Ashwin Maroli) over 2 years ago. Updated over 2 years ago.

Currently, calling 'Hello World'.gsub(/[<&>]/, html_entities_hash) allocates and returns a copy of 'Hello World', even though nothing was substituted. If such a call occurs inside a loop, numerous copies are allocated merely from attempting a substitution.

Likewise for 'Hello World'.sub(/\d+/, 'x')

Opting for the destructive alternatives (gsub! and sub!) is not possible since the original string should remain unchanged in all cases.

IMO, it'd be great to have the original string returned if substitution(s) couldn't be made.
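A minimal sketch of the behavior described above. The contents of HTML_ENTITIES are an assumption standing in for html_entities_hash, which the report does not define:

```ruby
# Assumed stand-in for the report's html_entities_hash (not given in the report).
HTML_ENTITIES = { '<' => '&lt;', '&' => '&amp;', '>' => '&gt;' }.freeze

original = 'Hello World'

# Nothing matches, yet gsub still returns a newly allocated String.
result = original.gsub(/[<&>]/, HTML_ENTITIES)

same_content = (result == original)      # true: the contents are identical
same_object  = result.equal?(original)   # false: it is a different object

# The same holds for sub when the pattern does not match.
sub_result = original.sub(/\d+/, 'x')
sub_same_object = sub_result.equal?(original) # false
```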

Actions #1

Updated by ashmaroli (Ashwin Maroli) over 2 years ago

  • Description updated

Updated by k0kubun (Takashi Kokubun) over 2 years ago

  • Status changed from Open to Feedback

I can understand why you want it, but it would be a breaking change for existing code that assumes gsub always creates a new String instance and then performs a destructive operation on gsub's return value. If gsub behaved as proposed, such code would suddenly modify the original string.
If you really need it, please consider proposing an option to force the behavior or another method.
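The compatibility concern can be illustrated with a small sketch. Note that gsub_or_self is a hypothetical helper written only to simulate the proposed behavior; it is not an existing String method, and the method names below are invented for the example:

```ruby
# Hypothetical helper: returns the receiver itself when nothing matches,
# simulating the behavior proposed in this ticket.
def gsub_or_self(str, pattern, replacement)
  str.match?(pattern) ? str.gsub(pattern, replacement) : str
end

# Existing-style code: relies on gsub returning a fresh String,
# then mutates that return value in place.
def mask_digits_today(str)
  masked = str.gsub(/\d+/, 'x')
  masked << '!'
end

# The same code under the simulated proposed behavior.
def mask_digits_proposed(str)
  masked = gsub_or_self(str, /\d+/, 'x')
  masked << '!'
end

input = +'Hello World' # unary plus gives an unfrozen String

mask_digits_today(input)
after_today = input.dup    # "Hello World": the caller's string is untouched

mask_digits_proposed(input)
after_proposed = input.dup # "Hello World!": the caller's string was modified
```

An opt-in helper like the one above, rather than a change to gsub itself, is roughly the kind of "another method" the comment suggests proposing.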

Besides, please share code that reflects your real-world use case with gsub in a loop. The code and its benchmark results may encourage us to introduce such a feature. If gsub is not a bottleneck in your application, the feature could be just a useless micro-optimization. At least, I don't think we write code like 'Hello World'.sub(/\d+/, 'x'), and you also don't need to write 'Hello World'.gsub(/[<&>]/, html_entities_hash), because I wrote the fast HTML escape method in Ruby core, which does NOT use gsub.
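For reference, a brief sketch of the escape method in question, which the follow-up reply identifies as CGI.escape_html. The sample strings are made up for illustration, and whether a new String is allocated when nothing needs escaping is not asserted here:

```ruby
require 'cgi'

# Escapes &, <, >, " and ' without going through gsub in user code.
escaped = CGI.escape_html('<b>Tom & Jerry</b>')
# => "&lt;b&gt;Tom &amp; Jerry&lt;/b&gt;"

# A string with nothing to escape comes back with identical content.
plain = CGI.escape_html('Hello World')
# => "Hello World"
```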

Updated by ashmaroli (Ashwin Maroli) over 2 years ago

Thank you for pointing me to CGI.escape_html. I was not aware of it being faster than the gsub route. Moreover, you're right about the gsub usage not being a bottleneck in my codebase.

It simply resulted in numerous string allocations (as reported by the memory_profiler gem) that I wanted to reduce or eliminate.

As it turns out, CGI.escape_html allocates fewer Ruby strings than gsub. So, it is a win-win solution for me.

Thank you once again.

Updated by ashmaroli (Ashwin Maroli) over 2 years ago

  • Status changed from Feedback to Closed
