Feature #17771

String#start_with? should not construct MatchData or set $~

Added by headius (Charles Nutter) 7 months ago. Updated 7 months ago.

Status:
Open
Priority:
Normal
Assignee:
-
Target version:
-
[ruby-core:103153]

Description

I am working on making $~ more thread-safe in JRuby and came across this unexpected behavior:

$ rvm ruby-3.0 do ruby -e '"foo".start_with?(/foo/); p $~'
#<MatchData "foo">

The start_with? method was added 11 years ago in https://bugs.ruby-lang.org/issues/3388 but I do not think that setting $~ was an intended feature. The start_with? method could be much faster and more thread-safe if it did not use the frame-local backref slot and did not allocate a MatchData.

Compare with match? which was added specifically (without MatchData or backref setting) to provide a fast way to check if a Regexp matches.

I propose that start_with? stop constructing MatchData, stop setting backref, and provide only its boolean result in the same way as match?.
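
The contrast with match? can be reproduced with a short sketch. The helper method names are hypothetical; each helper runs in its own method frame, so $~ starts out as nil there. The MatchData result for start_with? is the behavior reported above for Ruby 3.0 and may differ on other versions or implementations.

```ruby
# Hypothetical helpers: $~ is frame-local, so each method starts with $~ == nil.
def backref_after_match_p
  "foo".match?(/foo/)       # boolean check only
  $~                        # match? leaves the backref untouched
end

def backref_after_start_with
  "foo".start_with?(/foo/)
  $~                        # reported above as #<MatchData "foo"> on Ruby 3.0
end

p backref_after_match_p     # nil
p backref_after_start_with
```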

Updated by headius (Charles Nutter) 7 months ago

I will also point out that this method, like many others, will not always set $~. If you pass a string, it remains whatever it was before:

$ rvm ruby-3.0 do ruby -e '"foo".start_with?("foo"); p $~'
nil

Avoiding the use of $~ would make this behavior consistent.
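
A minimal sketch of the inconsistency, assuming a hypothetical helper that first seeds $~ with an unrelated match. With a String prefix the earlier match survives; with a Regexp prefix it is replaced on Ruby 3.0, per the report above.

```ruby
# Hypothetical helper: seed $~ with an unrelated match, then call start_with?.
def backref_after_prefix_check(prefix)
  "bar" =~ /bar/              # $~ now holds the "bar" match
  "foo".start_with?(prefix)
  $~
end

p backref_after_prefix_check("foo")   # String prefix: still the "bar" match
p backref_after_prefix_check(/foo/)   # Regexp prefix: the "foo" match on Ruby 3.0
```

A caller that accepts either kind of prefix therefore cannot tell what $~ refers to afterwards without checking the argument type.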

Updated by headius (Charles Nutter) 7 months ago

I see this behavior was explicitly blessed by matz in #13712 but I still believe this is not the best choice.

Around the same time as that discussion, another boolean query method match? was added that explicitly does not set the last match frame variable.

I feel this is inconsistent and the boolean query methods that accept a Regexp should be as fast as possible. If you want a MatchData use methods that provide it.

Updated by jeremyevans0 (Jeremy Evans) 7 months ago

  • Backport deleted (2.5: UNKNOWN, 2.6: UNKNOWN, 2.7: UNKNOWN, 3.0: UNKNOWN)
  • Tracker changed from Bug to Feature

Updated by enebo (Thomas Enebo) 7 months ago

It really feels like an unintended side effect of the method. If you call this method with a variable, then depending on the type of that variable there either is some MatchData (MD) as a side effect or there isn't. This is inconsistent. If you want to use the MD explicitly, then you have to know what you are supplying. If you know it is a regexp, then just writing str =~ /^my_pat/ is what you want.

Updated by headius (Charles Nutter) 7 months ago

An alternative to using str =~ /^pat/ for a start_with? that provides a MatchData would be to add a start_with that is not a boolean query method.

Updated by Eregon (Benoit Daloze) 7 months ago

I don't think there is a rule that predicate methods only return a boolean and never set $~.
It is the case for String#match vs String#match?, but it doesn't mean it holds for other Regexp methods.
I see it a bit like the use of !, which in the core library is generally only used if there is also a non-! variant (e.g., Array#delete).

String#start_with? makes it possible to match a regexp without the need to manually build another regexp like /\A#{regexp}/ (from the user's point of view; there might be internal caching depending on the regexp engine), so I think that is a valid use case for using start_with? and accessing the MatchData afterwards.
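
For comparison, the manual alternative could look like the sketch below. prefix_match is a hypothetical helper, not an existing core method; it returns the MatchData explicitly instead of relying on $~. The last two lines show why \A is used here rather than ^, which also matches after a newline.

```ruby
# Hypothetical helper: anchor an arbitrary Regexp at the start of the string.
def prefix_match(str, pattern)
  /\A#{pattern}/.match(str)   # interpolation embeds the pattern as a (?-mix:...) group
end

m = prefix_match("foobar", /fo+|bar/)
p m[0]                                   # "foo"
p prefix_match("xfoobar", /fo+|bar/)     # nil, the match must begin at index 0

# \A anchors at the start of the string; ^ anchors at the start of any line.
p(/^foo/.match?("x\nfoo"))               # true
p(/\Afoo/.match?("x\nfoo"))              # false
```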

StringScanner has a similar functionality for matching a regexp from the start, as if there was a \A, but does not expose $~ directly:
ruby -rstrscan -e 's = StringScanner.new("test string"); s.scan(/(\w)\w+/); p s[1]' => "t".
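
The same one-liner expanded into a small sketch. scan_prefix is a hypothetical helper; it returns what the scanner exposes through its own API, together with $~ for comparison. That $~ stays nil here is an assumption based on StringScanner keeping its match state internally.

```ruby
require "strscan"

# Hypothetical helper: match at the start of str and report the scanner's view.
def scan_prefix(str, pattern)
  scanner = StringScanner.new(str)
  matched = scanner.scan(pattern)   # only matches at the current position (index 0 here)
  [matched, scanner[1], $~]
end

p scan_prefix("test string", /(\w)\w+/)   # ["test", "t", nil]
p scan_prefix("test string", /string/)    # [nil, nil, nil], "string" is not at the start
```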

That said, I'm not against no longer setting $~ for String#start_with?, but I do worry about the compatibility issue here, especially since it might be quite hard to debug why $~ is suddenly nil or the previous MatchData in the Ruby version changing this behavior.

Updated by marcandre (Marc-Andre Lafortune) 7 months ago

I also believe it is unintended behavior and should be removed.
