Feature #20266 (open)
New syntax to escape embed strings in Regexp literal
Description
Premise
When a string is embedded in a Regexp literal, it is interpreted as part of the Regexp.
foo = "[a-z]"
p /#{foo}/ #=> /[a-z]/
So currently we often have to escape the embedded string explicitly.
foo = "[a-z]"
p /#{Regexp.quote(foo)}/ #=> /\[a\-z\]/
This is very long and painful to write every time.
So I propose a new syntax to escape embedded strings automatically.
Proposal
Add a new token #{= in Regexp literals:
foo = "[a-z]"
p /#{=foo}/ #=> /\[a\-z\]/
When #{= is used instead of #{, Ruby calls Regexp.quote internally.
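A minimal sketch of the intended equivalence, written with today's syntax since #{= does not parse yet. The variable names are illustrative only; the point is that the proposed /#{=foo}/ would behave like the explicit Regexp.quote form, while plain interpolation treats the string as a pattern.

```ruby
foo = "[a-z]"

# Plain interpolation: foo is treated as a character class.
unescaped = /#{foo}/

# Explicit escaping: what the proposed /#{=foo}/ is meant to be shorthand for.
escaped = /#{Regexp.quote(foo)}/

puts unescaped.match?("q")      # the character class matches any lowercase letter
puts escaped.match?("q")        # the escaped form only matches the literal text
puts escaped.match?("[a-z]")
puts escaped.source
```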
Compatibility
Current Ruby raises a syntax error when #{= is used, so there is no incompatibility.
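The compatibility claim can be checked with a small sketch like the following. It assumes a Ruby version that does not implement the proposed token, and uses eval only so that the parse failure can be rescued instead of aborting the script.

```ruby
# Single quotes keep "#{" literal, so the Regexp literal is parsed by eval, not here.
source = 'foo = "[a-z]"; /#{=foo}/'

result =
  begin
    eval(source)
    :parsed
  rescue SyntaxError
    # Expected on a Ruby without the proposed syntax, per the claim above.
    :syntax_error
  end

puts result
```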
Out of scope of this proposal
I am not proposing #{= for other literals. They are out of scope of this proposal.
Updated by mrkn (Kenta Murata) 11 months ago
I agree with this proposal. Even if Ruby enabled the \Q and \E features in Onigmo, they would not work as expected if the embedded string contains \E. Therefore, it would be better for Ruby to have a short syntax for #{Regexp.quote(str)}.
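For comparison, Regexp.quote has no such terminator problem, because it escapes the backslash itself. A minimal sketch with an illustrative input string that contains \E:

```ruby
# Single quotes: the string holds a literal backslash followed by E.
str = 'a\E.b'

# Regexp.quote escapes both the backslash and the dot.
quoted = Regexp.quote(str)
re = /\A#{quoted}\z/

puts quoted
puts re.match?('a\E.b')   # the literal text itself
puts re.match?('a\Exb')   # the dot must not act as a wildcard
```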
Updated by knu (Akinori MUSHA) 11 months ago
I was also part of the discussion circle regarding this idea. The lack of support for easily escaping a string for regular expressions has led users to often omit it when it seems obvious that a string does not need escaping (for example, when it is alphanumeric) or when it "looks" practically okay to do so. However, omitting escaping for something like a domain name could potentially create a vulnerability since the dot is a meta character.
Consider the scenario where the variable hostname is set to "example.co.jp". In the expression %r{\Ahttps://#{hostname}/}.match?(callback_url), where the necessary escaping is omitted, it unintentionally matches "https://example.co/jp/…", which is a URL under a completely different domain.
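The scenario above can be reproduced with a short sketch. The concrete callback URLs are assumptions chosen for illustration; only the hostname and the pattern come from the description.

```ruby
hostname = "example.co.jp"

# Hypothetical URLs: one on a different domain, one on the intended domain.
foreign_url  = "https://example.co/jp/callback"
intended_url = "https://example.co.jp/callback"

# Without escaping, each "." in the hostname matches any character, including "/".
unescaped = %r{\Ahttps://#{hostname}/}

# With escaping, each "." only matches a literal dot.
escaped = %r{\Ahttps://#{Regexp.quote(hostname)}/}

puts unescaped.match?(foreign_url)   # the unescaped pattern accepts the foreign domain
puts escaped.match?(foreign_url)     # the escaped pattern rejects it
puts escaped.match?(intended_url)    # and still accepts the intended domain
```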
That's why I believe it is necessary for Ruby to provide an easy and readable way to escape a string in interpolation. It would help code reviewers and reviewees a lot if escaping cost just one character, whereas "Add Regexp.quote() here and here" can look scary and pedantic.
Updated by shan (Shannon Skipper) 11 months ago
I wonder if #{^foo} might be a passable alternative for #{=foo}, since "pinning" almost makes sense and an uptick is less likely to actually be intended than an equals sign to start a quoted interpolation?
Updated by rubyFeedback (robert heiler) 11 months ago
I don't have any pro or con opinion on the feature itself; in regards to ^foo versus =foo, I think users may wonder about both:
^ specifically because many regexes may have it, such as /^foobar/, and with = they may assume some assignment to be made. At the least that was my first impression when seeing it, perhaps inspired by erb.
I guess IF the rationale is that Regexp.quote(i) is too long to type, which seems a reasonable statement, then it makes sense to use a shorthand syntax. But probably all shorthand syntaxes here may not be "perfect". Remember the perl-inspired $ variables; not everyone can remember them easily. (Unfortunately the longer $ named variables weren't a big improvement either.)
Just for the sake of completeness, as this was already discussed, could someone show alternative syntax suggestions, if any were made? Just so we can more easily compare the preferred variant with the other variants, e.g. the two so far, even if one may be "unofficial" by shan:
#{^foo}
#{=foo}
I'll also try shan's suggestion via the first one, as "side-by-side" comparison:
foo = "[a-z]"
p /#{=foo}/
foo = "[a-z]"
p /#{^foo}/
Hmm. And my initial thought of the second one used with leading ^:
p /^#{^foo}/
And for comparison the other one also with leading ^:
p /^#{=foo}/
I think none of them will win any beauty contest, but it could still be interesting for a comparison.
Updated by Dan0042 (Daniel DeLorme) 11 months ago
TBH I'm not entirely sure it's worth new syntax, but I've definitely felt the verbosity of Regexp.escape before, and I like how #{= expr} has similarity with erb's <%= expr %>.
Updated by matheusrich (Matheus Richard) 11 months ago
I wonder if this new syntax would open the doors to adding some kind of similar behavior to normal string interpolation too.