Feature #18757

Introduce %R percent literal for anchored regular expression patterns

Added by zeke (Zeke Gabrielse) almost 2 years ago. Updated almost 2 years ago.

Status: Open
Assignee: -
Target version: -
[ruby-core:108416]

Description

When defining regular expression patterns, it's often the case that you want to anchor with \A and \z to match the full text input, rather than ^ and $, respectively, which may (unintentionally) match text including newlines. This is especially true in the context of a web application such as a Rails app. Unfortunately, \A and \z reduce the legibility of a regular expression.
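To make the difference concrete, here is a minimal sketch of how the two anchor styles behave on input that contains a newline. The input string and variable names are made up for illustration.

```ruby
# Hypothetical input: a valid-looking first line followed by injected content.
input = "admin\n<script>alert(1)</script>"

line_anchored   = /^[a-z]+$/    # ^ and $ match at line boundaries
string_anchored = /\A[a-z]+\z/  # \A and \z match only at string boundaries

line_anchored.match?(input)     # => true  (the first line alone satisfies it)
string_anchored.match?(input)   # => false (the whole string must match)
string_anchored.match?("admin") # => true
```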

For example, take this ActionMailbox usage:

class ApplicationMailbox < ActionMailbox::Base
  routing %r{\Areplies\+.*?@ruby-lang\.org\z}i => :replies
  routing %r{\Asales@.*?\z}i                   => :leads
end

At first glance, the second route may appear to match the literal text Asales, but closer inspection shows that it does not. Patterns anchored with \A and \z are harder to read, \A especially, so a developer may be tempted to use ^ instead of \A for legibility. In other cases, developers simply forget to use \A and \z in place of ^ and $ when validating or matching against user input.

I propose that Ruby introduce a new percent notation, %R{}, for defining interpolated regular expression patterns that are automatically anchored with \A and \z.

For example, the above will look like below:

class ApplicationMailbox < ActionMailbox::Base
  routing %R{replies\+.*?@ruby-lang\.org}i => :replies
  routing %R{sales@.*?}i                   => :leads
end

This is much more readable, and it's safer: developers using %R{} are not going to accidentally use ^ or $ instead of \A and \z, respectively (^ and $ match at line boundaries, so they can match input data containing newlines).

This is especially useful in pattern matching data where some values may be a symbol or a string, depending on where the data originated (internally vs externally):

data = { type: :foo, id: 1 } # Could also be: { type: 'foo', id: 1 }

case data
in type: %R(foo), id:
  # ...
else
end
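For comparison, the same match can be written in current Ruby with explicit anchors, since pattern matching uses Regexp#===, which accepts both strings and symbols. This is only a sketch; the classify method and its return values are hypothetical.

```ruby
# Hypothetical helper showing the explicit-anchor equivalent of %R(foo).
def classify(data)
  case data
  in type: /\Afoo\z/, id:
    "foo ##{id}"
  else
    "unknown"
  end
end

classify({ type: :foo, id: 1 })   # => "foo #1"
classify({ type: "foo", id: 1 })  # => "foo #1"
classify({ type: "food", id: 1 }) # => "unknown"
```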

Formally, the new anchored regex percent notation would work as follows:

re = %R(test)
# => /\Atest\z/

re.match?('test')    # => true
re.match?('testing') # => false
re.match?('a test')  # => false
re.match?(:test)     # => true
re.match?(:testing)  # => false
re.match?(:a_test)   # => false

This would also be useful for data validation, where a developer could clean up patterns that previously relied on \A...\z or ^...$, such as in Rails model validations, e.g. passing with: %R{[-a-z0-9]+} to validates_format_of.

I do understand that an uppercase %R would behave differently from other percent notations (i.e. lowercase is typically non-interpolated, uppercase interpolated), but since %r already allows interpolation, I figured it was okay to be a bit different. Regardless, I'm open to other syntax suggestions.

Updated by zeke (Zeke Gabrielse) almost 2 years ago

  • Description updated (diff)

Updated by zeke (Zeke Gabrielse) almost 2 years ago

  • Subject changed from Introduce %R for anchored regular expression patterns to Introduce %R percent literal for anchored regular expression patterns

Updated by jrochkind (jonathan rochkind) almost 2 years ago

I do find \A and \z cumbersome and confusing for a common use case. (You didn't mention the risk of confusing \Z with \z, too!)
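For anyone unsure about the \Z versus \z distinction, a small sketch: \Z also matches just before a trailing newline, while \z matches only at the very end of the string. The variable names are made up for illustration.

```ruby
lenient = /\Afoo\Z/  # \Z: end of string, or just before a final newline
strict  = /\Afoo\z/  # \z: end of string only

lenient.match?("foo\n") # => true
strict.match?("foo\n")  # => false
strict.match?("foo")    # => true
```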

Instead of new syntax, how about just a new stdlib method, Regexp.anchored(/whatever/), that would simply add left/right anchoring? Just an ordinary new method.
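A rough sketch of what such a method might look like, defined here by hand. Regexp.anchored does not exist in Ruby today, and this version assumes its argument is already a Regexp. Wrapping the source in a non-capturing group matters: without it, /a|b/ would become /\Aa|b\z/, which anchors each alternative on one side only.

```ruby
# Hypothetical method; not part of Ruby's core or standard library.
def Regexp.anchored(pattern)
  # Reuse the original options (e.g. /i) and wrap the source in a
  # non-capturing group so alternation stays inside the anchors.
  Regexp.new("\\A(?:#{pattern.source})\\z", pattern.options)
end

re = Regexp.anchored(/replies\+.*?@ruby-lang\.org/i)
re.match?("replies+123@ruby-lang.org")   # => true
re.match?("REPLIES+1@RUBY-LANG.ORG")     # => true
re.match?("x replies+123@ruby-lang.org") # => false

Regexp.anchored(/a|b/).match?("ab")      # => false
```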

Alternatively, I suppose I could see a new flag on the end of /whatever/a (for (a)nchored). I'm not sure whether adding a new flag has issues, or whether a is already used.

Adding new features without adding new syntax is preferable to adding new syntax.
