Feature #18757: Introduce %R percent literal for anchored regular expression patterns

Updated by zeke (Zeke Gabrielse) over 3 years ago

When defining regular expression patterns, you often want to anchor with `\A` and `\z` to match the full text input, rather than with `^` and `$`, which anchor at line boundaries and may (unintentionally) match input that contains newlines. This is especially true in the context of a web application such as a Rails app. Unfortunately, `\A` and `\z` reduce the legibility of a regular expression.
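To make the difference concrete, here is a minimal sketch using current Ruby syntax. The input string and variable names are made up for illustration.

```ruby
# Hypothetical user input: a valid-looking first line followed by extra content.
input = "sales@example.com\n<script>"

line_anchored   = /^sales@.*?$/    # ^ and $ anchor at line boundaries
string_anchored = /\Asales@.*?\z/  # \A and \z anchor at string boundaries

line_anchored.match?(input)                  # => true  (the first line alone satisfies the pattern)
string_anchored.match?(input)                # => false (the newline prevents a full-string match)
string_anchored.match?("sales@example.com")  # => true
```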

 For example, take this `ActionMailbox` usage: 

 ```ruby 
 class ApplicationMailbox < ActionMailbox::Base 
   routing %r{\Areplies\+.*?@ruby-lang\.org\z}i => :replies 
   routing %r{\Asales@.*?\z}i                     => :leads 
 end 
 ``` 

 At first glance, it may look as if the second route matches `Asales`, but on closer inspection that's not the case. Readability suffers when a pattern uses `\A` and `\z`, and `\A` is the worse offender, so a developer may choose `^` instead to improve legibility. In other cases, developers simply forget to use `\A` and `\z` rather than `^` and `$` when validating or matching against user input.

 I propose that Ruby introduce a new percent notation, `%R{}`, for defining interpolated regular expression patterns that are automatically anchored with `\A` and `\z`.

 With it, the example above would look like this:

 ```ruby 
 class ApplicationMailbox < ActionMailbox::Base 
   routing %R{replies\+.*?@ruby-lang\.org}i => :replies 
   routing %R{sales@.*?}i                     => :leads 
 end 
 ``` 

 This is much more readable, and it's safer: developers using `%R{}` are not going to accidentally use `^` or `$` instead of `\A` and `\z`, respectively (the former pair being vulnerable to matching input data that contains newlines).

 This is especially useful when pattern matching against data in which some values may be a symbol or a string, depending on where the data originated (internally vs. externally):

 ```ruby 
 data = { type: :foo, id: 1 } # Could also be: { type: 'foo', id: 1 } 

 case data 
 in type: %R(foo), id: 
   # ... 
 else 
 end 
 ``` 
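For comparison, the same match can be written today with explicit anchors, because `Regexp#===` accepts both strings and symbols. The sketch below uses a hypothetical `classify` method and made-up return values purely for illustration. It requires Ruby 2.7 or later for `case`/`in`.

```ruby
# Hypothetical helper showing how an anchored regexp behaves inside a pattern.
def classify(data)
  case data
  in { type: /\Afoo\z/, id: }
    "foo:#{id}"
  else
    "unknown"
  end
end

classify({ type: :foo, id: 1 })     # => "foo:1"
classify({ type: 'foo', id: 2 })    # => "foo:2"
classify({ type: :foobar, id: 3 })  # => "unknown"
```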

 Formally, the new anchored regex percent notation would work as follows: 

 ```ruby 
 re = %R(test) 
 # => /\Atest\z/ 

 re.match?('test')    # => true 
 re.match?('testing') # => false 
 re.match?('a test')  # => false 
 re.match?(:test)     # => true 
 re.match?(:testing)  # => false 
 re.match?(:a_test)   # => false 
 ``` 
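Until such a literal exists, the behavior listed above can be approximated with a small helper. This is only a sketch under stated assumptions: the method name `anchored` is hypothetical, and wrapping the source in a non-capturing group (so that top-level alternation stays inside the anchors) is an assumption that the proposal itself does not specify.

```ruby
# Hypothetical helper, not part of Ruby or of this proposal.
# Wraps an existing Regexp in \A...\z while keeping its options (e.g. /i).
def anchored(pattern)
  # Assumption: a non-capturing group keeps alternations such as /foo|bar/
  # fully inside the anchors.
  Regexp.new("\\A(?:#{pattern.source})\\z", pattern.options)
end

re = anchored(/test/)

re.match?('test')    # => true
re.match?('testing') # => false
re.match?('a test')  # => false
re.match?(:test)     # => true

anchored(/foo|bar/).match?('foobar') # => false
anchored(/foo|bar/).match?('bar')    # => true
```

Without the group, `/\Afoo|bar\z/` would match `'foobar'`, because `\A` would bind only to the first alternative and `\z` only to the second.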

 This would also be useful for data validation, where a developer could clean up patterns that currently use `\A...\z` or `^...$`, such as in Rails model validations, e.g. `validates_format(with: %R{[-a-z0-9]+})`.

 I do understand that an uppercase `%R` would behave differently from other percent notations (i.e. lowercase is typically non-interpolated, uppercase interpolated), but since `%r` already allows interpolation, I figured it was okay to be a bit different. Regardless, I'm open to other syntax suggestions.
