Feature #15446: Add a method `String#each_match` to the Ruby core - Ruby - Ruby Issue Tracking System

Updated by CaryInVictoria (Cary Swoveland) over 7 years ago

`String#each_match` would have two forms: 

 *each_match(pattern) { |match| block } → str* 
 *each_match(pattern) → an_enumerator* 

 The latter would be identical to the form *gsub(pattern) → enumerator* of [String#gsub](http://ruby-doc.org/core-2.5.1/String.html#method-i-gsub). The former would simply yield the matches to a block and return the receiver. 
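 To make the two forms concrete, here is a minimal pure-Ruby sketch of the proposed behaviour. It is only an illustration, not a proposed implementation: the helper name `each_match_sketch` is hypothetical, and it takes the string as an argument rather than reopening `String`.

```ruby
# Hypothetical helper, NOT part of Ruby core: a sketch of the proposed
# String#each_match, built on the enumerator form of String#gsub.
def each_match_sketch(str, pattern)
  # Without a block, behave like gsub(pattern) and return an enumerator.
  return str.gsub(pattern) unless block_given?

  # With a block, yield each matched substring, then return the receiver.
  # The string that gsub builds from the block's return values is discarded.
  str.gsub(pattern).each { |match| yield match }
  str
end

seen = []
result = each_match_sketch("a1b22", /\d+/) { |m| seen << m }
seen    #=> ["1", "22"]
result  #=> "a1b22"
each_match_sketch("a1b22", /\d+/).to_a  #=> ["1", "22"]
```

 Note that this sketch still pays for the substituted string that `gsub` constructs internally; a core implementation would presumably avoid that.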

  

 I frequently use the form of `gsub` that returns an enumerator instead of `scan` when chaining to Enumerable methods. That's because `gsub` returns an enumerator whereas `scan` returns a temporary array. This use of `gsub` can also be useful when the pattern contains capture groups, which is sometimes a complication when using `scan` (such as in the following example when a capture group is needed for back-referencing). 

 Suppose we are given a string and wish to count the number of occurrences of each word that begins and ends with the same letter (case-insensitive). 

     str = "Viv and Bob are party animals. Bob and Eve are a couple who met on Christmas Eve. Bob is a regular guy." 

      r = /\b(?:[a-z]|([a-z])[a-z]*\1)\b/i 

 This regular expression reads, "match a word break, followed by one letter or by two or more letters with the last matching the first (case insensitive), all followed by a word break". 

      enum = str.each_match(r) 
         #=> #<Enumerator: "Viv and Bob are party...a regular guy.":gsub(/\b(?:[a-z]|([a-z])[a-z]*\1)\b/i)>  
 
 We can convert `enum` to an array to see the words that will be generated by the enumerator and passed to the block. 

     enum.to_a 
         #=> ["Viv", "Bob", "Bob", "Eve", "a", "Eve", "Bob", "a", "regular"]  

 Continuing,  

     enum.each_with_object(Hash.new(0)) { |word, h| h[word] += 1 } 
        #=> {"Viv"=>1, "Bob"=>3, "Eve"=>2, "a"=>2, "regular"=>1}  
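 Since `each_match` does not exist yet, the same count can be reproduced in current Ruby with the enumerator form of `gsub`. The snippet below is a self-contained sketch that repeats the `str` and `r` definitions from above; the variable name `counts` is only illustrative.

```ruby
str = "Viv and Bob are party animals. Bob and Eve are a couple who met on Christmas Eve. Bob is a regular guy."
r = /\b(?:[a-z]|([a-z])[a-z]*\1)\b/i

# gsub without a block or replacement returns an enumerator over the
# full matches, regardless of any capture groups in the pattern.
counts = str.gsub(r).each_with_object(Hash.new(0)) { |word, h| h[word] += 1 }
counts #=> {"Viv"=>1, "Bob"=>3, "Eve"=>2, "a"=>2, "regular"=>1}
```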

 We could alternatively use `each_match` with a block. 

      h = Hash.new(0) 
      str.each_match(r) { |word| h[word] += 1 } 
         #=> "Viv and Bob are party animals. Bob and Eve are a couple who met on Christmas Eve. Bob is a regular guy." 
      h #=> {"Viv"=>1, "Bob"=>3, "Eve"=>2, "a"=>2, "regular"=>1}  

 This form of `each_match` has no counterpart with `gsub`. 

 Consider now how `scan` would be used here. Because of the way `scan` treats capture groups, we cannot write 

     str.scan(r) 
        #=> [["V"], ["B"], ["B"], ["E"], [nil], ["E"], ["B"], [nil], ["r"]]  

 Instead we must add a second capture group. 

     arr = str.scan(/\b((?:[a-z]|([a-z])[a-z]*\2))\b/i) 
        #=> [["Viv", "V"], ["Bob", "B"], ["Bob", "B"], ["Eve", "E"], ["a", nil], ["Eve", "E"], ["Bob", "B"], ["a", nil], ["regular", "r"]] 

 Then 

     arr.each_with_object(Hash.new(0)) { |(word,_),h| h[word] += 1 } 
        #=> {"Viv"=>1, "Bob"=>3, "Eve"=>2, "a"=>2, "regular"=>1} 

 This works but it's a bit of a [dog's breakfast](https://dictionary.cambridge.org/us/dictionary/english/a-dog-s-breakfast) when compared to the use of the proposed method. 

 
 The problem with using `gsub` in this way is that it is confusing to readers who are expecting character substitutions to be performed. I also believe that the name of this method (the "sub" in `gsub`) has caused the form of the method that returns an enumerator to be under-appreciated and under-used. 

 Some comments below propose that this suggestion be adopted and that, in time, the form of `gsub` that returns an enumerator be deprecated. The choice of name is secondary.
