Feature #6802

String#scan should have equivalent yielding MatchData

Added by Ilya Vorontsov over 1 year ago. Updated over 1 year ago.

[ruby-core:46801]
Status:Assigned
Priority:Normal
Assignee:Yukihiro Matsumoto
Category:-
Target version:next minor

Description

Ruby should have a method that returns not an array of arrays but an array of MatchData objects. This would help with accessing named groups:

pattern = /x: (?<x>\d+) y:(?<y>\d+)/
polygon = []
text.scan_for_pattern(pattern){|m| polygon << Point.new(m[:x], m[:y]) }

To avoid breaking existing code, we need a new, unique name. Ideas? Maybe #each_match?
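To make the request concrete, here is a small sketch of what String#scan gives today. The sample text and the pattern (written without the space after "x:") are taken from the reply in #2, and Point is a hypothetical Struct standing in for the class used above. Without a block, scan returns only positional capture arrays, while a MatchData can be indexed by group name, but String#match returns just the first one.

```ruby
# Hypothetical stand-in for the Point class used in the issue.
Point = Struct.new(:x, :y)

# Sample input and pattern, taken from the reply in #2.
text = "x:1 y:12 ; x:33 y:2"
pattern = /x:(?<x>\d+) y:(?<y>\d+)/

# Without a block, scan returns one capture array per match:
# only positions are available, not the group names.
pairs = text.scan(pattern)
polygon = pairs.map { |x, y| Point.new(x, y) }

# A MatchData can be indexed by group name,
# but String#match only returns the first match.
first = text.match(pattern)
first_point = Point.new(first[:x], first[:y])
```

The positional version works only as long as the groups stay in x, y order; name-based access is what the proposed method would provide for every match.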


Related issues

Related to ruby-trunk - Feature #5749: new method String#match_all needed Assigned 12/12/2011
Related to ruby-trunk - Feature #5606: String#each_match(regexp) Feedback 11/10/2011

History

#1 Updated by Ilya Vorontsov over 1 year ago

Simple implementation:

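class String
  # Yields a MatchData for each successive match of +pattern+ in self.
  # Without a block, returns an Enumerator.
  def each_match(pattern)
    return enum_for(:each_match, pattern) unless block_given?
    pos = 0
    while (m = match(pattern, pos))
      yield m
      # Search again from the end of this match in the original string,
      # so offsets and anchors stay correct. On an empty match, step one
      # character forward to avoid looping forever.
      pos = m.end(0) > m.begin(0) ? m.end(0) : m.end(0) + 1
      break if pos > length
    end
    self
  end
end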
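Advancing by slicing the string, as with text[m.end(0)..-1], means every MatchData after the first describes the remaining substring rather than the original text. The sketch below, reusing the sample text from #2, compares that with passing a start position to String#match, which keeps offsets and pre_match relative to the whole string.

```ruby
text = "x:1 y:12 ; x:33 y:2"
pattern = /x:(?<x>\d+) y:(?<y>\d+)/

first = text.match(pattern)

# Slicing: the second match is found in a substring, so its
# offsets and pre_match are relative to that substring.
rest = text[first.end(0)..-1]
sliced = rest.match(pattern)

# Start position: the second match is reported against the original text.
positioned = text.match(pattern, first.end(0))

puts sliced.begin(0)        # 3
puts positioned.begin(0)    # 11
p sliced.pre_match          # " ; "
p positioned.pre_match      # "x:1 y:12 ; "
```

Both find the same matched text, "x:33 y:2"; only the position-based search reports where it actually sits in text.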

#2 Updated by Benoit Daloze over 1 year ago

You can use String#scan with the block form and $~ (as well as the other Regexp-related globals) for this:

> text="x:1 y:12 ; x:33 y:2"
> text.scan(/x:(?<x>\d+) y:(?<y>\d+)/) { p [$~[:x],$~[:y]] }
["1", "12"]
["33", "2"]

Please check your Regexp and give an example of text next time.
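The object behind $~ in that block is a complete MatchData, so besides lookup by name it also carries offsets and helpers such as named_captures (available since Ruby 2.4). A minimal sketch that collects one MatchData per match through Regexp.last_match, the method form of $~:

```ruby
text = "x:1 y:12 ; x:33 y:2"
pattern = /x:(?<x>\d+) y:(?<y>\d+)/

found = []
# Regexp.last_match returns the same MatchData as $~ inside the block.
text.scan(pattern) { found << Regexp.last_match }

hashes  = found.map(&:named_captures)   # one Hash of named groups per match
offsets = found.map { |m| m.begin(0) }  # where each match starts in text
```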

#3 Updated by Ilya Vorontsov over 1 year ago

Thank you for the solution! I always forget about the Regexp global variables. Still, I think a dedicated method would be clearer here. So what would you say to String#each_match and Regexp#each_match?
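Yes, the implementation is as simple as:

class String
  # Yields the MatchData ($~) for each match; returns an Enumerator without a block.
  def each_match(pat)
    return enum_for(:each_match, pat) unless block_given?
    scan(pat) { yield $~ }
  end
end

and similarly for Regexp.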
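The Regexp side is only described as "similar". A hypothetical sketch of that counterpart, assuming the proposed name Regexp#each_match with the string passed as the argument (this is not an existing Ruby API), could delegate to String#scan the same way:

```ruby
class Regexp
  # Hypothetical counterpart to String#each_match: yields one MatchData
  # per match of self in +str+; returns an Enumerator without a block.
  def each_match(str)
    return enum_for(:each_match, str) unless block_given?
    str.scan(self) { yield Regexp.last_match }
    self
  end
end

re = /x:(?<x>\d+) y:(?<y>\d+)/
points = re.each_match("x:1 y:12 ; x:33 y:2").map { |m| [m[:x], m[:y]] }
```

Here points holds the x/y pairs looked up by name, one per match.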


#4 Updated by Thomas Sawyer over 1 year ago

+1 I have definitely used this before (as Facets' #mscan).

#5 Updated by Benoit Daloze over 1 year ago

prijutme4ty (Ilya Vorontsov) wrote:

Though I suggest that using a special method here is more clear.
So what'd you say about String#each_match and Regexp#each_match

I did indeed somewhat expect String#scan to yield a MatchData object instead of $~.captures.
I'm in favor of String#each_match: it might be a nice addition and the name is clear. However, its naming differs from the usual Regexp-related methods on String, and it might not be worth adding a method (I agree $~ is not the prettiest thing around).

I don't think Regexp#each_match conveys well what it does, though.
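The point about what scan yields can be checked directly: with capture groups in the pattern, the block receives $~.captures; without any, it receives the whole matched string, $~[0]. In both cases the full MatchData is only reachable through $~. A small check, using the sample text from #2:

```ruby
text = "x:1 y:12 ; x:33 y:2"
with_groups = /x:(?<x>\d+) y:(?<y>\d+)/
no_groups   = /x:\d+/

grouped = []
# With groups, the block argument is the capture array.
text.scan(with_groups) { |arg| grouped << [arg, $~.captures] }

plain = []
# Without groups, the block argument is the matched string.
text.scan(no_groups) { |arg| plain << [arg, $~[0]] }
```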

#6 Updated by Tomoaki Nishiyama over 1 year ago

+1 for having a method that returns MatchData objects.
This is related to (or a duplicate of) #5749 and #5606.

Even if the implementation is simple, I think it is worth establishing a standard
name and specification.

#7 Updated by Yusuke Endoh over 1 year ago

  • Status changed from Open to Assigned
  • Assignee set to Yukihiro Matsumoto
  • Target version set to next minor
