
Feature #6802

String#scan should have equivalent yielding MatchData

Added by prijutme4ty (Ilya Vorontsov) over 7 years ago. Updated almost 2 years ago.

Status: Assigned
Priority: Normal
Target version: -
[ruby-core:46801]

Description

Ruby should have a method that returns MatchData objects rather than an array of arrays. This would make it easy to access named groups:

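pattern = /x: (?<x>\d+) y:(?<y>\d+)/
polygon = []
# scan_for_pattern is a placeholder name for the proposed method: it would yield one MatchData per match
text.scan_for_pattern(pattern) { |m| polygon << Point.new(m[:x], m[:y]) }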

To avoid breaking existing code, we need a unique name. Ideas? Maybe #each_match?
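Until such a method exists, a commonly used workaround (not part of this proposal) is to wrap String#scan in an Enumerator and read Regexp.last_match on each iteration. A minimal sketch; the sample text and the Point struct are illustrative assumptions:

```ruby
# Hypothetical point type, used only for this illustration
Point = Struct.new(:x, :y)

text = "x:1 y:12 ; x:33 y:2"
pattern = /x:(?<x>\d+) y:(?<y>\d+)/

# to_enum(:scan, pattern) wraps scan in an Enumerator; while map iterates it,
# Regexp.last_match is the MatchData for the current match.
matches = text.to_enum(:scan, pattern).map { Regexp.last_match }

# Named groups are then available on each MatchData.
polygon = matches.map { |m| Point.new(m[:x].to_i, m[:y].to_i) }
p polygon
```

This works, but it relies on the implicit Regexp.last_match state, which is exactly what a dedicated method would hide.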


Related issues

Related to Ruby master - Feature #5749: new method String#match_all needed (Assigned)
Related to Ruby master - Feature #5606: String#each_match(regexp) (Feedback)
Related to Ruby master - Feature #12745: String#(g)sub(!) should pass a MatchData to the block, not a String (Feedback)

History

Updated by prijutme4ty (Ilya Vorontsov) over 7 years ago

Simple implementation:

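class String
  # Yields a MatchData for each successive non-overlapping match of +pattern+;
  # returns an Enumerator when no block is given.
  def each_match(pattern)
    return to_enum(:each_match, pattern) unless block_given?
    pos = 0
    while pos <= length && (m = match(pattern, pos))
      yield m
      # Continue after this match; on an empty match step one character
      # forward so the loop cannot get stuck.
      pos = m.end(0) == m.begin(0) ? m.end(0) + 1 : m.end(0)
    end
    self
  end
end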
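One detail any such loop has to get right is where the next search starts. Re-slicing the string after each match (text[m.end(0)..-1]) makes MatchData offsets relative to the leftover substring, whereas String#match(pattern, pos) keeps them relative to the original string. A small standalone comparison; the sample text and pattern are assumptions chosen for illustration:

```ruby
text = "a1 b22 c333"
pattern = /\d+/

# Re-slicing: each MatchData describes the remaining substring,
# so begin(0) restarts from that substring's start.
sliced = []
rest = text
while (m = rest.match(pattern))
  sliced << m.begin(0)
  rest = rest[m.end(0)..-1]
end

# Searching from a position: each MatchData describes the whole string.
absolute = []
pos = 0
while (m = text.match(pattern, pos))
  absolute << m.begin(0)
  pos = m.end(0)
end

p sliced    # => [1, 2, 2]
p absolute  # => [1, 4, 8]
```

Re-slicing also never moves past an empty match (m.end(0) is 0), so a pattern that can match the empty string, such as /\d*/, would loop forever unless the loop steps forward explicitly.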

Updated by Eregon (Benoit Daloze) over 7 years ago

You can use String#scan with the block form and $~ (as well as the other Regexp-related globals) for this:

> text="x:1 y:12 ; x:33 y:2"
> text.scan(/x:(?<x>\d+) y:(?<y>\d+)/) { p [$~[:x],$~[:y]] }
["1", "12"]
["33", "2"]

Please check your Regexp and include an example value for text next time.
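Since $~ inside the block is a full MatchData object, the block can also keep the objects themselves instead of only printing captures. A minimal sketch reusing the sample text from the example above; the matches array is an illustrative name:

```ruby
text = "x:1 y:12 ; x:33 y:2"
pattern = /x:(?<x>\d+) y:(?<y>\d+)/

matches = []
# Inside the block, $~ is the MatchData for the current match.
text.scan(pattern) { matches << $~ }

p matches.map { |m| [m[:x], m[:y]] }  # => [["1", "12"], ["33", "2"]]
p matches.map { |m| m.begin(0) }      # => [0, 11]
```

Keeping the MatchData also preserves information that scan's plain return value drops, such as match offsets.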

Updated by prijutme4ty (Ilya Vorontsov) over 7 years ago

Thank you for the solution! I always forget about the regexp global variables. Still, I think a dedicated method would be clearer here. What would you say to String#each_match and Regexp#each_match?
Yes, the implementation is as simple as:
class String
  def each_match(pat)
    return to_enum(:each_match, pat) unless block_given?
    # String#scan sets $~ to the current MatchData before each yield
    scan(pat) { yield $~ }
  end
end

and similar for Regexp.
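One way to read "similar for Regexp" is a method on the pattern that takes the string as its argument. A minimal sketch; Regexp#each_match is only the name proposed in this thread, not an existing core method:

```ruby
class Regexp
  # Hypothetical counterpart to String#each_match: yields a MatchData for
  # each match of this pattern in +str+ (an Enumerator without a block).
  def each_match(str)
    return to_enum(:each_match, str) unless block_given?
    # String#scan sets $~ to the current MatchData before each yield.
    str.scan(self) { yield $~ }
  end
end

pattern = /x:(?<x>\d+) y:(?<y>\d+)/
pattern.each_match("x:1 y:12 ; x:33 y:2") { |m| p m[:x] }

# Without a block it returns an Enumerator, so Enumerable methods work too.
xs = pattern.each_match("x:1 y:12 ; x:33 y:2").map { |m| m[:x] }
```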


Updated by trans (Thomas Sawyer) over 7 years ago

+1 I have definitely used this before (as Facets' #mscan).

Updated by Eregon (Benoit Daloze) over 7 years ago

prijutme4ty (Ilya Vorontsov) wrote:

Though I suggest that using a special method here is more clear.
So what'd you say about String#each_match and Regexp#each_match

I did somewhat expect String#scan to yield a MatchData object instead of $~.captures.
I'm in favor of String#each_match: it might be a nice addition and the name is clear. However, the naming differs from the usual regexp methods on String, and it might not be worth adding a method (I agree $~ is not the prettiest thing around).

I don't think Regexp#each_match conveys well what it does, though.
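For reference, the current behavior being contrasted: when the pattern has groups, String#scan yields the captures array (the same values as $~.captures), and when it has none, it yields the matched string ($~[0]). A small check using the sample text from earlier in the thread:

```ruby
text = "x:1 y:12 ; x:33 y:2"

# With groups: the block argument equals $~.captures.
with_groups = []
text.scan(/x:(?<x>\d+) y:(?<y>\d+)/) do |captures|
  with_groups << (captures == $~.captures)
end

# Without groups: the block argument is the whole match, $~[0].
without_groups = []
text.scan(/\d+/) do |matched|
  without_groups << (matched == $~[0])
end

p with_groups     # => [true, true]
p without_groups  # => [true, true, true, true]
```

In both cases the MatchData itself is only reachable through $~, which is the gap this proposal aims to close.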

Updated by tomoakin (Tomoaki Nishiyama) over 7 years ago

+1 for having a method that returns MatchData.
This is related to (or a duplicate of) #5749 and #5606.

Even though the implementation is simple, I think it is worth establishing a standard
name and specification.

Updated by mame (Yusuke Endoh) about 7 years ago

  • Status changed from Open to Assigned
  • Assignee set to matz (Yukihiro Matsumoto)
  • Target version set to 2.6

Updated by naruse (Yui NARUSE) almost 2 years ago

  • Target version deleted (2.6)

Updated by shyouhei (Shyouhei Urabe) about 1 year ago

  • Related to Feature #12745: String#(g)sub(!) should pass a MatchData to the block, not a String added
