Feature #5749

new method String#match_all needed

Added by Joey Zhou over 2 years ago. Updated over 1 year ago.

[ruby-core:41603]
Status:Assigned
Priority:Normal
Assignee:Yukihiro Matsumoto
Category:-
Target version:next minor

Description

The String class should contain an instance method 'match_all', which is a mixture of 'match' and 'scan'.

The 'scan' method is not very powerful: its result (the value it yields) is just a matched string or an array of captured strings.

p 'a1bc2de3f'.scan(/(.)\d(.)/) # [["a", "b"], ["c", "d"], ["e", "f"]]

If the regex argument contains groups, I cannot even get the whole matched string, and there is no information about the match offsets.

So a 'match_all' method is really needed. It scans the string, finds every match, and yields a MatchData instance to the given block.

Here's a simple implementation in Ruby:

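class String
  # Yields (or collects) one MatchData per successive match of re,
  # starting the search at character offset i.
  # Note: a zero-width match never advances i, so such a pattern loops forever.
  def match_all(re, i = 0)
    if block_given?
      while m = self.match(re, i)
        yield m
        i = m.end(0)   # resume right after the previous match
      end
      return self
    else
      ary = []
      while m = self.match(re, i)
        ary << m
        i = m.end(0)
      end
      return ary
    end
  end
end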

However, the 'while m = self.match(re,i)' approach is not efficient, because it searches the string again and again. If the string is UTF-8 encoded and contains non-ASCII characters, I'm afraid that locating the start index each time is expensive.

So I think a built-in 'match_all' method, which behaves just like 'scan' but yields MatchData, is needed.
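
As a rough illustration of the requested behavior, here is a minimal sketch that lets 'scan' do the single pass and collects the MatchData it leaves behind for each match. The top-level helper name match_all and its (str, re) signature are assumptions made for this example only; it is not an existing core method.

```ruby
# Hypothetical helper, not part of Ruby core: returns one MatchData per
# non-overlapping match, or yields each one when a block is given.
def match_all(str, re)
  found = []
  # scan sets the last-match data for every match it finds;
  # Regexp.last_match reads it back as a MatchData object.
  str.scan(re) { found << Regexp.last_match }
  return found unless block_given?
  found.each { |m| yield m }
  str
end

matches = match_all('a1bc2de3f', /(.)\d(.)/)
summary = matches.map { |m| [m[0], m.begin(0), m.captures] }
p summary
```

Each element of summary pairs the whole matched string with its start offset and its captured groups, which is the information plain 'scan' does not return when the regex contains groups.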

Please consider it, thank you!


Related issues

Related to ruby-trunk - Feature #5606: String#each_match(regexp) (Feedback, 11/10/2011)
Related to ruby-trunk - Feature #6802: String#scan should have equivalent yielding MatchData (Assigned, 07/27/2012)

History

#1 Updated by Yui NARUSE over 2 years ago

Why don't you use $~, $&, $`, $', $+, $1, $2, ... in scan's block?
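
A minimal sketch of what this suggestion might look like in practice, reusing the string and regex from the description. Regexp.last_match returns the same object as $~; the results array is only there for illustration.

```ruby
results = []
'a1bc2de3f'.scan(/(.)\d(.)/) do
  m = Regexp.last_match            # same MatchData as $~
  # Whole match, its start offset, and the captured groups.
  results << [m[0], m.begin(0), m.captures]
end
p results
```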

#2 Updated by Joey Zhou over 2 years ago

Yui NARUSE wrote:

Why don't you use $~, $&, $`, $', $+, $1, $2, ... in scan's block?

You've reminded me! Yes, what I want can be done in this tricky way. Thank you!

However, I think relying on these special global variables is just an expedient.

If there is an explicit method, it's much more user-friendly and readable.

When I wanted this functionality, what I did was consult the API docs, trying to find a suitable method, not thinking about how to play with those magic punctuation variables. Maybe most people are just like me...

#3 Updated by Thomas Sawyer over 2 years ago

If memory serves, Facets has an #mscan method.

#4 Updated by Tomoaki Nishiyama over 2 years ago

I proposed a similar one as each_match
http://bugs.ruby-lang.org/issues/5606

One difference is that the next search offset is
m.begin(0)+1
rather than m.end(0)

"AKASATANA".each_match(/A.A/)

will recognize AKA ASA ATA ANA
(This, I think, cannot be done with scan. Can it?)
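
To make the offset difference concrete, here is a small sketch of the overlapping search. The helper name overlapping_matches is hypothetical and only stands in for the proposed each_match; it restarts the search at m.begin(0) + 1. For comparison, the last lines show that plain scan can reach the same substrings when the pattern is wrapped in a lookahead, although it then returns capture strings rather than MatchData objects.

```ruby
# Hypothetical stand-in for the proposed each_match: collects MatchData
# for matches that are allowed to overlap.
def overlapping_matches(str, re)
  found = []
  pos = 0
  while (m = str.match(re, pos))
    found << m
    pos = m.begin(0) + 1   # step one character past the match start
  end
  found
end

overlapping = overlapping_matches("AKASATANA", /A.A/)
p overlapping.map { |m| [m[0], m.begin(0)] }

# scan with a zero-width lookahead also finds overlapping substrings,
# but yields only the captured strings.
via_scan = "AKASATANA".scan(/(?=(A.A))/).flatten
p via_scan
```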

Such different behavior might be controlled with an optional argument.
I think we might merge the discussion into this issue
rather than keeping two separate issues.

Anyway, I'm glad to hear a similar demand for a function to get the MatchData
objects, rather than relying on the scan() trick.

#5 Updated by Yusuke Endoh about 2 years ago

  • Status changed from Open to Assigned
  • Assignee set to Yukihiro Matsumoto

#6 Updated by Yusuke Endoh over 1 year ago

  • Target version set to next minor
