Feature #5749

new method String#match_all needed

Added by yimutang (Joey Zhou) almost 10 years ago. Updated almost 4 years ago.

Status: Assigned
Priority: Normal
Target version: -
[ruby-core:41603]

Description

The String class should have an instance method 'match_all', which combines 'match' and 'scan'.

The method 'scan' is not very powerful: its result (the value it yields) is just the matched string or an array of captured strings.

p 'a1bc2de3f'.scan(/(.)\d(.)/) # => [["a", "b"], ["c", "d"], ["e", "f"]]

If the regex argument contains groups, I cannot even get the whole matched string, and there is no information about the match offsets.

So a 'match_all' method is really needed. It scans the string, finds every match, and yields a MatchData instance to the given block.

Here's a simple implementation in Ruby:

class String
  def match_all(re, i = 0)
    if block_given?
      while m = self.match(re, i)
        yield m
        i = m.end(0)
      end
      return self
    else
      ary = []
      while m = self.match(re, i)
        ary << m
        i = m.end(0)
      end
      return ary
    end
  end
end

However, the 'while m = self.match(re,i)' approach is not efficient, because it searches the string again and again. If the string is UTF-8 encoded and contains non-ASCII characters, I'm afraid that locating the start index each time is expensive.

So I think a built-in 'match_all' method, which behaves just like 'scan' but yields MatchData, is needed.

Please consider it, thank you!


Related issues

Related to Ruby master - Feature #5606: String#each_match(regexp) (Status: Feedback)
Related to Ruby master - Feature #6802: String#scan should have equivalent yielding MatchData (Status: Assigned, Assignee: matz (Yukihiro Matsumoto))
Related to Ruby master - Feature #12745: String#(g)sub(!) should pass a MatchData to the block, not a String (Status: Feedback, Assignee: matz (Yukihiro Matsumoto))

Updated by naruse (Yui NARUSE) almost 10 years ago

Why don't you use $~, $&, $`, $', $+, $1, $2, ... in scan's block?
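
The suggestion above can be sketched as follows. This is only an illustration using the sample string from the description; the variable name results is made up for the example. String#scan sets $~ to the current MatchData before each block call, so the whole match and its offset are available even when the regexp contains groups.

```ruby
# Collect [whole match, start offset, captures] for each hit by reading
# the MatchData that scan stores in $~ before every block call.
results = []
'a1bc2de3f'.scan(/(.)\d(.)/) do
  m = $~ # the MatchData for the current match
  results << [m[0], m.begin(0), m.captures]
end

p results
# => [["a1b", 0, ["a", "b"]], ["c2d", 3, ["c", "d"]], ["e3f", 6, ["e", "f"]]]
```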

Updated by yimutang (Joey Zhou) almost 10 years ago

Yui NARUSE wrote:

Why don't you use $~, $&, $`, $', $+, $1, $2, ... in scan's block?

You reminded me! Yes, what I want can be done in this tricky way. Thank you!

However, I think relying on these special global variables is just an expedient.

If there is an explicit method, it's much more user-friendly and readable.

When I wanted this functionality, I consulted the API docs to find a proper method; I did not think about how to play with those magic punctuation variables. Maybe most people are just like me...
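
Until a built-in exists, an explicit method could be layered on top of scan. The following is a minimal sketch, not the proposed built-in: match_all_via_scan is a hypothetical name, and it is written as a plain helper method rather than a String monkey patch. It uses Regexp.last_match, the method form of $~, so the calling code does not have to touch the special variables.

```ruby
# Hypothetical helper (not part of Ruby): gathers one MatchData per match
# by letting String#scan drive the search.
# Without a block it returns the array of MatchData objects;
# with a block it yields each MatchData and returns the string.
def match_all_via_scan(str, re, &block)
  matches = []
  str.scan(re) { matches << Regexp.last_match }
  return matches unless block

  matches.each(&block)
  str
end

ms = match_all_via_scan('a1bc2de3f', /(.)\d(.)/)
p ms.map { |m| [m[0], m.offset(0)] }
# => [["a1b", [0, 3]], ["c2d", [3, 6]], ["e3f", [6, 9]]]
```

Since scan performs the iteration, this sketch does not call match repeatedly with a growing start index.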

Updated by trans (Thomas Sawyer) almost 10 years ago

If memory serves, Facets has an #mscan method.

Updated by tomoakin (Tomoaki Nishiyama) almost 10 years ago

I proposed a similar method, each_match:
http://bugs.ruby-lang.org/issues/5606

One difference is that the next search offset is
m.begin(0)+1
rather than m.end(0).

"AKASATANA".each_match(/A.A/)

will recognize AKA ASA ATA ANA
(This, I think, cannot be done with scan. Can it?)

Such different behavior might be controlled with an optional argument.
I think we might merge the discussion into this issue
rather than keeping two separate issues.

Anyway, I'm glad to hear of a similar demand for a function that returns the MatchData
objects, rather than relying on the scan() trick.
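
The difference between the two offsets, and the idea of selecting it with an optional argument, can be sketched like this. The method name each_match_sketch and the keyword overlap: are assumptions made for illustration; they are not taken from either proposal. The last line shows that scan with a lookahead and a capture group can return the overlapping strings, though only as strings, not as MatchData.

```ruby
# Hypothetical helper: returns every MatchData for re in str.
# With overlap: true the next search starts at m.begin(0) + 1;
# otherwise it starts at m.end(0).
def each_match_sketch(str, re, overlap: false)
  matches = []
  pos = 0
  while pos <= str.length && (m = str.match(re, pos))
    matches << m
    pos = if overlap || m.end(0) == m.begin(0)
            m.begin(0) + 1 # also steps past zero-length matches
          else
            m.end(0)
          end
  end
  matches
end

p each_match_sketch("AKASATANA", /A.A/).map { |m| m[0] }
# => ["AKA", "ATA"]
p each_match_sketch("AKASATANA", /A.A/, overlap: true).map { |m| m[0] }
# => ["AKA", "ASA", "ATA", "ANA"]
p "AKASATANA".scan(/(?=(A.A))/).flatten
# => ["AKA", "ASA", "ATA", "ANA"]
```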

Updated by mame (Yusuke Endoh) over 9 years ago

  • Status changed from Open to Assigned
  • Assignee set to matz (Yukihiro Matsumoto)

Updated by mame (Yusuke Endoh) almost 9 years ago

  • Target version set to 2.6

Updated by naruse (Yui NARUSE) almost 4 years ago

  • Target version deleted (2.6)

Updated by shyouhei (Shyouhei Urabe) almost 3 years ago

  • Related to Feature #12745: String#(g)sub(!) should pass a MatchData to the block, not a String added