Feature #13890

Allow a regexp as an argument to 'count', to count more interesting things than single characters

Added by duerst (Martin Dürst) over 6 years ago. Updated about 1 year ago.

Status: Open
Assignee: -
Target version: -
[ruby-core:82743]

Description

Currently, String#count only accepts strings, and counts the characters of the receiver that belong to the set of characters those strings describe.

However, I have repeatedly met the situation where I wanted to count more interesting things in strings.
These 'interesting things' can easily be expressed with regular expressions.

Here is a quick-and-dirty Ruby-level implementation:

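class String
  # Keep the built-in character-set count available under another name.
  alias old_count count

  def count(what)
    case what
    when String
      old_count what
    when Regexp
      # Restart the search one character after each match start,
      # so overlapping occurrences are counted.
      pos = -1
      count = 0
      count += 1 while (pos = index(what, pos + 1))
      count
    end
  end
end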

Please note that the implementation counts overlapping occurrences; maybe there is room for an option like overlap: :no.
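
To make the overlap concrete, here is a small sketch that contrasts the index-based loop above with String#scan. The helper name overlapping_count is hypothetical and only used for illustration; it is a standalone method rather than a patch to String.

```ruby
# Hypothetical helper mirroring the index-based loop above (not part of Ruby).
def overlapping_count(str, re)
  pos = -1
  n = 0
  # Search again starting one character after the previous match start.
  n += 1 while (pos = str.index(re, pos + 1))
  n
end

s = "aaaa"
overlapping     = overlapping_count(s, /aa/)  # matches start at 0, 1 and 2
non_overlapping = s.scan(/aa/).size           # matches start at 0 and 2

puts overlapping      # 3
puts non_overlapping  # 2
```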


Related issues 1 (0 open, 1 closed)

Related to Ruby master - Feature #12698: Method to delete a substring by regex match (Feedback)

Updated by Eregon (Benoit Daloze) over 6 years ago

Should it behave the same as str.scan(regexp).size ?

I think the default should be no overlap, and increment the position by the length of the match.
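
A minimal sketch of that non-overlapping behavior, assuming a standalone helper (the name non_overlapping_count is hypothetical) built on Regexp#match with a start position. It steps one character past an empty match so the loop always terminates, which is intended to mirror how scan treats empty matches.

```ruby
# Hypothetical helper, not part of Ruby: counts matches without overlap
# by resuming the search at the end of the previous match.
def non_overlapping_count(str, re)
  pos = 0
  n = 0
  while (m = re.match(str, pos))
    n += 1
    # Continue after the match; for an empty match, move one character
    # forward so the same position is not matched again.
    pos = m.end(0) > m.begin(0) ? m.end(0) : m.end(0) + 1
  end
  n
end

puts non_overlapping_count("abcdefghij", /.{3}/)  # 3
puts non_overlapping_count("aaaa", /aa/)          # 2
```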

Updated by duerst (Martin Dürst) over 6 years ago

Eregon (Benoit Daloze) wrote:

I think the default should be no overlap, and increment the position by the length of the match.

That would be fine by me, too.

Updated by duerst (Martin Dürst) about 6 years ago

Python allows counting occurrences of a substring, as follows:

str.count(sub[, start[, end]])
Return the number of non-overlapping occurrences of substring sub in the range [start, end]. Optional arguments start and end are interpreted as in slice notation.

Updated by duerst (Martin Dürst) over 5 years ago

  • Related to Feature #12698: Method to delete a substring by regex match added

Updated by shan (Shannon Skipper) about 1 year ago

I'd love to have this feature. A str.count(regexp) is something I see folk trying fairly often. A str.count(regexp) also avoids the intermediary Array of str.scan(regexp).size or the back bending with str.enum_for(:scan, regexp).count.
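
For reference, the two workarounds mentioned here look like this in current Ruby. The sample string and pattern are made up for illustration.

```ruby
str = "the cat sat on the mat"
re  = /\b\w+at\b/

# Builds an Array of all matches, then asks for its size.
via_array = str.scan(re).size

# Wraps the block form of scan in an Enumerator and counts the yields.
via_enum = str.enum_for(:scan, re).count

puts via_array  # 3
puts via_enum   # 3
```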

Updated by matz (Yukihiro Matsumoto) about 1 year ago

If str.count(re) works as str.scan(re).size (besides efficiency), it's acceptable. But if someone needs overlapping, they need to explain their use case.

Matz.
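
One way to picture these semantics is the sketch below. It is only an illustration under assumptions: count_matches is a hypothetical standalone helper, not the proposed String#count itself, and it uses the block form of String#scan so the matches are not collected into an Array.

```ruby
# Hypothetical helper, not part of Ruby: counts like str.scan(pattern).size
# for a Regexp, and defers to the built-in String#count for a String.
def count_matches(str, pattern)
  case pattern
  when Regexp
    n = 0
    # The block form of String#scan yields each match instead of
    # collecting the matches into a result Array.
    str.scan(pattern) { n += 1 }
    n
  when String
    str.count(pattern)
  else
    raise TypeError, "expected String or Regexp, got #{pattern.class}"
  end
end

s = "abcdefghij"
puts count_matches(s, /.{3}/)   # same value as s.scan(/.{3}/).size
puts count_matches(s, "aeiou")  # built-in behavior: characters in the set
```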

Updated by sawa (Tsuyoshi Sawada) about 1 year ago

Overlapping can be realized by putting the original regexp within a look-ahead.

s = "abcdefghij"
re = /.{3}/

Non-overlapping count:

s.scan(re).count # => 3
s.count(re) # => Expect 3

Overlapping count:

s.scan(/(?=#{re})/).count # => 8
s.count(/(?=#{re})/) # => Expect 8

So I do not think there is any need to particularly implement overlapping as a feature of this method.
