Bug #9695
Substring search exhibit quadratic behaviour.
Status:
Rejected
Priority:
Normal
Assignee:
-
Target version:
-
ruby -v:
ruby 2.0.0p247 (2013-06-27 revision 41674) [x86_64-linux]
Backport:
Description
12.upto(21).map do |i|
needle='a'*(2**(i-1)) + 'b'
haystack = 'a' * (2**i)
a = Time.now; haystack.include?(needle); b=Time.now
t = b-a
p [t, t/haystack.length**2]
end
gives
[0.000237363, 1.4147937297821046e-11] [0.000851294, 1.2685269117355347e-11] [0.003081808, 1.1480629444122315e-11] [0.013984239, 1.3023837469518185e-11] [0.035659327, 8.302584057673811e-12] [0.136899401, 7.968593912664801e-12] [0.545703233, 7.941027186461724e-12] [2.217092242, 8.065734589763452e-12] [8.965134534, 8.15374235935451e-12] [36.113581501, 8.211277759301083e-12] =>
Wasn't me that found it but it seems the founder hasn't reported it yet.
History
#1
[ruby-core:61815]
Updated by shyouhei (Shyouhei Urabe) about 4 years ago
- Category set to core
Interesting, and I'm pretty sure this also interests matz.
#2
[ruby-core:61839]
Updated by nobu (Nobuyoshi Nakada) about 4 years ago
- Description updated (diff)
It's O(m*n)
, not O(m**2)
, where m
and n
are the lengths of the receiver and the searching strings respectively.
#3
[ruby-core:61869]
Updated by naruse (Yui NARUSE) about 4 years ago
- Status changed from Open to Rejected
The algorithm is Sunday's quick search, which is a variation of BM, whose cost is O(m*n) as nobu said.
The benchmark happens the worst case for BM variants.
#4
[ruby-core:61903]
Updated by naruse (Yui NARUSE) about 4 years ago
For example other BM implementation, I compared with Ruby's regexp:
12.upto(21).map do |i|
needle='a'*(2**(i-1)) + 'b'
haystack = 'a' * (2**i)
a = Time.now; haystack.include?(needle); b=Time.now
t1 = b-a
a = Time.now; Regexp.new(needle) =~ haystack; b=Time.now
t2 = b-a
p [i, t1, t1/haystack.length**2, t2, 21/haystack.length**2]
end
[12, 0.000208516, 1.2428522109985352e-11, 0.008410159, 0] [13, 0.000753869, 1.1233523488044739e-11, 0.031850126, 0] [14, 0.002861461, 1.0659772902727127e-11, 0.123932474, 0] [15, 0.011584471, 1.0788879357278348e-11, 0.489492226, 0] [16, 0.050115724, 1.1668476276099682e-11, 1.943925784, 0] [17, 0.198430008, 1.1550146620720624e-11, 7.75830141, 0] [18, 0.81089126, 1.1800020874943584e-11, 30.958098418, 0] [19, 3.451110935, 1.2555068442452466e-11, 123.754608623, 0] [20, 13.798294714, 1.2549475935884403e-11, 494.82605454, 0] [21, 55.298075349, 1.2573326637038918e-11, 1979.311856489, 0]