Bug #9695

Substring search exhibit quadratic behaviour.

Added by Linus Sellberg 12 months ago. Updated 12 months ago.

[ruby-core:61811]
Status:Rejected
Priority:Normal
Assignee:-
ruby -v:ruby 2.0.0p247 (2013-06-27 revision 41674) [x86_64-linux] Backport:2.0.0: UNKNOWN, 2.1: UNKNOWN

Description

http://nelhagedebugsshit.tumblr.com/post/81301884746/surveying-various-languages-string-search-algorithms

12.upto(21).map do |i|
  needle='a'*(2**(i-1)) + 'b'
  haystack = 'a' * (2**i)
  a = Time.now; haystack.include?(needle); b=Time.now
  t = b-a
  p [t, t/haystack.length**2]
end

gives

[0.000237363, 1.4147937297821046e-11]
[0.000851294, 1.2685269117355347e-11]
[0.003081808, 1.1480629444122315e-11]
[0.013984239, 1.3023837469518185e-11]
[0.035659327, 8.302584057673811e-12]
[0.136899401, 7.968593912664801e-12]
[0.545703233, 7.941027186461724e-12]
[2.217092242, 8.065734589763452e-12]
[8.965134534, 8.15374235935451e-12]
[36.113581501, 8.211277759301083e-12]
=> 

Wasn't me that found it but it seems the founder hasn't reported it yet.

History

#1 Updated by Shyouhei Urabe 12 months ago

  • Category set to core

Interesting, and I'm pretty sure this also interests matz.

#2 Updated by Nobuyoshi Nakada 12 months ago

  • Description updated (diff)

It's O(m*n), not O(m**2), where m and n are the lengths of the receiver and the searching strings respectively.

#3 Updated by Yui NARUSE 12 months ago

  • Status changed from Open to Rejected

The algorithm is Sunday's quick search, which is a variation of BM, whose cost is O(m*n) as nobu said.

The benchmark happens the worst case for BM variants.

#4 Updated by Yui NARUSE 12 months ago

For example other BM implementation, I compared with Ruby's regexp:

12.upto(21).map do |i|
  needle='a'*(2**(i-1)) + 'b'
  haystack = 'a' * (2**i)
  a = Time.now; haystack.include?(needle); b=Time.now
  t1 = b-a
  a = Time.now; Regexp.new(needle) =~ haystack; b=Time.now
  t2 = b-a
  p [i, t1, t1/haystack.length**2, t2, 21/haystack.length**2]
end
[12, 0.000208516, 1.2428522109985352e-11, 0.008410159, 0]
[13, 0.000753869, 1.1233523488044739e-11, 0.031850126, 0]
[14, 0.002861461, 1.0659772902727127e-11, 0.123932474, 0]
[15, 0.011584471, 1.0788879357278348e-11, 0.489492226, 0]
[16, 0.050115724, 1.1668476276099682e-11, 1.943925784, 0]
[17, 0.198430008, 1.1550146620720624e-11, 7.75830141, 0]
[18, 0.81089126, 1.1800020874943584e-11, 30.958098418, 0]
[19, 3.451110935, 1.2555068442452466e-11, 123.754608623, 0]
[20, 13.798294714, 1.2549475935884403e-11, 494.82605454, 0]
[21, 55.298075349, 1.2573326637038918e-11, 1979.311856489, 0]

Also available in: Atom PDF