Bad regexp hangs ruby
|ruby -v:||ruby 2.1.1p76 (2014-02-24 revision 45161) [x86_64-darwin13.0]||Backport:||2.0.0: UNKNOWN, 2.1: UNKNOWN|
Here is an extracted problem i ran into recently:
$ cat test.rb
str = ('a' * ARGV.to_i) + '?'
re = /(\w*)*$/
On few chars match returns quite fast, but here's what happens on 14 'a'-s and up:
$ time RBENV_VERSION=2.1.1 ruby test.rb 14
real 1.392 user 1.364 sys 0.026 pcpu 99.83
$ time RBENV_VERSION=2.1.1 ruby test.rb 15
real 3.979 user 3.949 sys 0.026 pcpu 99.89
$ time RBENV_VERSION=2.1.1 ruby test.rb 16
real 11.995 user 11.954 sys 0.031 pcpu 99.92
Ruby versions 1.9.3 and 2.0 behave similarly.
I ran into the problem, because one of my colleagues copy-pasted this regexp to test url's somewhere from stackoverflow:
I know the regexp is useless, however i think it's still a problem if a bad regexp can hang ruby.
Python (2.7) says that this regexp is bad and does not compile it.
Perl matches without any performance issues
#1 Updated by Rafael França about 1 year ago
I tried to reproduce with your script and could not:
$ ruby -v ruby 2.1.1p76 (2014-02-24 revision 45161) [x86_64-darwin13.0] $ cat foo.rb str = ('a' * ARGV.to_i) + '?' re = /(\w)$/ re.match(str) $ time ruby foo.rb 14 real 0m0.134s user 0m0.075s sys 0m0.050s $ time ruby foo.rb 15 real 0m0.145s user 0m0.082s sys 0m0.054s $ time ruby foo.rb 16 real 0m0.137s user 0m0.076s sys 0m0.052s $ time ruby foo.rb 50 real 0m0.142s user 0m0.080s sys 0m0.053s
What is the environment you are running?
#2 Updated by Gergo Erdosi about 1 year ago
For some reason the regex is not displayed correctly. I'm able to reproduce the reported issue (see the correct regex below):
$ cat test.rb str = ('a' * ARGV.to_i) + '?' re = /(\w*)*$/ re.match(str) $ time ruby test.rb 14 real 0m1.179s user 0m1.128s sys 0m0.004s $ time ruby test.rb 15 real 0m3.568s user 0m3.419s sys 0m0.020s $ time ruby test.rb 16 real 0m10.767s user 0m10.276s sys 0m0.067s
#3 Updated by Shyouhei Urabe about 1 year ago
Ruby's regexp engine is NP-complete. It's ultimately impossible to guarantee
regexp matches to run fast (if you don't think so please send us a proof). It
might be possible to warn your specific bad regexp, but in general it's also
impossible to tell which regexp is bad and which isn't. That's the Halintg
problem http://en.wikipedia.org/wiki/Halting_problem .
So, in short, there is (at least believed to be) no ultimate solution. All what
might be possible is to find a reasonable compromise. For instance python does
not detect that shorter one:
zsh % python Python 2.7.5+ (default, Feb 27 2014, 19:37:08) [GCC 4.8.1] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> import re >>> re.compile(r'(\w*)*$') <_sre.SRE_Pattern object at 0x1dee030> >>>