Project

General

Profile

Actions

Bug #9694

closed

Bad regexp hangs ruby

Added by mxposed (Nikolay Markov) over 10 years ago. Updated over 5 years ago.

Status:
Closed
Assignee:
-
Target version:
-
ruby -v:
ruby 2.1.1p76 (2014-02-24 revision 45161) [x86_64-darwin13.0]
[ruby-core:61810]

Description

Here is an extracted problem i ran into recently:

$ cat test.rb

str = ('a' * ARGV[0].to_i) + '?'
re = /(\w*)*$/
re.match(str)

On few chars match returns quite fast, but here's what happens on 14 'a'-s and up:

$ time RBENV_VERSION=2.1.1 ruby test.rb 14

real 1.392	user 1.364	sys 0.026	pcpu 99.83

$ time RBENV_VERSION=2.1.1 ruby test.rb 15

real 3.979	user 3.949	sys 0.026	pcpu 99.89

$ time RBENV_VERSION=2.1.1 ruby test.rb 16

real 11.995	user 11.954	sys 0.031	pcpu 99.92

Ruby versions 1.9.3 and 2.0 behave similarly.

I ran into the problem, because one of my colleagues copy-pasted this regexp to test url's somewhere from stackoverflow:
/^(https?://)?([\da-z.-]+).([a-z.]{2,6})([/\w .-:])/?$/

I know the regexp is useless, however i think it's still a problem if a bad regexp can hang ruby.
Python (2.7) says that this regexp is bad and does not compile it.
Perl matches without any performance issues

Updated by rafaelfranca (Rafael França) over 10 years ago

I tried to reproduce with your script and could not:

$ ruby -v
ruby 2.1.1p76 (2014-02-24 revision 45161) [x86_64-darwin13.0]

$ cat foo.rb
str = ('a' * ARGV[0].to_i) + '?'
re = /(\w)$/
re.match(str)

$ time ruby foo.rb 14

real	0m0.134s
user	0m0.075s
sys	0m0.050s

$ time ruby foo.rb 15

real	0m0.145s
user	0m0.082s
sys	0m0.054s

$ time ruby foo.rb 16

real	0m0.137s
user	0m0.076s
sys	0m0.052s

$ time ruby foo.rb 50

real	0m0.142s
user	0m0.080s
sys	0m0.053s

What is the environment you are running?

Actions #2

Updated by gergoerdosi (Gergo Erdosi) over 10 years ago

For some reason the regex is not displayed correctly. I'm able to reproduce the reported issue (see the correct regex below):

$ cat test.rb 
str = ('a' * ARGV[0].to_i) + '?'
re = /(\w*)*$/
re.match(str)
$ time ruby test.rb 14

real	0m1.179s
user	0m1.128s
sys	0m0.004s
$ time ruby test.rb 15

real	0m3.568s
user	0m3.419s
sys	0m0.020s
$ time ruby test.rb 16

real	0m10.767s
user	0m10.276s
sys	0m0.067s

Updated by shyouhei (Shyouhei Urabe) over 10 years ago

Ruby's regexp engine is NP-complete. It's ultimately impossible to guarantee
regexp matches to run fast (if you don't think so please send us a proof). It
might be possible to warn your specific bad regexp, but in general it's also
impossible to tell which regexp is bad and which isn't. That's the Halintg
problem http://en.wikipedia.org/wiki/Halting_problem .

So, in short, there is (at least believed to be) no ultimate solution. All what
might be possible is to find a reasonable compromise. For instance python does
not detect that shorter one:

zsh % python
Python 2.7.5+ (default, Feb 27 2014, 19:37:08) 
[GCC 4.8.1] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> re.compile(r'(\w*)*$')
<_sre.SRE_Pattern object at 0x1dee030>
>>> 

Updated by mxposed (Nikolay Markov) over 10 years ago

Rafael, i'm sorry, the bad regexp is not displaying properly, something is obviously wrong with my formatting. Gergo reproduced it same as i have it.

Urabe, do you know how Perl does that? Also, i'll be grateful for the link to regexp sources in ruby

Updated by shyouhei (Shyouhei Urabe) over 10 years ago

Nikolay Markov wrote:

Urabe, do you know how Perl does that? Also, i'll be grateful for the link to regexp sources in ruby

Don't know anything about perl's. Ruby's regexp engine is called Onigmo: https://github.com/k-takata/Onigmo

Actions #6

Updated by jeremyevans0 (Jeremy Evans) over 5 years ago

  • Status changed from Open to Closed
Actions

Also available in: Atom PDF

Like0
Like0Like0Like0Like0Like0Like0