On few chars match returns quite fast, but here's what happens on 14 'a'-s and up:
$ time RBENV_VERSION=2.1.1 ruby test.rb 14
real 1.392 user 1.364 sys 0.026 pcpu 99.83
$ time RBENV_VERSION=2.1.1 ruby test.rb 15
real 3.979 user 3.949 sys 0.026 pcpu 99.89
$ time RBENV_VERSION=2.1.1 ruby test.rb 16
real 11.995 user 11.954 sys 0.031 pcpu 99.92
Ruby versions 1.9.3 and 2.0 behave similarly.
I ran into the problem, because one of my colleagues copy-pasted this regexp to test url's somewhere from stackoverflow:
/^(https?://)?([\da-z.-]+).([a-z.]{2,6})([/\w .-:])/?$/
I know the regexp is useless, however i think it's still a problem if a bad regexp can hang ruby.
Python (2.7) says that this regexp is bad and does not compile it.
Perl matches without any performance issues
For some reason the regex is not displayed correctly. I'm able to reproduce the reported issue (see the correct regex below):
$ cat test.rb
str = ('a' * ARGV[0].to_i) + '?'
re = /(\w*)*$/
re.match(str)
$ time ruby test.rb 14
real 0m1.179s
user 0m1.128s
sys 0m0.004s
$ time ruby test.rb 15
real 0m3.568s
user 0m3.419s
sys 0m0.020s
$ time ruby test.rb 16
real 0m10.767s
user 0m10.276s
sys 0m0.067s
Ruby's regexp engine is NP-complete. It's ultimately impossible to guarantee
regexp matches to run fast (if you don't think so please send us a proof). It
might be possible to warn your specific bad regexp, but in general it's also
impossible to tell which regexp is bad and which isn't. That's the Halintg
problem http://en.wikipedia.org/wiki/Halting_problem .
So, in short, there is (at least believed to be) no ultimate solution. All what
might be possible is to find a reasonable compromise. For instance python does
not detect that shorter one: