Specific combination of regexp and string causes 100% CPU and doesn't recover
A specific combination of regexp and string can cause the Ruby process to hang at 100% CPU.
Reproducing (in irb):
/\A(?:%\h\h|[^%]+)*\z/ =~ "199542328.1312293792.1.1.utmcsr%3Dgoogle%7Cutmccn%"
(the above hangs indefinitely at 100% CPU)
/\A(?:%\h\h|[^%]+)*\z/ =~ "199542328.1312293792.1.1.utmcsr%3Dgoogle%7Cutmccn"
(the same string without the % at the end returns successfully)
The code in question is found in Rack::Utils (v1.3.2; it is not used in v1.2.1) and can effectively "kill" any server process. This happened to us in production on a Thin machine after we upgraded to a newer Rack. The bug makes it very easy to perform a DoS attack on an affected Ruby server.
Updated by regularfry (Alex Young) about 8 years ago
I'd disagree with the location of this bug. I've had a quick look, and while this doesn't look like a Ruby bug, perhaps it ought to be. The regex as given does not appear in Rack, but it does appear in lib/ruby/1.9.1/uri/common.rb (line 778 in -p290). Rack has this:
This would not appear to suffer from the same exponential behaviour as that in URI, while apparently validating the same strings. Perhaps the appropriate substitution should be made in uri/common.rb? Patch untested, but "looks right".
Updated by matmarex (Bartosz Dz) about 8 years ago
No, this is a buggy regex - a case of catastrophic backtracking. http://www.regular-expressions.info/catastrophic.html
Removing the "+" after [^%] fixes it.
This happens because both the "+" and the "*" at the end are greedy: Ruby first matches as many "[^%]" characters as possible, and then tries to repeat the whole group as many times as possible. The match eventually fails (because of the lone percent sign at the end), so Ruby backtracks, gives up one character, and tries again, and again. Every run of non-percent characters can be split between the "+" and the "*" in many different ways, so the number of attempts skyrockets and boom, everything hangs while Ruby tries hard to backtrack and backtrack.
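To make the explanation above concrete, here is a minimal sketch in Ruby. It counts how many ways a run of non-percent characters can be divided between the inner "+" and the outer "*", and then checks both strings from the report against the pattern with the inner "+" removed. The names ways_to_split, SAFE_ESCAPED and well_escaped? are made up for this illustration and are not part of Ruby, URI or Rack. The original pattern is deliberately not run against the failing string here.

```ruby
# Count the ways a run of n non-"%" characters can be cut into consecutive
# non-empty chunks. Each cut is a distinct way for (?:[^%]+)* to consume the
# run, and the engine may have to try all of them before giving up.
# (Hypothetical helper, for illustration only.)
def ways_to_split(n)
  ways = [1]                                   # an empty run: exactly one way
  (1..n).each { |i| ways[i] = ways[0...i].sum } # first chunk may have any length
  ways[n]
end

# The pattern with the inner "+" removed: every non-"%" character can now be
# consumed in only one way, so there is nothing to re-split on failure.
SAFE_ESCAPED = /\A(?:%\h\h|[^%])*\z/

# Hypothetical helper: true when every "%" is followed by two hex digits.
def well_escaped?(str)
  !!(SAFE_ESCAPED =~ str)
end

bad  = "199542328.1312293792.1.1.utmcsr%3Dgoogle%7Cutmccn%"
good = "199542328.1312293792.1.1.utmcsr%3Dgoogle%7Cutmccn"

leading_run = bad[/\A[^%]+/]                   # non-"%" characters before the first "%"
puts "leading run length: #{leading_run.length}"
puts "ways to split it:   #{ways_to_split(leading_run.length)}"

puts well_escaped?(bad)                        # the lone trailing "%" is rejected
puts well_escaped?(good)
```

The count doubles with every extra character in a run, which is why a short cookie-like string is enough to trigger the hang. Newer Ruby releases may handle the original pattern differently, so the sketch only relies on the counting argument and the simplified pattern.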