Bug #14418: ruby 2.5 slow regexp execution - Ruby - Ruby Issue Tracking System

Added by jakub.wozny (Kuba W) almost 8 years ago. Updated over 2 years ago.

Status: Closed
ruby -v: 2.5
Backport: 2.3: UNKNOWN, 2.4: UNKNOWN, 2.5: UNKNOWN
[ruby-core:85219]
Tags: regexp, perf

Description

I have a simple regexp that performs very slowly.

"fußball "*20 =~ /^([\S\s]{1000})/i

It is fast if I remove the /i flag. I also found that it depends on the string length and on the quantifier value (here, {1000}).
If you remove ß from the string, it is also fast.

I tested on 2.3.1, 2.4.3 and 2.5.0.

I'm not sure whether it is a bug or whether it just works that way.

Updated by jakub.wozny (Kuba W) almost 8 years ago
#1 [ruby-core:85222]

I can't paste the code here correctly, so I created a gist with the regexp: https://gist.github.com/kubaw/60ca998200d80883156fa94efa7eb6fe

Updated by sos4nt (Stefan Schüßler) almost 8 years ago
#2 [ruby-core:85228]

> I can't paste the code here correctly.

You have to insert a blank line before ~~~

Updated by shevegen (Robert A. Heiler) almost 8 years ago
#3 [ruby-core:85229]

> You have to insert a blank line before ~~~

I also often just indent the code I want to add with four spaces;
I'm not sure it is interpreted correctly, but it seems to work on
both GitHub and ruby-lang.org, so I tend to use it. :D

As for the regexp performance, I have no idea whether it is a bug or
not, but either way it may be helpful to have some test code that runs
different regexps and compares them with their "expected speed
outcome". That way, people could check before reporting a (potential)
issue like this one whether everything works as intended or whether a
bug exists.
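Such test code could look something like the following minimal sketch. The helper name `time_regexps` is hypothetical (it is not part of Ruby or of any existing test suite); it only uses the standard `benchmark` library to time each pattern against the same input, so a pathological case stands out next to its harmless variant. The input is deliberately small so the /i case finishes quickly even on affected versions.

```ruby
require "benchmark"

# Hypothetical helper: times each regexp against the same input and
# returns { pattern source => elapsed wall-clock seconds }.
def time_regexps(input, regexps)
  regexps.map do |re|
    seconds = Benchmark.realtime { input =~ re }
    [re.inspect, seconds]
  end.to_h
end

input = "fußball " * 5 # small on purpose, so even the slow case is quick

results = time_regexps(input, [
  /^([\S\s]{1000})/,  # without /i
  /^([\S\s]{1000})/i, # with /i (the case reported as slow)
])

results.each { |pattern, secs| printf("%-22s %.6f s\n", pattern, secs) }
```

Comparing the two rows (rather than looking at absolute numbers) is what makes this useful across machines.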

Since you use "ß", let me ask: what encoding do you use within the
script? Possibly UTF-8? Have you tested whether an ISO encoding makes
a difference to speed?

The reason I ask is mostly that I assume you output German text, and
the German umlauts are one huge reason for me to prefer ISO encoding
(it is simpler for me to handle in a project than the Unicode
variants).
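For anyone wanting to answer the encoding question for their own script, Ruby can report it directly. A short sketch using only core methods (`__ENCODING__`, `String#encoding`, `String#encode`), with the sample string taken from the report:

```ruby
# __ENCODING__ reports the source encoding of this file
# (UTF-8 by default since Ruby 2.0); string literals inherit it.
puts __ENCODING__

utf8   = "fußball "
latin1 = utf8.encode("ISO-8859-1") # same text in a single-byte encoding

puts utf8.encoding   # UTF-8
puts latin1.encoding # ISO-8859-1
puts utf8.bytesize   # 9: "ß" takes two bytes in UTF-8
puts latin1.bytesize # 8: one byte per character
```

Both strings still have 8 characters; only the byte representation differs.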

Updated by jakub.wozny (Kuba W) almost 8 years ago
#4 [ruby-core:85232]

OK, below is the regexp that I tested. I used the UTF-8 encoding at the beginning:

"fußball "*20 =~ /([\S\s]{1000})/i

Some measurements:

require 'benchmark'
(0..20).each { |n| puts Benchmark.measure { "fußball "*n =~ /^([\S\s]{1000})/i } }
  0.000000   0.000000   0.000000 (  0.000481)
  0.000000   0.000000   0.000000 (  0.000079)
  0.000000   0.000000   0.000000 (  0.000246)
  0.000000   0.000000   0.000000 (  0.000751)
  0.010000   0.000000   0.010000 (  0.002447)
  0.000000   0.000000   0.000000 (  0.006554)
  0.010000   0.000000   0.010000 (  0.007416)
  0.020000   0.000000   0.020000 (  0.022623)
  0.070000   0.000000   0.070000 (  0.066888)
  0.200000   0.000000   0.200000 (  0.196393)
  0.590000   0.000000   0.590000 (  0.591980)
  1.770000   0.000000   1.770000 (  1.772828)
  5.290000   0.010000   5.300000 (  5.292948)
 15.860000   0.000000  15.860000 ( 15.868370)
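In the real-time column (the value in parentheses), from about n = 7 onward each extra "fußball " roughly triples the run time, which is consistent with exponential backtracking. The sketch below only redoes that arithmetic on the numbers reported above; it takes no new measurements.

```ruby
# Real (wall-clock) times copied from the /i measurements above,
# for n = 7 through n = 13.
real_times = [0.022623, 0.066888, 0.196393, 0.591980, 1.772828, 5.292948, 15.868370]

# Ratio of each timing to the one before it (one more "fußball " per step).
ratios = real_times.each_cons(2).map { |prev, cur| (cur / prev).round(2) }
p ratios
```

If that factor of about 3 held all the way to n = 20, the time would be roughly 15.87 s × 3^7 ≈ 35,000 s (close to 10 hours); this is an extrapolation, not a measurement.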

I would expect this code to run as fast as the version without the /i flag:

"fußball "*20 =~ /([\S\s]{1000})/

(0..20).each { |n| puts Benchmark.measure { "fußball "*n =~ /^([\S\s]{1000})/ } }
  0.000000   0.000000   0.000000 (  0.000036)
  0.000000   0.000000   0.000000 (  0.000009)
  0.000000   0.000000   0.000000 (  0.000011)
  0.000000   0.000000   0.000000 (  0.000016)
  0.000000   0.000000   0.000000 (  0.000018)
  0.000000   0.000000   0.000000 (  0.000029)
  0.000000   0.000000   0.000000 (  0.000020)
  0.000000   0.000000   0.000000 (  0.000021)
  0.000000   0.000000   0.000000 (  0.000023)
  0.000000   0.000000   0.000000 (  0.000024)
  0.000000   0.000000   0.000000 (  0.000016)
  0.000000   0.000000   0.000000 (  0.000027)
  0.000000   0.000000   0.000000 (  0.000022)
  0.000000   0.000000   0.000000 (  0.000023)
  0.000000   0.000000   0.000000 (  0.000024)
  0.000000   0.000000   0.000000 (  0.000023)
  0.000000   0.000000   0.000000 (  0.000024)
  0.000000   0.000000   0.000000 (  0.000026)
  0.000000   0.000000   0.000000 (  0.000025)
  0.000000   0.000000   0.000000 (  0.000026)
  0.000000   0.000000   0.000000 (  0.000053)

More test cases:

Benchmark.measure { "ß "*20 =~ /^([\S\s]{20})/i } # 0.000000   0.000000   0.000000 (  0.000431)
Benchmark.measure { "ß "*20 =~ /^([\S\s]{30})/i } # 0.000000   0.000000   0.000000 (  0.000427)
Benchmark.measure { "ß "*20 =~ /^([\S\s]{40})/i } # 0.000000   0.000000   0.000000 (  0.000430)
Benchmark.measure { "ß "*20 =~ /^([\S\s]{50})/i } # too long to wait

#without /i flag:
Benchmark.measure { "ß "*20 =~ /^([\S\s]{50})/ } #0.000000   0.000000   0.000000 (  0.000043)
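One thing worth noting about these inputs: "ß "*20 is only 40 characters long, so {20}, {30} and {40} can succeed while {50} can never match, and "fußball "*20 (160 characters) can never satisfy {1000}. A plausible reading of the timings, not confirmed in this thread, is that the /i slowdown shows up when the match has to fail, because the engine then explores every alternative before giving up. The sketch checks the lengths and the match results without /i, which is fast:

```ruby
short = "ß " * 20
long  = "fußball " * 20

p short.length # 40
p long.length  # 160

# Without /i, a fixed-count quantifier larger than the input simply fails.
[20, 30, 40, 50].each do |count|
  matched = short =~ /^([\S\s]{#{count}})/
  puts "{#{count}}: #{matched ? 'match' : 'no match'}"
end
```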

I also tested in another encoding (ISO-8859-1):

Benchmark.measure{("fußball ".encode("ISO-8859-1"))*20 =~ /([\S\s]{1000})/i}.to_s # => "  3.450000   0.000000   3.450000 (  3.452036)\n"

With the other encoding, removing /i also speeds it up:

Benchmark.measure{("fußball ".encode("ISO-8859-1"))*20 =~ /([\S\s]{1000})/}.to_s #=> "  0.010000   0.000000   0.010000 (  0.000514)\n"

> The reason I ask is mostly that I assume you output German text, and
> the German umlauts are one huge reason for me to prefer ISO encoding
> (it is simpler for me to handle in a project than the Unicode
> variants).

I have a multilingual app, so I need to stay with Unicode.

Updated by nobu (Nobuyoshi Nakada) almost 8 years ago
#5 [ruby-core:85245]

Description updated

FYI, you can avoid it by using . instead of [\S\s].
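A sketch of that workaround. One assumption worth stating: [\S\s] matches any character including a newline, whereas in Ruby . does not match "\n" unless the /m flag is given, so the equivalent replacement is /^(.{1000})/mi rather than /^(.{1000})/i. The sample text is made up for illustration, and no timings are claimed here.

```ruby
text = "fußball\n" * 200 # 1600 characters, including 200 newlines

original   = /^([\S\s]{1000})/i # pattern from the report
workaround = /^(.{1000})/mi     # /m lets . match "\n" as well
without_m  = /^(.{1000})/i      # . alone stops at "\n"

p original.match?(text)   # true
p workaround.match?(text) # true
p without_m.match?(text)  # false: no line is 1000 characters long
```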

Updated by duerst (Martin Dürst) almost 8 years ago
#6 [ruby-core:85248]

Essentially, when using //i, every 'ß' in the string (and in the regular expression) is dynamically expanded to 'ss'. For [\S\s] this wouldn't be necessary, but all character classes are treated the same way internally, so it still happens.
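The folding rule behind this can be observed with core String methods (`downcase(:fold)` and `casecmp?`, both available since Ruby 2.4). This only illustrates the Unicode case-folding rule itself, not Onigmo's internal handling of character classes:

```ruby
# Unicode full case folding maps "ß" to the two-character string "ss".
p "ß".downcase(:fold) # "ss"

# casecmp? compares after Unicode case folding,
# so strings of different lengths can still compare equal.
p "fußball".casecmp?("FUSSBALL") # true

# Per the explanation above, a literal ß under //i should also match "ss".
p(/ß/i.match?("ss"))
```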

Updated by hsbt (Hiroshi SHIBATA) almost 6 years ago
#7

Tags set to regexp, perf

Updated by jeremyevans0 (Jeremy Evans) over 2 years ago
#8 [ruby-core:114468]

Status changed from Open to Closed

Thanks to very impressive work by @makenowjust, this issue has been fixed in Ruby 3.2.
