Bug #9221

Time.parse performance becomes exponentially worse as string length grows

Added by Michael Pelz-Sherman 5 months ago. Updated 3 months ago.

[ruby-core:58876]
Status:Closed
Priority:Normal
Assignee:tadayoshi funaba
Category:core
Target version:2.1.0
ruby -v:ruby-2.0.0-p247 Backport:1.9.3: DONE, 2.0.0: DONE

Description

See attached script. Output:

parsing 12:00 PM fffffffffffffffff
2013-12-05 12:00:00 -0500
elapsed: 0.010443
parsing 12:00 PM ffffffffffffffffff
2013-12-05 12:00:00 -0500
elapsed: 0.017739
parsing 12:00 PM fffffffffffffffffff
2013-12-05 12:00:00 -0500
elapsed: 0.028127
parsing 12:00 PM ffffffffffffffffffff
2013-12-05 12:00:00 -0500
elapsed: 0.049885
parsing 12:00 PM fffffffffffffffffffff
2013-12-05 12:00:00 -0500
elapsed: 0.071379
parsing 12:00 PM ffffffffffffffffffffff
2013-12-05 12:00:00 -0500
elapsed: 0.112612
parsing 12:00 PM fffffffffffffffffffffff
2013-12-05 12:00:00 -0500
elapsed: 0.184517
parsing 12:00 PM ffffffffffffffffffffffff
2013-12-05 12:00:00 -0500
elapsed: 0.293784
parsing 12:00 PM fffffffffffffffffffffffff
2013-12-05 12:00:00 -0500
elapsed: 0.476253
parsing 12:00 PM ffffffffffffffffffffffffff
2013-12-05 12:00:00 -0500
elapsed: 0.786087
parsing 12:00 PM fffffffffffffffffffffffffff
2013-12-05 12:00:00 -0500
elapsed: 1.256976
parsing 12:00 PM ffffffffffffffffffffffffffff
2013-12-05 12:00:00 -0500
elapsed: 2.019426
parsing 12:00 PM fffffffffffffffffffffffffffff
2013-12-05 12:00:00 -0500
elapsed: 3.300646
parsing 12:00 PM ffffffffffffffffffffffffffffff
2013-12-05 12:00:00 -0500
elapsed: 5.37757
parsing 12:00 PM fffffffffffffffffffffffffffffff
2013-12-05 12:00:00 -0500
elapsed: 8.763601
parsing 12:00 PM ffffffffffffffffffffffffffffffff
2013-12-05 12:00:00 -0500
elapsed: 14.165842
parsing 12:00 PM fffffffffffffffffffffffffffffffff
2013-12-05 12:00:00 -0500
elapsed: 23.888907
...

time_parse.rb Magnifier - script to demonstrate Time.parse performance degredation (237 Bytes) Michael Pelz-Sherman, 12/05/2013 10:53 PM

Associated revisions

Revision 44086
Added by Nobuyoshi Nakada 5 months ago

date_parse.c: get rid of backtrack explosion

  • ext/date/dateparse.c (parsetime): unset case-insensitive flag for [:alpha:], which already implies both cases, to get rid of backtrack explosion. [Bug #9221]

Revision 44126
Added by Nobuyoshi Nakada 4 months ago

date_strptime.c: get rid of backtrack explosion

  • ext/date/datestrptime.c (datestrptimeinternal): unset case-insensitive flag for [:alpha:], which already implies both cases, to get rid of backtrack explosion. [Bug #9221]

History

#1 Updated by Martin Dürst 5 months ago

This runs in no time on ruby 2.0.0p247 (2013-06-27) i386-mingw32. Can you give more details about your environment?

#2 Updated by Michael Pelz-Sherman 5 months ago

duerst (Martin Dürst) wrote:

This runs in no time on ruby 2.0.0p247 (2013-06-27) i386-mingw32. Can you give more details about your environment?

Model Name: MacBook Pro
Model Identifier: MacBookPro9,1
Processor Name: Intel Core i7
Processor Speed: 2.3 GHz
Number of Processors: 1
Total Number of Cores: 4
L2 Cache (per Core): 256 KB
L3 Cache: 6 MB
Memory: 16 GB
Boot ROM Version: MBP91.00D3.B08
SMC Version (system): 2.1f173
Serial Number (system): C02JP19HF1G3

#3 Updated by Michael Pelz-Sherman 5 months ago

Running ruby-2.0.0-p247 [ x86_64 ], installed using RVM.

#4 Updated by Charlie Somerville 5 months ago

Reproduced on trunk:

irb(main):027:0> RUBYDESCRIPTION
=> "ruby 2.1.0dev (2013-12-07 trunk 44044) [x86
64-darwin13.0]"
irb(main):028:0> Benchmark.measure { Time.parse("12:00 PM " + "f"30) }.to_s
=> " 4.760000 0.000000 4.760000 ( 4.759983)\n"
irb(main):029:0> Benchmark.measure { Time.parse("12:00 PM " + "f"
31) }.tos
=> " 7.770000 0.000000 7.770000 ( 7.787989)\n"
irb(main):030:0> Benchmark.measure { Time.parse("12:00 PM " + "f"*32) }.to
s
=> " 12.390000 0.000000 12.390000 ( 12.405786)\n"

#5 Updated by Hiroshi SHIBATA 5 months ago

  • Target version set to 2.1.0

#6 Updated by Heesob Park 5 months ago

It can be reproduced on ruby 2.0.0-p247 i386-mingw32.

I modified a test code to extension library free version.

C:\work>ruby -v
ruby 2.0.0p247 (2013-06-27) [i386-mingw32]

C:\work>type test.rb
require 'benchmark'
str = "12:00 PM fffffffffffffffffffffffffffff"
pat=/((?:\d+\s:\s\d+(?:\s:\s\d+(?:[,.]\d)?)?|\d+\sh(?:\s\d+m?(?:\s\d+s?)
?)?)(?:\sap)?|\d+\sap)(?:\s*((?:gmt|utc?)?[-+]\d
+(?:[,.:]\d+(?::\d+)?)?|[[:alpha:].\s]+(?:standard|daylight)\stime\b|:alpha:
+(?:\sdst)?\b))?/i
p Benchmark.measure { pat.match(str) }

C:\work>ruby test.rb
#

Interestingly, this issue is not reproduced on irb session or without filename.

C:\work>ruby < test.rb
#

C:\work>irb
DL is deprecated, please use Fiddle
irb(main):001:0> require 'benchmark'
=> true
irb(main):002:0> str = "12:00 PM fffffffffffffffffffffffffffff"
=> "12:00 PM fffffffffffffffffffffffffffff"
<:].\s]+(?:standard|daylight)\stime\b|:alpha:+(?:\sdst)?\b))?/i
=> /((?:\d+\s:\s\d+(?:\s:\s\d+(?:[,.]\d)?)?|\d+\sh(?:\s\d+m?(?:\s\d+s?)?
)?)(?:\sap)?|\d+\sap)(?:\s*((?:gmt|utc?)?[-+]\d+
(?:[,.:]\d+(?::\d+)?)?|[[:alpha:].\s]+(?:standard|daylight)\stime\b|:alpha:+
(?:\sdst)?\b))?/i
irb(main):004:0> p Benchmark.measure { pat.match(str) }
#
=> #

#7 Updated by Akira Tanaka 5 months ago

  • Status changed from Open to Assigned
  • Assignee set to tadayoshi funaba

I assigned this issue to tadf because the pattern
is written in ext/date/date_parse.c.

#8 Updated by Heesob Park 4 months ago

I think this is an issue of Oniguruma(regular expressions library).

Here is a shortest code of producing this issue.
p /(?::alpha:+\s)?/i.match("f"*40)

The [:alpha:] bracket and case ignore flag combination shows bad performance.

The simple workaround is using [a-z] instead of [:alpha:].

#9 Updated by Nobuyoshi Nakada 4 months ago

:alpha: doesn't equal to [a-z], the former matches unicode alphabet categories.

#10 Updated by Nobuyoshi Nakada 4 months ago

  • Status changed from Assigned to Closed
  • % Done changed from 0 to 100

This issue was solved with changeset r44086.
Michael, thank you for reporting this issue.
Your contribution to Ruby is greatly appreciated.
May Ruby be with you.


date_parse.c: get rid of backtrack explosion

  • ext/date/dateparse.c (parsetime): unset case-insensitive flag for [:alpha:], which already implies both cases, to get rid of backtrack explosion. [Bug #9221]

#11 Updated by Heesob Park 4 months ago

I know :alpha: is not equal to [a-z].
But, it seems that the current time parsing is only for ascii string.

BTW, date_strptime.c has the same regular expression on line 567-572.
It should be modified also.

#12 Updated by Tomoyuki Chikanaga 4 months ago

  • Status changed from Closed to Open
  • Backport changed from 1.9.3: UNKNOWN, 2.0.0: UNKNOWN to 1.9.3: REQUIRED, 2.0.0: REQUIRED

Thank you for your notice. I'll re-open this ticket.
And I confirmed ruby193 and ruby200 have these regular expressions.

#14 Updated by Nobuyoshi Nakada 4 months ago

  • Status changed from Open to Closed

This issue was solved with changeset r44126.
Michael, thank you for reporting this issue.
Your contribution to Ruby is greatly appreciated.
May Ruby be with you.


date_strptime.c: get rid of backtrack explosion

  • ext/date/datestrptime.c (datestrptimeinternal): unset case-insensitive flag for [:alpha:], which already implies both cases, to get rid of backtrack explosion. [Bug #9221]

#15 Updated by Tomoyuki Chikanaga 4 months ago

  • Backport changed from 1.9.3: REQUIRED, 2.0.0: REQUIRED to 1.9.3: REQUIRED, 2.0.0: DONE

r44086 and r44126 are backported to ruby20_0 branch at r44126.

#16 Updated by Usaku NAKAMURA 3 months ago

  • Backport changed from 1.9.3: REQUIRED, 2.0.0: DONE to 1.9.3: DONE, 2.0.0: DONE

backported into ruby19_3 at r44746.

Also available in: Atom PDF