Bug #12373
closedOptimize CSV#shift
Description
I think that str1.start_with?(str2)
is faster than str1[0] == str2
.
Because str1.start_with?(str2)
just call String#start_with?
, But str1[0] == str2
call String#[]
, make new String
object and call String#==
.
(The patch is csv-shift.patch)
Benchmark results.
csv-benchmark.rb make temp CSV file and call CSV#each
method(inner call CSV#shift
)
$ ruby csv-benchmark.rb
Warming up --------------------------------------
old_csv_shift 1.000 i/100ms
new_csv_shift 1.000 i/100ms
Calculating -------------------------------------
old_csv_shift 0.444 (± 0.0%) i/s - 3.000 in 6.759200s
new_csv_shift 0.479 (± 0.0%) i/s - 3.000 in 6.264069s
Comparison:
new_csv_shift: 0.5 i/s
old_csv_shift: 0.4 i/s - 1.08x slower
string-start_with.rb is a micro benchmark for str1[0] == str2
and str1.start_with?(str2)
$ ruby string-start_with.rb
Warming up --------------------------------------
a[0] == b 90.881k i/100ms
a.start_with?(b) 115.557k i/100ms
Calculating -------------------------------------
a[0] == b 1.836M (± 3.8%) i/s - 9.179M in 5.006568s
a.start_with?(b) 3.183M (± 4.2%) i/s - 15.947M in 5.018654s
Comparison:
a.start_with?(b): 3183386.0 i/s
a[0] == b: 1836263.5 i/s - 1.73x slower
Of course $ make test-all TESTS="test/csv/*" passed
Files
Updated by ksss (Yuki Kurihara) over 8 years ago
- Description updated (diff)
Updated by nobu (Nobuyoshi Nakada) over 8 years ago
- Description updated (diff)
You can replace str1[-1] == str2
with str1.end_with?(str2)
too, three places.
Updated by ksss (Yuki Kurihara) over 8 years ago
Thank you for your reply.
You can replace str1[-1] == str2 with str1.end_with?(str2) too, three places.
Yes, That is my next plan.
I'm writing benchmark carefully.
Should I collect patches in this issue?
Updated by ksss (Yuki Kurihara) over 8 years ago
- File csv-shift-1.patch csv-shift-1.patch added
- File csv-shift-2.patch csv-shift-2.patch added
- File csv-shift-3.patch csv-shift-3.patch added
I update patches.
csv-shift-1.patch: Use s1.start_with?(s2) instead of s1[0] == s2 and use s1.end_with?(s2) instead of s1[-1] == s2.
$ ruby csv-benchmark.rb
Warming up --------------------------------------
old_csv_shift 1.000 i/100ms
new_csv_shift 1.000 i/100ms
Calculating -------------------------------------
old_csv_shift 0.417 (± 0.0%) i/s - 3.000 in 7.189555s
new_csv_shift 0.493 (± 0.0%) i/s - 3.000 in 6.080267s
Comparison:
new_csv_shift: 0.5 i/s
old_csv_shift: 0.4 i/s - 1.18x slower
csv-shift-2.patch: Use @double_quote_char instead of @quote_char * 2.
$ ruby csv-benchmark.rb
Warming up --------------------------------------
old_csv_shift 1.000 i/100ms
new_csv_shift 1.000 i/100ms
Calculating -------------------------------------
old_csv_shift 0.443 (± 0.0%) i/s - 3.000 in 6.772626s
new_csv_shift 0.486 (± 0.0%) i/s - 3.000 in 6.167382s
Comparison:
new_csv_shift: 0.5 i/s
old_csv_shift: 0.4 i/s - 1.10x slower
csv-shift-3.patch: Apply both patch csv-shift-1.patch and csv-shift-2.patch.
$ ruby csv-benchmark.rb
Warming up --------------------------------------
old_csv_shift 1.000 i/100ms
new_csv_shift 1.000 i/100ms
Calculating -------------------------------------
old_csv_shift 0.436 (± 0.0%) i/s - 3.000 in 6.875614s
new_csv_shift 0.567 (± 0.0%) i/s - 3.000 in 5.292932s
Comparison:
new_csv_shift: 0.6 i/s
old_csv_shift: 0.4 i/s - 1.30x slower
Updated by ksss (Yuki Kurihara) over 8 years ago
- Assignee set to JEG2 (James Gray)
Updated by hsbt (Hiroshi SHIBATA) over 8 years ago
- Status changed from Open to Assigned
Updated by hsbt (Hiroshi SHIBATA) over 7 years ago
- Assignee changed from JEG2 (James Gray) to hsbt (Hiroshi SHIBATA)
Updated by hsbt (Hiroshi SHIBATA) over 7 years ago
- File csv-shift-hsbt.path csv-shift-hsbt.path added
- Target version set to 2.5
@ksss
I confirmed your patch and benchmark results.
Warming up --------------------------------------
csv_shift 1.000 i/100ms
new_csv_shift 1.000 i/100ms
Calculating -------------------------------------
csv_shift 1.192 (± 0.0%) i/s - 6.000 in 5.034250s
new_csv_shift 1.527 (± 0.0%) i/s - 8.000 in 5.243446s
Comparison:
new_csv_shift: 1.5 i/s
csv_shift: 1.2 i/s - 1.28x slower
Should we use @double_quote_char
instead of @quote_char
?
your patch named csv-shift-3.patch is only assign @double_quote_char
. It's not used in csv.rb
I update replacement it in csv.rb and attched csv-shift-hsbt.patch.
Can you confirm it? If it's ok, I will merge it.
Updated by ksss (Yuki Kurihara) over 7 years ago
Thank you for responding.
csv-shift-hsbt.patch looks good to me.
Should we use @double_quote_char instead of @quote_char ?
your patch named csv-shift-3.patch is only assign @double_quote_char. It's not used in csv.rb
@double_quote_char and @quote_char are different values. It can not be used instead.
@double_quote_char looks like using in CSV#shift method on csv-shift-3.patch and csv-shift-hsbt.patch.
Updated by hsbt (Hiroshi SHIBATA) over 7 years ago
Ah,
Should we use @double_quote_char instead of @quote_char ?
@quote_char
is wrong. I intended to write @quote_char * 2
And, I confirmed csv-shift-3.patch again. It has no problem for me.
Thank you for your explanation.
(In Japanese: csv-shift-3.patch を確認した時に @double_quote_char 使ってないのでは...? と思い込んでいたのですが、今確認したところちゃんと置き換わっていました。失礼しました。)
Updated by ksss (Yuki Kurihara) over 7 years ago
Good, Thank you for confirming and managing.
Updated by hsbt (Hiroshi SHIBATA) over 7 years ago
- Status changed from Assigned to Closed
Applied in changeset trunk|r58770.
Optimize CSV#shift.
[Bug #12373][ruby-core:75462]
Patch by Yuki Kurihara.
Benchmark:
Warming up --------------------------------------
csv_shift 1.000 i/100ms
new_csv_shift 1.000 i/100ms
Calculating -------------------------------------
csv_shift 1.192 (± 0.0%) i/s - 6.000 in 5.034250s
new_csv_shift 1.527 (± 0.0%) i/s - 8.000 in 5.243446s
Comparison:
new_csv_shift: 1.5 i/s
csv_shift: 1.2 i/s - 1.28x slower