Feature #5248
Faster PStore
| Status: | Closed | Start date: | 08/29/2011 | |
|---|---|---|---|---|
| Priority: | Normal | Due date: | ||
| Assignee: | % Done: | 0% |
||
| Category: | lib | |||
| Target version: | 2.0.0 |
Description
Hellow.
I wrote a patch to make PStore more faster. What I did as follows:
- deferred check sum calculation
-
PStore judges whether it should modify database file or not by 2 steps. First, it compares data size between database file and marshal data to write. If both size are different, it writes data to database file. Second, if both sizes are same, it compares Digest::MD5.digest of both still more.
However, PStore calculates a check sum of data to write before size comparison. If PStore can judge it should modify database file in size comparison, this calculation will be useless. Consequently, I modified PStore to calculate a check sum after size comparison.
- check-sum calculation by String#sum
- As stated above, Present PStore calculates checksum of database file by Digest::MD5.digest to judge whether it should modify file or not. However, Digest::MD5.digest is cryptographic hash function and I think it is too strong to use as a mere check sum. Therefore, I modified PStore to use not Digest::MD5.digest but String#sum.
- deferred File#truncate
-
PStore puts back file pointer to the head and truncates file size to zero before writing to database file as below.
file.rewind file.truncate(0) file.write(data) (pstore.rb at line 486~488)
However, truncation by File#truncate is slow and it is the bottleneck of PStore. I modified it as below.
file.rewind file.write(data) file.truncate(data.size)
It only puts back file pointer before write. Truncation is done after writing. In this way, size needs to be truncate will be minimum and it makes PStore faster.
- performance
-
I benchmarked PStore as below:
require 'pstore' p = PStore.new("foo") p.transaction { p["hoge"] = "hoge" * ARGV.first.to_i } 10000.times do p.transaction { p["hoge"] += "hoge" } endPresent PStore:
% time ruby pstore_bench.rb 1000 ruby pstore_bench.rb 1000 2.94s user 2.43s system 69% cpu 7.723 total % time ruby pstore_bench.rb 10000 ruby pstore_bench.rb 10000 5.37s user 2.99s system 70% cpu 11.810 total % time ruby pstore_bench.rb 100000 ruby pstore_bench.rb 100000 31.98s user 11.09s system 69% cpu 1:02.15 total
New PStore:
% time ruby pstore_bench.rb 1000 ruby pstore_bench.rb 1000 1.67s user 0.44s system 99% cpu 2.119 total % time ruby pstore_bench.rb 10000 ruby pstore_bench.rb 10000 3.24s user 0.63s system 99% cpu 3.876 total % time ruby pstore_bench.rb 100000 ruby pstore_bench.rb 100000 14.29s user 3.13s system 100% cpu 17.416 total
As a result, new PStore is faster. It can be said that new PStore is the faster, the bigger database file is. I attached a patch. PStore applied the patch passes test-all.
Associated revisions
History
Updated by naruse (Yui NARUSE) 9 months ago
Updated by mrkn (Kenta Murata) 9 months ago
Updated by naruse (Yui NARUSE) 9 months ago
Updated by Glass_saga (Masaki Matsushita) 9 months ago
Updated by Glass_saga (Masaki Matsushita) 5 months ago
- File patch.diff added
Updated by nahi (Hiroshi Nakamura) 5 months ago
Updated by Glass_saga (Masaki Matsushita) 5 months ago
- File perf.diff added
- File cosmetic.diff added
Updated by nahi (Hiroshi Nakamura) 5 months ago
Thanks! perf.diff looks OK to me. I'm evaluating this. cosmetic.diff would be handled later.
require 'pstore'
require 'benchmark'
def run(size)
file = "pstore_#{size}"
File.unlink(file) if File.exist?(file)
store = PStore.new(file)
store.transaction do
store["hoge"] = "hoge" * size
end
1000.times do
store.transaction do
store["hoge"] += "hoge"
end
end
end
Benchmark.bm(6) do |bm|
[1000, 10000, 100000].each do |size|
bm.report(size.to_s) { run(size) }
end
endSo far, file truncation tweak is the key.
Original:
# user system total real 1000 0.120000 0.230000 0.350000 ( 0.481725) 10000 0.260000 0.350000 0.610000 ( 0.705170) 100000 3.500000 1.840000 5.340000 ( 18.753409)
With truncation tweak:
# user system total real 1000 0.080000 0.030000 0.110000 ( 0.110230) 10000 0.300000 0.010000 0.310000 ( 0.307720) 100000 2.230000 0.480000 2.710000 ( 2.715860)
With deferred MD5 calculation:
# user system total real 1000 0.050000 0.230000 0.280000 ( 0.417151) 10000 0.120000 0.410000 0.530000 ( 0.804206) 100000 2.460000 1.750000 4.210000 ( 18.411857)
With both (full patch):
# user system total real 1000 0.070000 0.040000 0.110000 ( 0.112234) 10000 0.180000 0.050000 0.230000 ( 0.235963) 100000 1.490000 0.480000 1.970000 ( 1.963404)
I'll merge it tonight.
Updated by sorah (Shota Fukumori) 5 months ago
- Assignee set to nahi (Hiroshi Nakamura)
Updated by sorah (Shota Fukumori) 5 months ago
- Status changed from Open to Assigned
Updated by nahi (Hiroshi Nakamura) 5 months ago
- Status changed from Assigned to Closed