Feature #14146
Closed
Improve performance of creating Hash object
Description
When a Hash object is created, the heap area for its st_table is always allocated internally,
and this allocation seems to take time.
To improve the performance of creating Hash objects,
this patch reduces the number of heap allocations for st_table by reusing previously allocated areas.
Performance of creating a Hash literal -> 1.53 times faster.
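The reuse idea described above can be sketched in Ruby as a small free-list pool. This is only an illustration of the general technique, not the patch itself, which lives in Ruby's C sources. The names TablePool, Table, acquire and release are hypothetical and chosen for this sketch.

```ruby
# Illustrative free-list pool. All names here (Table, TablePool, acquire,
# release) are hypothetical and do not come from the Ruby source tree.
Table = Struct.new(:entries)

class TablePool
  attr_reader :fresh_allocations

  def initialize(max_size)
    @max_size = max_size
    @free = []               # released tables waiting to be reused
    @fresh_allocations = 0   # how often a new table had to be allocated
  end

  # Hand out a pooled table if one is available; otherwise allocate one.
  def acquire
    table = @free.pop
    return table if table

    @fresh_allocations += 1
    Table.new({})
  end

  # Clear the table and keep it for reuse unless the pool is already full.
  def release(table)
    table.entries.clear
    @free.push(table) if @free.size < @max_size
  end
end

pool = TablePool.new(4)
1000.times do
  table = pool.acquire
  table.entries[:foo] = 12
  pool.release(table)
end
puts pool.fresh_allocations # prints 1
```

In this sketch only the first acquire allocates. Every later call reuses the table that was just released, which is the effect the patch aims for with st_table.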
Before
$ ./miniruby -v -I. -I../benchmark-ips/lib ~/tmp/bench/literal.rb
ruby 2.5.0dev (2017-11-28 hash 60926) [x86_64-darwin17]
Warming up --------------------------------------
Hash literal 51.544k i/100ms
Calculating -------------------------------------
Hash literal 869.132k (± 1.1%) i/s - 4.381M in 5.041574s
After
$ ./miniruby -v -I. -I../benchmark-ips/lib ~/tmp/bench/literal.rb
ruby 2.5.0dev (2017-11-28 hash 60926) [x86_64-darwin17]
Warming up --------------------------------------
Hash literal 63.068k i/100ms
Calculating -------------------------------------
Hash literal 1.328M (± 2.3%) i/s - 6.685M in 5.037861s
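As a quick check, the 1.53x figure can be recomputed from the two i/s values reported above. The variable names are chosen for this sketch only.

```ruby
# Iterations per second taken from the benchmark-ips output above.
before_ips = 869.132e3 # "Before" run
after_ips  = 1.328e6   # "After" run

speedup = after_ips / before_ips
puts format("%.2fx", speedup) # prints 1.53x
```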
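Test code
require 'benchmark/ips'

Benchmark.ips do |x|
  # benchmark-ips passes the number of iterations to run into the block
  x.report "Hash literal" do |times|
    count = 0
    while count < times
      # The Hash literal whose creation is being measured
      hash = {foo: 12, bar: 34, baz: 56}
      count += 1
    end
  end
end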
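For readers without the benchmark-ips gem, a rough approximation of the same measurement can be made with the core monotonic clock. This is a sketch under that assumption, not the benchmark used in this ticket. The method name hash_literals_per_second and the iteration count are hypothetical, and the numbers it prints are not comparable to the benchmark-ips output above.

```ruby
# Rough, stdlib-free timing of Hash literal creation.
# hash_literals_per_second is a hypothetical helper for this sketch.
def hash_literals_per_second(iterations)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  count = 0
  while count < iterations
    hash = {foo: 12, bar: 34, baz: 56}
    count += 1
  end
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  iterations / elapsed
end

rate = hash_literals_per_second(1_000_000)
puts format("%.0f Hash literals per second", rate)
```

Unlike benchmark-ips, this sketch has no warm-up phase and reports no error margin, so it is only suitable for coarse comparisons between two builds on the same machine.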
Patch
Updated by shyouhei (Shyouhei Urabe) about 7 years ago
- You modified st.h, which effectively kills binary compatibility.
- So you pool st_table to avoid malloc. The speedup depends on which malloc implementation you use. Isn't it because macOS's malloc is slow?
Updated by watson1978 (Shizuo Fujita) about 7 years ago
So you pool st_table to avoid malloc. The speedup depends on which malloc implementation you use.
Exactly, you're right.
Isn't it because macOS's malloc is slow?
I guess macOS's malloc is slower than Linux's.
However, I expect my patch to help on all platforms, because retrieving from the pool is faster than invoking malloc().
Updated by watson1978 (Shizuo Fujita) about 7 years ago
I updated the patch to keep binary compatibility.
https://github.com/ruby/ruby/pull/1766/commits/70a7b48aa18cdcaa9abf5acf93e2307c24b40a33
I also ran a benchmark on Linux. It seems that Linux's malloc performs better than macOS's.
https://github.com/ruby/ruby/pull/1766#issuecomment-348787373
Updated by shyouhei (Shyouhei Urabe) about 7 years ago
watson1978 (Shizuo Fujita) wrote:
I updated the patch to keep binary compatibility.
https://github.com/ruby/ruby/pull/1766/commits/70a7b48aa18cdcaa9abf5acf93e2307c24b40a33
Good. However, I think you should consider using ccan/list. It is field-proven.
I also ran a benchmark on Linux. It seems that Linux's malloc performs better than macOS's.
https://github.com/ruby/ruby/pull/1766#issuecomment-348787373
Any chance you tried --with-jemalloc? It might show something different.
Updated by watson1978 (Shizuo Fujita) about 7 years ago
Any chance you tried --with-jemalloc? It might show something different.
I tried jemalloc at https://github.com/ruby/ruby/pull/1766#issuecomment-350553062
Just by using jemalloc, I got a performance improvement of 4.381M -> 7.071M (61.4% up) on macOS.
However, there was no difference on Ubuntu 17.10.
Good. However, I think you should consider using ccan/list. It is field-proven.
I updated the patch to use ccan/list at https://github.com/ruby/ruby/pull/1766/commits/f0189dae115cee3fa3dbb5eadb7332f2d082be5c
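The 61.4% figure quoted in this comment follows directly from the two numbers given for macOS. The short check below uses variable names chosen for this sketch.

```ruby
# Figures quoted in the comment for macOS, without and with jemalloc.
without_jemalloc = 4.381e6
with_jemalloc    = 7.071e6

percent_up = (with_jemalloc - without_jemalloc) / without_jemalloc * 100
puts format("%.1f%% up", percent_up) # prints 61.4% up
```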
Updated by shyouhei (Shyouhei Urabe) about 7 years ago
watson1978 (Shizuo Fujita) wrote:
Any chance you tried --with-jemalloc? It might show something different.
I tried jemalloc at https://github.com/ruby/ruby/pull/1766#issuecomment-350553062
Just by using jemalloc, I got a performance improvement of 4.381M -> 7.071M (61.4% up) on macOS.
However, there was no difference on Ubuntu 17.10.
Good. However, I think you should consider using ccan/list. It is field-proven.
I updated the patch to use ccan/list at https://github.com/ruby/ruby/pull/1766/commits/f0189dae115cee3fa3dbb5eadb7332f2d082be5c
Thank you! The patch seems perfect except for one thing: ccan/list is already included as ccan/list/list.h. You don't have to copy and paste it again.
Updated by watson1978 (Shizuo Fujita) about 7 years ago
- Status changed from Open to Closed
Applied in changeset trunk|r61309.
Improve performance of creating Hash object
When a Hash object is created, the heap area for its st_table is always allocated internally,
and this allocation seems to take time.
To improve the performance of creating Hash objects,
this patch reduces the number of heap allocations for st_table by reusing previously allocated areas.
Performance of creating a Hash literal -> 1.53 times faster.
[Fix GH-1766] [ruby-core:84008] [Feature #14146]
Environment
- OS : macOS 10.13.1
- CPU : 1.4 GHz Intel Core i7
- Compiler : Apple LLVM version 9.0.0 (clang-900.0.39)
Before
$ ./miniruby -v -I. -I../benchmark-ips/lib ~/tmp/bench/literal.rb
ruby 2.5.0dev (2017-11-28 hash 60926) [x86_64-darwin17]
Warming up --------------------------------------
Hash literal 51.544k i/100ms
Calculating -------------------------------------
Hash literal 869.132k (± 1.1%) i/s - 4.381M in 5.041574s
After
$ ./miniruby -v -I. -I../benchmark-ips/lib ~/tmp/bench/literal.rb
ruby 2.5.0dev (2017-11-28 hash 60926) [x86_64-darwin17]
Warming up --------------------------------------
Hash literal 63.068k i/100ms
Calculating -------------------------------------
Hash literal 1.328M (± 2.3%) i/s - 6.685M in 5.037861s
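Test code
require 'benchmark/ips'

Benchmark.ips do |x|
  # benchmark-ips passes the number of iterations to run into the block
  x.report "Hash literal" do |times|
    count = 0
    while count < times
      # The Hash literal whose creation is being measured
      hash = {foo: 12, bar: 34, baz: 56}
      count += 1
    end
  end
end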