Feature #14146

Improve performance of creating Hash object

Added by watson1978 (Shizuo Fujita) over 2 years ago. Updated over 2 years ago.

Status: Closed
Priority: Normal
Assignee: -
Target version: -
[ruby-core:84008]

Description

When a Hash object is created, the heap area for its st_table is always allocated internally,
and this seems to take time.

To improve the performance of creating Hash objects,
this patch reduces the number of heap allocations for st_table by reusing them.
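
The reuse idea can be illustrated with a minimal free-list pool. This is only a sketch in Ruby, not the patch itself, which works in C on st_table; the class name TablePool, its methods, and the size cap are hypothetical and chosen for illustration.

```ruby
# Hypothetical free-list pool. The names (TablePool, acquire, release)
# are illustrative and do not come from the patch.
class TablePool
  attr_reader :allocations

  def initialize(max_size)
    @free = []            # tables waiting to be reused
    @max_size = max_size  # assumed cap on how many tables are kept
    @allocations = 0      # how many times a fresh table had to be built
  end

  # Hand out a pooled table if one exists, otherwise allocate a new one.
  def acquire
    table = @free.pop
    return table if table
    @allocations += 1
    Array.new(4)          # stand-in for a malloc'ed st_table
  end

  # Clear the table and return it to the pool; drop it if the pool is full.
  def release(table)
    table.fill(nil)
    @free.push(table) if @free.size < @max_size
  end
end

pool = TablePool.new(8)
1000.times do
  t = pool.acquire
  t[0] = :foo
  pool.release(t)
end
puts pool.allocations  # prints 1
```

Only the first acquire allocates; every later one reuses the released table, which is the effect the patch aims for with st_table.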

Performance of creating Hash literal -> 1.53 times faster.

Before

$ ./miniruby -v -I. -I../benchmark-ips/lib ~/tmp/bench/literal.rb
ruby 2.5.0dev (2017-11-28 hash 60926) [x86_64-darwin17]
Warming up --------------------------------------
        Hash literal    51.544k i/100ms
Calculating -------------------------------------
        Hash literal    869.132k (± 1.1%) i/s -      4.381M in   5.041574s

After

$ ./miniruby -v -I. -I../benchmark-ips/lib ~/tmp/bench/literal.rb
ruby 2.5.0dev (2017-11-28 hash 60926) [x86_64-darwin17]
Warming up --------------------------------------
        Hash literal    63.068k i/100ms
Calculating -------------------------------------
        Hash literal      1.328M (± 2.3%) i/s -      6.685M in   5.037861s
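
The 1.53x figure can be checked against the two i/s values reported above. A small sketch (the variable names are only illustrative):

```ruby
before_ips = 869.132e3  # "Before" run, iterations per second
after_ips  = 1.328e6    # "After" run, iterations per second

speedup = after_ips / before_ips
puts format("%.2f times faster", speedup)  # prints 1.53 times faster
```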

Test code

require 'benchmark/ips'

Benchmark.ips do |x|
  x.report "Hash literal" do |loop|
    count = 0
    while count < loop
      hash = {foo: 12, bar: 34, baz: 56}

      count += 1
    end
  end
end
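
If the benchmark-ips gem is not at hand, roughly the same measurement can be sketched with the standard library's Benchmark.realtime. This variant is an assumption for illustration and is not the script used for the numbers in this ticket; the iteration count is arbitrary and there is no warm-up phase, so results will not be directly comparable.

```ruby
require 'benchmark'

iterations = 1_000_000  # assumed count, not taken from the original post

# Time the same loop body as the test code above.
elapsed = Benchmark.realtime do
  count = 0
  while count < iterations
    hash = {foo: 12, bar: 34, baz: 56}
    count += 1
  end
end

ips = iterations / elapsed
puts format("Hash literal: %.3fM i/s", ips / 1e6)
```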

Patch

https://github.com/ruby/ruby/pull/1766

Updated by shyouhei (Shyouhei Urabe) over 2 years ago

  • You modified st.h, which effectively kills binary compatibility.
  • So you pool st_table to avoid malloc. The speedup depends on which malloc implementation you use. Isn't it because macOS's malloc is slow?

Updated by watson1978 (Shizuo Fujita) over 2 years ago

So you pool st_table to avoid malloc. The speedup depends on which malloc implementation you use.

Exactly. You're right.

Isn't it because macOS's malloc is slow?

I guess macOS's malloc is slower than Linux's.
However, I guess my patch will affect all platforms, because it is faster to retrieve from the pool than to invoke malloc().

Updated by watson1978 (Shizuo Fujita) over 2 years ago

I updated the patch to keep binary compatibility.
https://github.com/ruby/ruby/pull/1766/commits/70a7b48aa18cdcaa9abf5acf93e2307c24b40a33

I also ran a benchmark on Linux. It seems that Linux's malloc performs better than macOS's.
https://github.com/ruby/ruby/pull/1766#issuecomment-348787373

Updated by shyouhei (Shyouhei Urabe) over 2 years ago

watson1978 (Shizuo Fujita) wrote:

I updated the patch to keep binary compatibility.
https://github.com/ruby/ruby/pull/1766/commits/70a7b48aa18cdcaa9abf5acf93e2307c24b40a33

Good. However, I think you should consider using ccan/list. It is field proven.

I also ran a benchmark on Linux. It seems that Linux's malloc performs better than macOS's.
https://github.com/ruby/ruby/pull/1766#issuecomment-348787373

Any chance you tried --with-jemalloc? It might exhibit something different.

Updated by watson1978 (Shizuo Fujita) over 2 years ago

Any chance you tried --with-jemalloc? It might exhibit something different.

I tried jemalloc at https://github.com/ruby/ruby/pull/1766#issuecomment-350553062
Just by using jemalloc, I got a performance improvement of 4.381M -> 7.071M (61.4% up) on macOS,
but it made no difference on Ubuntu 17.10.

Good. However, I think you should consider using ccan/list. It is field proven.

I updated the patch to use ccan/list at https://github.com/ruby/ruby/pull/1766/commits/f0189dae115cee3fa3dbb5eadb7332f2d082be5c

Updated by shyouhei (Shyouhei Urabe) over 2 years ago

watson1978 (Shizuo Fujita) wrote:

Any chance you tried --with-jemalloc? It might exhibit something different.

I tried jemalloc at https://github.com/ruby/ruby/pull/1766#issuecomment-350553062
Just by using jemalloc, I got a performance improvement of 4.381M -> 7.071M (61.4% up) on macOS,
but it made no difference on Ubuntu 17.10.

Good. However, I think you should consider using ccan/list. It is field proven.

I updated the patch to use ccan/list at https://github.com/ruby/ruby/pull/1766/commits/f0189dae115cee3fa3dbb5eadb7332f2d082be5c

Thank you! The patch seems perfect except for one thing: ccan/list is already included as ccan/list/list.h. You don't have to copy and paste it again.

Updated by watson1978 (Shizuo Fujita) over 2 years ago

  • Status changed from Open to Closed

Applied in changeset trunk|r61309.


Improve performance of creating Hash object

When a Hash object is created, the heap area for its st_table is always allocated internally,
and this seems to take time.

To improve the performance of creating Hash objects,
this patch reduces the number of heap allocations for st_table by reusing them.

Performance of creating Hash literal -> 1.53 times faster.

[Fix GH-1766] [ruby-core:84008] [Feature #14146]

Environment

  • OS : macOS 10.13.1
  • CPU : 1.4 GHz Intel Core i7
  • Compiler : Apple LLVM version 9.0.0 (clang-900.0.39)

Before

$ ./miniruby -v -I. -I../benchmark-ips/lib ~/tmp/bench/literal.rb
ruby 2.5.0dev (2017-11-28 hash 60926) [x86_64-darwin17]
Warming up --------------------------------------
Hash literal 51.544k i/100ms
Calculating -------------------------------------
Hash literal 869.132k (± 1.1%) i/s - 4.381M in 5.041574s

After

$ ./miniruby -v -I. -I../benchmark-ips/lib ~/tmp/bench/literal.rb
ruby 2.5.0dev (2017-11-28 hash 60926) [x86_64-darwin17]
Warming up --------------------------------------
Hash literal 63.068k i/100ms
Calculating -------------------------------------
Hash literal 1.328M (± 2.3%) i/s - 6.685M in 5.037861s

Test code

require 'benchmark/ips'

Benchmark.ips do |x|
  x.report "Hash literal" do |loop|
    count = 0
    while count < loop
      hash = {foo: 12, bar: 34, baz: 56}

      count += 1
    end
  end
end
