Ruby master - Feature #9425: [PATCH] st: use power-of-two sizes to avoid slow modulo ops</h1> <article> <h1>Ruby master - Feature #9425: [PATCH] st: use power-of-two sizes to avoid slow modulo ops</h1> <p>2014-01-20T04:02:53Z</p> <ul></ul><p>Updated patch and pull due to r44634 (hash_pos)<br> <a href="http://bogomips.org/ruby.git/patch?id=7c7e41496d6b28a4b" class="external">http://bogomips.org/ruby.git/patch?id=7c7e41496d6b28a4b</a></p> <p>The following changes since commit 5ecbe189af77c309845d662f26c0b2797bde2915:</p> <p>socket/option.c: helper functions (2014-01-19 14:56:05 +0000)</p> <p>are available in the git repository at:</p> <p>git://80x24.org/ruby.git st-noprime-r44658</p> <p>for you to fetch changes up to 7c7e41496d6b28a4b1a516c0df9754c03447a95d:</p> <p>st: use power-of-two sizes to avoid slow modulo ops (2014-01-19 23:45:52 +0000)</p> <hr> <p>Eric Wong (1):<br> st: use power-of-two sizes to avoid slow modulo ops</p> <p>st.c | 61 +++++--------------------------------------------------------<br> 1 file changed, 5 insertions(+), 56 deletions(-)</p> </article> <article> <h1>Ruby master - Feature #9425: [PATCH] st: use power-of-two sizes to avoid slow modulo ops</h1> <p>2014-01-20T17:48:44Z</p> <ul></ul><p>Impressive.</p> <p>Plus I think you have more rooms for optimizations by sticking to<br> power-of-two sized bins. When you rehash a hash, right now all<br> elements are cleaned up from the bin, resize it, then insert them<br> again one-by-one. If number of bins are always 2^n, rehashing is<br> to resize a 2^n array to 2^(n+1). This case the elements stored in<br> new_bins[2^n+k] can only come from new_bins[k]. This fact does not<br> change the algorithmic complexity, but can reduce insertions.</p> <p><a href="https://github.com/shyouhei/ruby/commit/f7ca891" class="external">https://github.com/shyouhei/ruby/commit/f7ca891</a></p> </article> <article> <h1>Ruby master - Feature #9425: [PATCH] st: use power-of-two sizes to avoid slow modulo ops</h1> <p>2014-01-21T02:41:41Z</p> <ul></ul><p><a href="mailto:shyouhei@ruby-lang.org" class="email">shyouhei@ruby-lang.org</a> wrote:</p> <blockquote> <p>Plus I think you have more rooms for optimizations by sticking to<br> power-of-two sized bins. When you rehash a hash, right now all<br> elements are cleaned up from the bin, resize it, then insert them<br> again one-by-one. If number of bins are always 2^n, rehashing is<br> to resize a 2^n array to 2^(n+1). This case the elements stored in<br> new_bins[2^n+k] can only come from new_bins[k]. This fact does not<br> change the algorithmic complexity, but can reduce insertions.</p> <p><a href="https://github.com/shyouhei/ruby/commit/f7ca891" class="external">https://github.com/shyouhei/ruby/commit/f7ca891</a></p> </blockquote> <p>Thanks! However, I wasn't able to show a difference with<br> "make benchmark"[1]. Were you?</p> <p>Perhaps rehash is not called often enough, and I think a branch<br> inside the loop is difficult for the CPU to optimize. I think<br> the current dumb loop is very good for CPU pipelining and prefetch.</p> <p>[1] I have applied my patches for improved benchmark consistency:<br> <a href="https://bugs.ruby-lang.org/issues/5985#change-44442" class="external">https://bugs.ruby-lang.org/issues/5985#change-44442</a><br> <a href="https://bugs.ruby-lang.org/issues/9430" class="external">https://bugs.ruby-lang.org/issues/9430</a></p> </article> <article> <h1>Ruby master - Feature #9425: [PATCH] st: use power-of-two sizes to avoid slow modulo ops</h1> <p>2014-01-22T03:21:59Z</p> <ul></ul><p>On 01/21/2014 11:38 AM, Eric Wong wrote:</p> <blockquote> <blockquote> <p><a href="https://github.com/shyouhei/ruby/commit/f7ca891" class="external">https://github.com/shyouhei/ruby/commit/f7ca891</a></p> </blockquote> <p>Thanks! However, I wasn't able to show a difference with<br> "make benchmark"[1]. Were you?</p> <p>Perhaps rehash is not called often enough, and I think a branch<br> inside the loop is difficult for the CPU to optimize. I think<br> the current dumb loop is very good for CPU pipelining and prefetch.</p> </blockquote> <p>No, sorry I see no evident speedup. When I wrote the patch I thought the<br> function was used for Hash#rehash, but it turned out Hash#rehash uses<br> something different (don't know why). The optimization is valid I<br> believe but in fact used very rarely.</p> </article> <article> <h1>Ruby master - Feature #9425: [PATCH] st: use power-of-two sizes to avoid slow modulo ops</h1> <p>2014-01-22T08:12:52Z</p> <ul></ul><p>Urabe Shyouhei <a href="mailto:shyouhei@ruby-lang.org" class="email">shyouhei@ruby-lang.org</a> wrote:</p> <blockquote> <p>No, sorry I see no evident speedup. When I wrote the patch I thought the<br> function was used for Hash#rehash, but it turned out Hash#rehash uses<br> something different (don't know why). The optimization is valid I<br> believe but in fact used very rarely.</p> </blockquote> <p>Alright. My understanding is branch mispredict costs are higher than<br> the memory stores which would be avoided. The expensive part is loading<br> memory on cache miss, and that is not avoided.</p> <p>We'll probably need to poke around with perf or similar tools to<br> analyze/confirm this.</p> </article> <article> <h1>Ruby master - Feature #9425: [PATCH] st: use power-of-two sizes to avoid slow modulo ops</h1> <p>2014-02-01T08:02:50Z</p> <ul></ul><p>Notes to self (or anybody else with free time right now):<br> test and verify compare_by_identity performance</p> <p>I'm comfortable that ID, string and most objects will hash well with<br> power-of-two; but compare_by_identity, weakmap, vm->living_threads may<br> hash less well without a prime modulo (or maybe they hash badly<br> regardless of modulo!)</p> <p>I'm also interested in using a doubly-linked list (ccan/list[1])<br> for vm->living_threads (and possibly other places).</p> <p>[1] BSDL version of the Linux kernel list.h at <a href="http://ccodearchive.net/" class="external">http://ccodearchive.net/</a></p> </article> <article> <h1>Ruby master - Feature #9425: [PATCH] st: use power-of-two sizes to avoid slow modulo ops</h1> <p>2014-03-03T06:33:20Z</p> <ul></ul><p><a href="mailto:normalperson@yhbt.net" class="email">normalperson@yhbt.net</a> wrote:</p> <blockquote> <p>test and verify compare_by_identity performance</p> <p>I'm comfortable that ID, string and most objects will hash well with<br> power-of-two; but compare_by_identity, weakmap, vm->living_threads may<br> hash less well without a prime modulo (or maybe they hash badly<br> regardless of modulo!)</p> </blockquote> <p>OK, I was right about compare_by_identity being worse with power-of-two,<br> but I fixed it by tweaking numhash:</p> <p><a href="http://bogomips.org/ruby.git/patch?id=1579e9d0d82789" class="external">http://bogomips.org/ruby.git/patch?id=1579e9d0d82789</a></p> <p>I was wrong about IDs hashing well before, they hash OK now :)</p> <p>results: <a href="http://80x24.org/bmlog-20140303-034047.26775.txt" class="external">http://80x24.org/bmlog-20140303-034047.26775.txt</a><br> the hash parts:<br> hash_aref_miss 1.048<br> hash_aref_str 1.162<br> hash_aref_sym 1.000<br> hash_flatten 1.092<br> hash_ident_num 1.007<br> hash_ident_obj 1.098<br> hash_ident_str 1.106<br> hash_ident_sym 1.018<br> hash_keys 1.000<br> hash_shift 1.011<br> hash_values 1.011<br> vm2_bighash* 1.183</p> <p>These numbers are from my weaker AMD FX-8320 which gave me worse numbers<br> than my Haswell machine in my original test. I'll try to test on my<br> Haswell machine soon (network outage there :<).</p> <p>I'd like to commit the following three patches soon:<br> <a href="http://bogomips.org/ruby.git/patch?id=a3fde671ffeec8" class="external">http://bogomips.org/ruby.git/patch?id=a3fde671ffeec8</a> new hash benchmarks<br> <a href="http://bogomips.org/ruby.git/patch?id=8f155afef61342" class="external">http://bogomips.org/ruby.git/patch?id=8f155afef61342</a> original patch<br> <a href="http://bogomips.org/ruby.git/patch?id=1579e9d0d82789" class="external">http://bogomips.org/ruby.git/patch?id=1579e9d0d82789</a> numhash tweak</p> </article> <article> <h1>Ruby master - Feature #9425: [PATCH] st: use power-of-two sizes to avoid slow modulo ops</h1> <p>2014-03-18T09:12:01Z</p> <ul></ul><p><a href="mailto:normalperson@yhbt.net" class="email">normalperson@yhbt.net</a> wrote:</p> <blockquote> <pre><code> hash_aref_sym 1.000 </code></pre> </blockquote> <p>Lack of improvement here was disappointing since symbol keys are common,<br> and this showed a regression on my x86 (32-bit) VMs.</p> <p>I tweaked rb_any_hash to be symbol-aware:</p> <pre><code>http://bogomips.org/ruby.git/patch?id=497ed6355 </code></pre> <p>12-30% improvement on this test from trunk depending on CPU so far \o/<br> (Phenom II X4 got 30%, newer/faster x86-64 CPUs show less speedup).</p> <p>I'm comfortable with improvements of this series on x86 VMs running on<br> x86-64 (and of course native x86-64).</p> <p>Can anybody with real 32-bit hardware verify this series? Not sure I<br> can trust VM results; my remaining x86 hardware is on its last legs<br> and showing occasional HW errors.</p> <p>git://80x24.org/ruby.git st-noprime-v4</p> <pre><code> st: use power-of-two sizes to avoid slow modulo ops add hash benchmarks st.c: tweak numhash to match common Ruby use cases hash.c: improve symbol hash distribution </code></pre> </article> <article> <h1>Ruby master - Feature #9425: [PATCH] st: use power-of-two sizes to avoid slow modulo ops</h1> <p>2014-03-18T09:48:56Z</p> <ul></ul><p>Can you test this</p> <ul> <li>if (SYMBOL_P(a)) a = (a >> RUBY_SPECIAL_SHIFT) ^ (a >><br> (RUBY_SPECIAL_SHIFT + 3)); /* 3=ID_SCOPE_SHIFT */</li> </ul> <p>2014-03-18 13:02 GMT+04:00 Eric Wong <a href="mailto:normalperson@yhbt.net" class="email">normalperson@yhbt.net</a>:</p> <blockquote> <p><a href="mailto:normalperson@yhbt.net" class="email">normalperson@yhbt.net</a> wrote:</p> <blockquote> <pre><code> hash_aref_sym 1.000 </code></pre> </blockquote> <p>Lack of improvement here was disappointing since symbol keys are common,<br> and this showed a regression on my x86 (32-bit) VMs.</p> <p>I tweaked rb_any_hash to be symbol-aware:</p> <pre><code> http://bogomips.org/ruby.git/patch?id=497ed6355 </code></pre> <p>12-30% improvement on this test from trunk depending on CPU so far \o/<br> (Phenom II X4 got 30%, newer/faster x86-64 CPUs show less speedup).</p> <p>I'm comfortable with improvements of this series on x86 VMs running on<br> x86-64 (and of course native x86-64).</p> <p>Can anybody with real 32-bit hardware verify this series? Not sure I<br> can trust VM results; my remaining x86 hardware is on its last legs<br> and showing occasional HW errors.</p> <p>git://80x24.org/ruby.git st-noprime-v4</p> <pre><code> st: use power-of-two sizes to avoid slow modulo ops add hash benchmarks st.c: tweak numhash to match common Ruby use cases hash.c: improve symbol hash distribution </code></pre> </blockquote> </article> <article> <h1>Ruby master - Feature #9425: [PATCH] st: use power-of-two sizes to avoid slow modulo ops</h1> <p>2014-03-18T19:48:47Z</p> <ul></ul><p><a href="mailto:funny.falcon@gmail.com" class="email">funny.falcon@gmail.com</a> wrote:</p> <blockquote> <p>Can you test this</p> <ul> <li>if (SYMBOL_P(a)) a = (a >> RUBY_SPECIAL_SHIFT) ^ (a >><br> (RUBY_SPECIAL_SHIFT + 3)); /* 3=ID_SCOPE_SHIFT */</li> </ul> </blockquote> <p>Seems to hurt performance on x86 compared to my original. I don't think<br> ID scope bits are useful for this case. From what I've seen, Ruby users<br> tend to limit symbol names to \w when using the Hash class and keywords.</p> </article> <article> <h1>Ruby master - Feature #9425: [PATCH] st: use power-of-two sizes to avoid slow modulo ops</h1> <p>2014-03-22T23:34:32Z</p> <ul><li><strong>Status</strong> changed from <i>Open</i> to <i>Closed</i></li><li><strong>% Done</strong> changed from <i>0</i> to <i>100</i></li></ul><p>Applied in changeset r45384.</p> <hr> <p>st.c: use power-of-two sizes to avoid slow modulo ops</p> <ul> <li>st.c (hash_pos): use bitwise AND to avoid slow modulo op<br> (new_size): power-of-two sizes for hash_pos change<br> (st_numhash): adjust for common keys due to lack of prime modulo<br> [Feature <a class="issue tracker-2 status-5 priority-4 priority-default closed" title="Feature: [PATCH] st: use power-of-two sizes to avoid slow modulo ops (Closed)" href="https://bugs.ruby-lang.org/issues/9425">#9425</a>]</li> <li>hash.c (rb_any_hash): right shift for symbols</li> <li>benchmark/bm_hash_aref_miss.rb: added to show improvement</li> <li>benchmark/bm_hash_aref_sym_long.rb: ditto</li> <li>benchmark/bm_hash_aref_str.rb: ditto</li> <li>benchmark/bm_hash_aref_sym.rb: ditto</li> <li>benchmark/bm_hash_ident_num.rb: added to prevent regression</li> <li>benchmark/bm_hash_ident_obj.rb: ditto</li> <li>benchmark/bm_hash_ident_str.rb: ditto</li> <li>benchmark/bm_hash_ident_sym.rb: ditto</li> </ul> </article> <article> <h1>Ruby master - Feature #9425: [PATCH] st: use power-of-two sizes to avoid slow modulo ops</h1> <p>2014-03-22T23:48:53Z</p> <ul></ul><p><a href="mailto:normalperson@yhbt.net" class="email">normalperson@yhbt.net</a> wrote:</p> <blockquote> <p>OK, I was right about compare_by_identity being worse with power-of-two,<br> but I fixed it by tweaking numhash:</p> <p><a href="http://bogomips.org/ruby.git/patch?id=1579e9d0d82789" class="external">http://bogomips.org/ruby.git/patch?id=1579e9d0d82789</a></p> </blockquote> <p>I tweaked that further, adding +3 instead of +1 to RUBY_SPECIAL_SHIFT<br> in r45384. Also updated NEWS in case some extensions need tweaking for<br> performance.</p> <p>Haswell Xeon E3-1230 v3 numbers:</p> <p>hash_aref_miss 1.166<br> hash_aref_str 1.167<br> hash_aref_sym 1.224<br> hash_aref_sym_long 1.270<br> hash_flatten 1.656<br> hash_ident_num 1.142<br> hash_ident_obj 1.193<br> hash_ident_str 1.194<br> hash_ident_sym 1.171<br> hash_keys 1.002<br> hash_shift 1.122<br> hash_values 1.006<br> loop_whileloop2 1.001<br> vm2_bighash* 1.233</p> </article> </main></body></html>