Feature #11405

[PATCH] hash.c: minor speedups to int/fixnum keys

Added by normalperson (Eric Wong) over 8 years ago. Updated over 8 years ago.

Status:
Closed
Target version:
-
[ruby-core:70175]

Description

Noticed with [ruby-core:70159] [Bug #11396]

The low bits of Ruby object IDs are rarely populated in the current
implementation, so ensure they get used. (A sketch of the idea follows
the changelog below.)

Early versions of this patch redundantly shifted static symbols in
any_hash, causing regressions with static symbols in hash_aref_sym.

  • hash.c (any_hash): skip rb_objid_hash for static syms
    (rb_num_hash_start): extract from rb_ident_hash
    (rb_objid_hash): call rb_num_hash_start
    (rb_ident_hash): ditto
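
A rough sketch of the idea (illustrative only: ALIGN_SHIFT and the
rotate below are assumptions, not the exact code in the patch).
Heap-object IDs are derived from aligned pointers, so their low bits
are almost always zero; rotating those bits out moves real entropy
into the low bits before the Murmur step mentioned later in this
thread consumes the value:

  #include <stdint.h>

  #define ALIGN_SHIFT 3  /* assumption: 8-byte-aligned heap objects */

  /* Rotate right so the rarely-populated low bits are replaced by the
   * informative higher bits; a rotation is bijective, so no information
   * about the ID is discarded. */
  static uint64_t
  objid_prehash(uint64_t oid)
  {
      return (oid >> ALIGN_SHIFT) | (oid << (64 - ALIGN_SHIFT));
  }
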
target 0: a (ruby 2.3.0dev (2015-07-30 trunk 51437) [x86_64-linux])
target 1: b (ruby 2.3.0dev (2015-07-30 patch 51437) [x86_64-linux])

benchmark results from Xeon E3-1230 v3 @ 3.30GHz (turbo disabled):
minimum results in each 10 measurements.
Execution time (sec)
name                a       b
hash_aref_dsym        0.316   0.300
hash_aref_dsym_long   5.106   5.063
hash_aref_fix         0.304   0.297
hash_aref_flo         0.061   0.060
hash_aref_miss        0.433   0.430
hash_aref_str         0.408   0.396
hash_aref_sym         0.312   0.306
hash_aref_sym_long    0.482   0.469
hash_flatten          0.385   0.273
hash_ident_flo        0.036   0.037
hash_ident_num        0.277   0.276
hash_ident_obj        0.291   0.284
hash_ident_str        0.289   0.286
hash_ident_sym        0.285   0.281
hash_keys             0.269   0.271
hash_shift            0.020   0.016
hash_values           0.264   0.264
loop_whileloop2       0.101   0.099
vm2_bighash*          3.066   2.972

Speedup ratio: compare with the result of `a' (greater is better)
name                b
hash_aref_dsym        1.052
hash_aref_dsym_long   1.008
hash_aref_fix         1.024
hash_aref_flo         1.015
hash_aref_miss        1.007
hash_aref_str         1.031
hash_aref_sym         1.018
hash_aref_sym_long    1.027
hash_flatten          1.410
hash_ident_flo        0.994
hash_ident_num        1.001
hash_ident_obj        1.022
hash_ident_str        1.012
hash_ident_sym        1.016
hash_keys             0.992
hash_shift            1.237
hash_values           1.001
loop_whileloop2       1.013
vm2_bighash*          1.032

I will commit in a few weeks if everyone is OK with this.


Updated by nobu (Nobuyoshi Nakada) over 8 years ago

  • Description updated (diff)

Already committed?

Updated by normalperson (Eric Wong) over 8 years ago

Oops, forgot to attach patch. Not committed, yet. Not urgent, I think.


Updated by Anonymous over 8 years ago

  • Status changed from Open to Closed

Applied in changeset r51582.


hash.c: improve integer/fixnum hashing

The low bits of Ruby object IDs are rarely populated in the current
implementation, so ensure they get used.

Early versions of this patch redundantly shifted static symbols in
any_hash, causing regressions with static symbols in hash_aref_sym.

  • hash.c (any_hash): skip rb_objid_hash for static syms
    (rb_num_hash_start): extract from rb_ident_hash
    (rb_objid_hash): call rb_num_hash_start
    (rb_ident_hash): ditto
    [ruby-core:70181] [Feature #11405]

target 0: a (ruby 2.3.0dev (2015-07-30 trunk 51437) [x86_64-linux])
target 1: b (ruby 2.3.0dev (2015-07-30 patch 51437) [x86_64-linux])

benchmark results from Xeon E3-1230 v3 @ 3.30GHz (turbo disabled):
minimum results in each 10 measurements.
Execution time (sec)
name                a       b
hash_aref_dsym        0.316   0.300
hash_aref_dsym_long   5.106   5.063
hash_aref_fix         0.304   0.297
hash_aref_flo         0.061   0.060
hash_aref_miss        0.433   0.430
hash_aref_str         0.408   0.396
hash_aref_sym         0.312   0.306
hash_aref_sym_long    0.482   0.469
hash_flatten          0.385   0.273
hash_ident_flo        0.036   0.037
hash_ident_num        0.277   0.276
hash_ident_obj        0.291   0.284
hash_ident_str        0.289   0.286
hash_ident_sym        0.285   0.281
hash_keys             0.269   0.271
hash_shift            0.020   0.016
hash_values           0.264   0.264
loop_whileloop2       0.101   0.099
vm2_bighash*          3.066   2.972

Speedup ratio: compare with the result of `a' (greater is better)
name                b
hash_aref_dsym        1.052
hash_aref_dsym_long   1.008
hash_aref_fix         1.024
hash_aref_flo         1.015
hash_aref_miss        1.007
hash_aref_str         1.031
hash_aref_sym         1.018
hash_aref_sym_long    1.027
hash_flatten          1.410
hash_ident_flo        0.994
hash_ident_num        1.001
hash_ident_obj        1.022
hash_ident_str        1.012
hash_ident_sym        1.016
hash_keys             0.992
hash_shift            1.237
hash_values           1.001
loop_whileloop2       1.013
vm2_bighash*          1.032

Updated by mame (Yusuke Endoh) over 8 years ago

  • Status changed from Closed to Assigned
  • Assignee set to normalperson (Eric Wong)

This caused a lot of trivial hash conflicts between Fixnums that are >= 16384.

  $ ./miniruby -ve 'p 16384.hash; p 16385.hash'
  ruby 2.3.0dev (2015-12-10 master 52945) [x86_64-linux]
  1104801043349207800
  1104801043349207800

Other pairs:

  p [16386, 16387].map {|n| n.hash }.uniq #=> [2133837449075777600]
  p [16388, 16389].map {|n| n.hash }.uniq #=> [3903799135928350277]
  p [16390, 16391].map {|n| n.hash }.uniq #=> [-3836391686716480155]
  p [16392, 16393].map {|n| n.hash }.uniq #=> [1714559302010572050]
  p [16394, 16395].map {|n| n.hash }.uniq #=> [2147130354083423794]
  p [16396, 16397].map {|n| n.hash }.uniq #=> [-679539024000319657]
  p [16398, 16399].map {|n| n.hash }.uniq #=> [4056286416392832887]
  p [16400, 16401].map {|n| n.hash }.uniq #=> [2733766810351706956]
  p [16402, 16403].map {|n| n.hash }.uniq #=> [-1228876631044862612]
  p [16404, 16405].map {|n| n.hash }.uniq #=> [1418226818996216529]

Despite what its name suggests, rb_objid_hash is also used to calculate the hash value of a Fixnum.

IMO, we must fix this issue before 2.3.0 release.

--
Yusuke Endoh

Updated by Hanmac (Hans Mackowiak) over 8 years ago

I did a little check:

  16384.upto(65535).group_by(&:hash).select {|k,v| v.size == 1} #=> {}

which means that every number from 16384 to 65535 shares its hash value with at least one other number.

Edit:
I scanned for more and found this: four consecutive numbers sharing the same hash value.

  [49152, 49153, 49154, 49155].map(&:hash).uniq #=> [-1642914107055884207]

Updated by normalperson (Eric Wong) over 8 years ago

mame (Yusuke Endoh) wrote:

This caused a lot of trivial hash conflicts between Fixnums that are >= 16384.

Thanks for noticing, I will investigate.

I'm curious, how did you notice this?

I'll probably integrate some well-known hashing tests into the
test suite to prevent this in the future...


Updated by Anonymous over 8 years ago

  • Status changed from Assigned to Closed

Applied in changeset r53037.


hash.c (rb_num_hash_start): avoid pathological behavior

The OR-ing itself is bad for a hash function, and shifting 3 bits
left was not enough to undo the damage done by shifting
(RUBY_SPECIAL_SHIFT+3) bits right. Experimentally, shifting 16-17
bits seemed to work well in preparing the number for murmur hash.

Add a few more benchmarks based on bm_hash_shift to ensure
we don't hurt performance too much with tweaks.

I'm pretty confident about this change and am committing it now,
especially since we're still using Murmur behind it (but perhaps
we can update to a newer hash from Murmur...)

[ruby-core:72028] [Feature #11405]
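
To make the failure mode concrete, here is a hedged, self-contained C
demo (SPECIAL_SHIFT and the exact XOR mix are assumptions standing in
for the hash.c internals) contrasting the old OR-based mix with an
XOR-based mix along the lines described above:

  #include <stdint.h>
  #include <stdio.h>

  #define SPECIAL_SHIFT 8  /* assumption: stands in for RUBY_SPECIAL_SHIFT */

  static uint64_t mix_or(uint64_t n)   /* old mix: OR discards information */
  {
      return (n >> (SPECIAL_SHIFT + 3)) | (n << 3);
  }

  static uint64_t mix_xor(uint64_t n)  /* sketch of an XOR mix with a wider shift */
  {
      return ((n >> (SPECIAL_SHIFT + 3)) ^ (n << 16)) ^ n;
  }

  int main(void)
  {
      /* A Fixnum n has object ID 2n+1, so adjacent Fixnums differ only
       * in bit 1 of their IDs.  OR-ing masks that bit once the
       * right-shifted half sets it, which happens from 16384 on. */
      uint64_t a = 2 * 16384 + 1, b = 2 * 16385 + 1;
      printf("or  collides: %d\n", mix_or(a) == mix_or(b));   /* prints 1 */
      printf("xor collides: %d\n", mix_xor(a) == mix_xor(b)); /* prints 0 */
      return 0;
  }

Since any collision introduced before the Murmur step survives into the
final hash value, this is consistent with the colliding pairs reported
above.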

target 0: a (ruby 2.3.0dev (2015-12-11 trunk 53027) [x86_64-linux]) at "/home/ew/rrrr/b/ruby"
target 1: b (ruby 2.3.0dev (2015-12-11 master 53027) [x86_64-linux]) at "/home/ew/ruby/b/ruby"

benchmark results:
minimum results in each 5 measurements.
Execution time (sec)
name                  a       b
hash_aref_dsym        0.279   0.276
hash_aref_dsym_long   4.951   4.936
hash_aref_fix         0.281   0.283
hash_aref_flo         0.060   0.060
hash_aref_miss        0.409   0.410
hash_aref_str         0.387   0.385
hash_aref_sym         0.275   0.270
hash_aref_sym_long    0.410   0.411
hash_flatten          0.252   0.237
hash_ident_flo        0.035   0.032
hash_ident_num        0.254   0.251
hash_ident_obj        0.252   0.256
hash_ident_str        0.250   0.252
hash_ident_sym        0.259   0.270
hash_keys             0.267   0.267
hash_shift            0.016   0.015
hash_shift_u16        0.074   0.072
hash_shift_u24        0.071   0.071
hash_shift_u32        0.073   0.072
hash_to_proc          0.008   0.008
hash_values           0.263   0.264

Speedup ratio: compare with the result of `a' (greater is better)
name                  b
hash_aref_dsym        1.009
hash_aref_dsym_long   1.003
hash_aref_fix         0.993
hash_aref_flo         1.001
hash_aref_miss        0.996
hash_aref_str         1.006
hash_aref_sym         1.017
hash_aref_sym_long    0.998
hash_flatten          1.061
hash_ident_flo        1.072
hash_ident_num        1.012
hash_ident_obj        0.987
hash_ident_str        0.993
hash_ident_sym        0.959
hash_keys             0.997
hash_shift            1.036
hash_shift_u16        1.039
hash_shift_u24        1.001
hash_shift_u32        1.017
hash_to_proc          1.001
hash_values           0.995

Updated by mame (Yusuke Endoh) over 8 years ago

I'm curious, how did you notice this?

In short, I was studying hash conflicts for another reason.

The detailed context is off topic, but you might be interested.
I encountered an interesting behavior of the case statement:

  100000000.times do
    case 0
    when 0
      do_something
    when 1, 2, 3, ..., 10000
      raise
    end
  end

was always slower than

  100000000.times do
    case 0
    when 1, 2, 3, ..., 10000
      raise
    when 0
      do_something
    end
  end

even though the only difference is the order of the when clauses.

The cause was a dense cdhash in opt_case_dispatch. Both programs
create a cdhash that has 10001 elements, which is slightly fewer than
10240 (= bin size 2048 * rehashing threshold 5). So, each bin has a
very long linked list (average length is about 5).

The former program inserts 0 into the cdhash first. Because the
final cdhash has been rehashed after 0 was inserted, we cannot predict
where the element ends up in its linked list. The latter, on the
other hand, inserts 0 last. In this case we can expect 0 to be placed
at the head of the linked list. Because there is no need to follow the
linked list, it is faster to find.
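
For intuition, a minimal chained-table sketch (illustrative only, not
st_table itself) shows why the most recently inserted key in a bin is
found on the first comparison while early keys sit deep in the chain:

  #include <stdio.h>
  #include <stdlib.h>

  #define NBINS 2048  /* assumption: mirrors the bin count mentioned above */

  struct entry { long key; struct entry *next; };
  static struct entry *bins[NBINS];

  static void insert(long key)
  {
      struct entry *e = malloc(sizeof(*e));
      e->key = key;
      e->next = bins[key % NBINS];  /* push at the head of the bin */
      bins[key % NBINS] = e;
  }

  static int lookup_steps(long key)  /* comparisons needed to find key */
  {
      int steps = 0;
      for (struct entry *e = bins[key % NBINS]; e; e = e->next) {
          steps++;
          if (e->key == key) return steps;
      }
      return -1;
  }

  int main(void)
  {
      long n;
      for (n = 0; n < 10000; n++) insert(n);
      /* with key % NBINS as the "hash", keys n and n + 2048k share a bin */
      printf("steps for 9999: %d\n", lookup_steps(9999)); /* 1 (at the head) */
      printf("steps for 0:    %d\n", lookup_steps(0));    /* 5 (at the tail) */
      return 0;
  }

With a real hash function, an element's position after a rehash is not
predictable like this, which is exactly the asymmetry described above.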

Though I'm not sure if this IS a problem, it was fixed anyway by nobu
in r53031 by explicitly rehashing the cdhash.

But this behavior of st_table is not limited to case statements. I could
reproduce it with a normal Hash:

  h = {}
  10000.times do |n|
    h[n] = true
  end
  1_000_000_000.times { h[9999] } # 57.2 sec
  1_000_000_000.times { h[0] }    # 89.1 sec

I'm not sure if (and how) we should fix this. Reducing the rehash
threshold would work, but it would cause frequent rehashing, which may
introduce further overhead and waste memory.

Updated by funny_falcon (Yura Sokolov) over 8 years ago

The cause was a dense cdhash in opt_case_dispatch. Both programs
create a cdhash that has 10001 elements, which is slightly fewer than
10240 (= bin size 2048 * rehashing threshold 5). So, each bin has a
very long linked list (average length is about 5).

Why not decrease the rehashing threshold?
1 is a very good choice :)
Yeah, it will increase memory consumption a bit.
OK, let it be 2.

Updated by normalperson (Eric Wong) over 8 years ago

Yusuke Endoh wrote:

I'm curious, how did you notice this?

In short, I was studying hash conflicts for another reason.

The detailed context is off topic, but you might be interested.

Though I'm not sure if this IS a problem, it was anyway fixed by nobu
at r53031 by explicitly rehashing a cdhash.

Thanks, I was wondering about the reference in the r53031 commit
message last night.
(http://d.hatena.ne.jp/ku-ma-me/20151210)

But this behavior of st_table is not limited to case statements. I could
reproduce it with a normal Hash:

  h = {}
  10000.times do |n|
    h[n] = true
  end
  1_000_000_000.times { h[9999] } # 57.2 sec
  1_000_000_000.times { h[0] }    # 89.1 sec

I'm not sure if (and how) we should fix this. Reducing the rehash
threshold would work, but it would cause frequent rehashing, which may
introduce further overhead and waste memory.

Perhaps teach developers to rehash explicitly. Maybe Hash#freeze
can imply rehash, too. But yes, I worry about memory increase more
than speed nowadays.

Updated by mame (Yusuke Endoh) over 8 years ago

Eric Wong wrote:

Perhaps teach developers to rehash explicitly. Maybe Hash#freeze
can imply rehash, too. But yes, I worry about memory increase more
than speed nowadays.

I'm not sure if explicit/implicit rehashing will solve any issue.

The problem is that the linked lists are too(?) long. Long linked lists decrease the average access speed, and also make per-element access speed non-uniform.
Rehashing will not improve the average speed, nor eliminate/relax the inequality. It will just make it impossible for us to predict which element is fast to access and which is slow. Will this make us happy?

Yura Sokolov wrote:

Why not decrease the rehashing threshold?
1 is a very good choice :)
Yeah, it will increase memory consumption a bit.
OK, let it be 2.

Yes, it would solve the issue. But I'm unsure if we need to fix this issue. Accessing an element takes just a few nanoseconds. Is it a bottleneck in a real-life use case?
And, to find an appropriate threshold, we would need to perform exhaustive benchmarks with some real-life applications. Hard work!

--
Yusuke Endoh

Updated by normalperson (Eric Wong) over 8 years ago

mame (Yusuke Endoh) wrote:

Eric Wong wrote:

Perhaps teach developers to rehash explicitly. Maybe Hash#freeze
can imply rehash, too. But yes, I worry about memory increase more
than speed nowadays.

I'm not sure if explicit/implicit rehashing will solve any issue.

The problem is that the linked lists are too(?) long. Long linked
lists decrease the average access speed, and also make per-element
access speed non-uniform. Rehashing will not improve the average
speed, nor eliminate/relax the inequality.

But wasn't the goal of adding rehash in r53031 to improve speed and
relax the inequality?

It will just make it
impossible for us to predict which element is fast to access and which is
slow. Will this make us happy?

I mean we may take freeze as a hint from the user to optimize the hash.

Perhaps we may rearrange data in contiguous memory for improved locality
(like "git repack" or defragmenting a filesystem). I doubt we can have
a compacting GC at this point, but small-scale, explicit compaction
might still work.

Updated by mame (Yusuke Endoh) over 8 years ago

2015-12-12 12:21 GMT+09:00 Eric Wong :

But wasn't the goal of adding rehash in r53031 to improve speed and relax the inequality?

I don't think so. It just tries to "hide" the inequality from users.
It won't improve the average speed nor relax the inequality itself, if
my understanding is correct.

Perhaps we may rearrange data in contiguous memory for improved locality

Interesting. It will not fix the "too long linked list" problem, but it
will reduce cache misses. Or, we may use another hash mechanism suitable
for static contents, like a perfect hash.

Updated by funny_falcon (Yura Sokolov) over 8 years ago

Yes, it would solve the issue. But I'm unsure if we need to fix this issue. Accessing an element takes just a few nanoseconds. Is it a bottleneck in a real-life use case?

It gave a ~4% performance improvement for 1.9.3 in real-world applications. Now, with the separate id_table, it could be less important. But it is still worth checking.
