Feature #15631
closedLet round_capa for ID table not allocate excess capacity for power of 2 ints >= 4
Description
right now round_capa value is rounded up to the next power of 2
round_capa(4) -> returns 8
round_capa(8) -> returns 16
round_capa(16) -> returns 32
round_capa(5) -> returns 8
round_capa(9) -> returns 16
round_capa(17) -> returns 32
etc.
it seems wasteful to allocate the extra items capacity, so this PR changes that to
round_capa(4) -> returns 4
round_capa(8) -> returns 8
round_capa(16) -> returns 16
round_capa(5) -> returns 8
round_capa(9) -> returns 16
round_capa(17) -> returns 32
etc.
the main purpose is to reduce memory usage especially during boot
my patch also uses BUILTIN_CLZ macro instead of shifts that makes it slightly faster
here's a benchmark
require 'benchmark/ips'
Benchmark.ips do |x|
x.config(time: 20, warmup: 3)
x.report('struct', "Struct.new(*('a'..'z').map { |x| x.to_sym })")
end
trunk
Warming up --------------------------------------
struct 527.000 i/100ms
Calculating -------------------------------------
struct 5.461k (± 5.5%) i/s - 109.089k in 20.040253s
methodmising - POW2_P (github)
Warming up --------------------------------------
struct 544.000 i/100ms
Calculating -------------------------------------
struct 5.570k (± 4.1%) i/s - 111.520k in 20.057245s
ahorek - BUILTIN_CLZ (id_table.c.patch)
Warming up --------------------------------------
struct 571.000 i/100ms
Calculating -------------------------------------
struct 5.812k (± 3.6%) i/s - 116.484k in 20.070607s
discussion https://github.com/ruby/ruby/pull/2083
Files
Updated by methodmissing (Lourens Naudé) almost 6 years ago
Thanks for raising this Pavel.
st_init_table_with_size(0)
effectively also allocates additional capacity, but if and how quickly the hash tables mutate I'll investigate later.
References https://github.com/ruby/ruby/blob/trunk/st.c#L573-L578 , https://github.com/ruby/ruby/blob/trunk/st.c#L595 and https://github.com/ruby/ruby/blob/trunk/st.c#L332-L359
A simple peek suggests a total table size of 152 bytes on init, but will investigate time to mutation of these 0 sized tables this evening:
diff --git a/st.c b/st.c
index ed235c674e..f2b99d7771 100644
--- a/st.c
+++ b/st.c
@@ -615,6 +615,8 @@ st_init_table_with_size(const struct st_hash_type *type, st_index_t size)
#ifdef ST_DEBUG
st_check(tab);
#endif
+ printf("# st_init_table_with_size(%d) -> %d (%d)\n", size, n, st_memsize(tab));
+
return tab;
}
linking miniruby
# st_init_table_with_size(0) -> 2 (152)
# st_init_table_with_size(102) -> 7 (3384)
# st_init_table_with_size(255) -> 8 (7224)
# st_init_table_with_size(0) -> 2 (152)
# st_init_table_with_size(1000) -> 10 (28728)
# st_init_table_with_size(0) -> 2 (152)
# st_init_table_with_size(1000) -> 10 (28728)
# st_init_table_with_size(0) -> 2 (152)
# st_init_table_with_size(7) -> 3 (248)
# st_init_table_with_size(15) -> 4 (440)
# st_init_table_with_size(0) -> 2 (152)
# st_init_table_with_size(0) -> 2 (152)
# st_init_table_with_size(0) -> 2 (152)
# st_init_table_with_size(0) -> 2 (152)
# st_init_table_with_size(0) -> 2 (152)
# st_init_table_with_size(0) -> 2 (152)
# st_init_table_with_size(0) -> 2 (152)
# st_init_table_with_size(0) -> 2 (152)
# st_init_table_with_size(0) -> 2 (152)
# st_init_table_with_size(0) -> 2 (152)
# st_init_table_with_size(0) -> 2 (152)
# st_init_table_with_size(0) -> 2 (152)
# st_init_table_with_size(0) -> 2 (152)
# st_init_table_with_size(0) -> 2 (152)
# st_init_table_with_size(0) -> 2 (152)
# st_init_table_with_size(0) -> 2 (152)
# st_init_table_with_size(0) -> 2 (152)
# st_init_table_with_size(0) -> 2 (152)
# st_init_table_with_size(0) -> 2 (152)
# st_init_table_with_size(0) -> 2 (152)
# st_init_table_with_size(0) -> 2 (152)
# st_init_table_with_size(0) -> 2 (152)
# st_init_table_with_size(0) -> 2 (152)
# st_init_table_with_size(0) -> 2 (152)
# st_init_table_with_size(0) -> 2 (152)
# st_init_table_with_size(0) -> 2 (152)
# st_init_table_with_size(16) -> 5 (888)
# st_init_table_with_size(0) -> 2 (152)
# st_init_table_with_size(0) -> 2 (152)
# st_init_table_with_size(0) -> 2 (152)
# st_init_table_with_size(0) -> 2 (152)
# st_init_table_with_size(0) -> 2 (152)
# st_init_table_with_size(0) -> 2 (152)
# st_init_table_with_size(0) -> 2 (152)
# st_init_table_with_size(0) -> 2 (152)
# st_init_table_with_size(0) -> 2 (152)
# st_init_table_with_size(0) -> 2 (152)
# st_init_table_with_size(0) -> 2 (152)
# st_init_table_with_size(0) -> 2 (152)
# st_init_table_with_size(0) -> 2 (152)
Updated by ko1 (Koichi Sasada) over 5 years ago
- Assignee set to ko1 (Koichi Sasada)
funny_falcon, do you have any opinion?
if no opinion, we'll discuss it one month later and will commit it.
in fact, I can't check algorithm, so we can try it.
ahorek:
could you give me more performance measurements?
- you should not call
map
in iteration (you should prepare IDs before) - now you only measures 26 fields. could you measure other numbers, 1 to 50, for example.
Thanks,
Koichi
Updated by ahorek (Pavel Rosický) over 5 years ago
Thanks for the review Koichi. I tested the patch on a rails app (redmine), but unfortunatelly there's no improvement.
0.1MB less memory after boot (^150MB total)
No mesurable difference in performance
I'll investigate the second case. Hash creation might be a better place to optimize.
Updated by ahorek (Pavel Rosický) over 5 years ago
- File st.c.patch st.c.patch added
I've attached the second patch for st_init_table_with_size. In theory it should be faster, but I can't measure any difference in ruby.
Updated by nobu (Nobuyoshi Nakada) over 5 years ago
It should use SIZEOF_ST_INDEX_T
and nlz_intptr
.
Updated by methodmissing (Lourens Naudé) over 5 years ago
Pavel added a new patch for get_power2
in https://github.com/ruby/ruby/pull/2292
Updated by ahorek (Pavel Rosický) over 5 years ago
- File st.c.patch st.c.patch added
Updated by ko1 (Koichi Sasada) over 5 years ago
pls give us your measurements. We can't understand how it is useful.
Thanks,
Koichi
Updated by ahorek (Pavel Rosický) over 5 years ago
I run several benchmark suites for both patches
https://github.com/ruby-bench/ruby-bench-suite
https://github.com/schneems/derailed_benchmarks
but all differences were within margin of error. Here's an optimized assembly comparsion that explains why:
https://user-images.githubusercontent.com/9540855/62237913-2e221380-b3d2-11e9-932a-14b09038bf91.png
saving 1-2 instructions makes no real difference.
feel free to close the issue. Thanks
Updated by Anonymous over 5 years ago
- Status changed from Open to Closed
Applied in changeset git|8e13da1ee83028000e5d7f9f9526379e32765a81.
optimize get_power2 [Feature #15631]