Optimize st_table (take 2)
|Target version:||next minor|
Given some of preparations to this patches already merged into ruby-trunk,
I suggest patches for improving st_table second time (first were #5789):
1) Usage of packing for st_table of any kind, not only for numeric hashes.
Most of hashes, allocated during page render in Rails are smaller than 6 entries.
In fact, during rendering "Issues" page of Redmine, 40% of hashes not even grows
above 1 entry. They are small options hashes, passed to numerous helper methods.
This patch packs hashes upto 6 entries in a way like numeric hashes from trunk.
Also it pack hashes of size 0 and 1 into
st_table inself, so that there is no
need to allocate any "bins" at all.
2) Usage of specialized pool for allocating sttable, sttableentry structures
and sttable.bins of smallest size (11)
Usage of specialized pool for this allocations give great speedup for hash creation.
Also it gives countable reduction of memory consumption.
First patch gives little overhead for creating hashes bigger than 6 entries when applied alone.
But both patches combined are not slower than ruby-trunk for hashes of any size.
Performance testing is here https://gist.github.com/1626602
#4 Updated by Yukihiro Matsumoto about 2 years ago
I meant 83.patch. I think 84.patch (pool allocation) requires more
In message "Re: [ruby-trunk - Feature #5903] Optimize sttable (take 2)"
on Tue, 31 Jan 2012 07:07:15 +0900, Nobuyoshi Nakada email@example.com writes:
|Issue #5903 has been updated by Nobuyoshi Nakada.
#5 Updated by Yukihiro Matsumoto about 2 years ago
|I meant 83.patch. I think 84.patch (pool allocation) requires more
Oops, 83.patch was pool allocation. I meant 84.patch.
#6 Updated by Yura Sokolov about 2 years ago
Nobuyoshi Nakada wrote:
On my computer, pool allocator work by 1% faster when I keep those two assignment to this "magic" field.
May be it is a case of my computer, cause when I remove
heaps_freed from gc.c (
objspace->heap.freed http://bugs.ruby-lang.org/projects/ruby-trunk/repository/entry/gc.c#L412 ), it start to work slower too.
But it is not ever assigned (cause it is initialized with zero, and
last < heaps_freed is never true http://bugs.ruby-lang.org/projects/ruby-trunk/repository/entry/gc.c#L2170 )
#9 Updated by Yura Sokolov about 2 years ago
Nobuyoshi Nakada wrote:
Another question about packing.
Why are PKEYPOS and PVALPOS from the tail?
It allows hash values to be very close to each other, so that while loop in
find_packed_index runs through them very fast and does not touch another cache line of cpu.
And only when it found equal hash it jumps to check key. This allows searching in packed hash be even slightly faster than in not packed hash of same size.
Initially I experiment with variable sized packed hashes, so that
num_bins is used and they goes from tail to avoid division by 3.
With fixed size this could be simplified.
I pushed a commit which places PKEYPOS and PVALPOS after hashes, but in forward order.
They could be placed altogether (like
remove_packed_entry should be changed accordantly. I think, this could improve iteration over hash.
#10 Updated by Yura Sokolov about 2 years ago
Table packing slows down
st_foreach a bit, and GC suffers from it.
So that I add a
st_foreach_nocheck to fix GC https://github.com/funny-falcon/ruby/commit/84d08af5d2943c3dd1a1d0c361fa22c2c7ae5ca4
#11 Updated by Yura Sokolov about 2 years ago
Also there is additional commit which increases usage of
#12 Updated by Yura Sokolov about 2 years ago
As far this ticket not closed, I'll post hash related patch here:
- remove some unused code from st.c and hash.c
- change rbhashmodify to rbhashmodifycheck when sttable allocation is not necessary
- move part of safe iteration logic to st.c to make it clearer
This is arguable change, cause it clearly do not have positive impact on performance,
make checkconsumes 592.2 second before this change and 595.4 after - less than 1 percent, so that I suppose, difference is negligible.
- introduce st_shift to optimize Hash#shift
#13 Updated by Yura Sokolov about 2 years ago
Add couple of commits to pull request https://github.com/ruby/ruby/pull/107 :
- Removal of STCHECK . STCHECK were always returned instead of STCONTINUE inside of some stforeach loops. Now such calls to stforeach are converted to calls to stforeachcheck. So that, there is no reason to differentiate STCHECK and STCONTINUE, which simplifies calling code a bit. Also, it allows to simplify stforeach_check code.
- Traditionally, ultrapacking for empty and one element tables