Feature #5903: Optimize st_table (take 2) - Ruby - Ruby Issue Tracking System

Actions

Copy link

Feature #5903

closed

Optimize st_table (take 2)

Added by funny_falcon (Yura Sokolov) over 13 years ago. Updated almost 9 years ago.

Status:

Closed

Assignee:

nobu (Nobuyoshi Nakada)

Target version:

[ruby-core:42164]

Description

Given some of preparations to this patches already merged into ruby-trunk,
I suggest patches for improving st_table second time (first were #5789):

Usage of packing for st_table of any kind, not only for numeric hashes.

Most of hashes, allocated during page render in Rails are smaller than 6 entries.
In fact, during rendering "Issues" page of Redmine, 40% of hashes not even grows
above 1 entry. They are small options hashes, passed to numerous helper methods.

This patch packs hashes upto 6 entries in a way like numeric hashes from trunk.
Also it pack hashes of size 0 and 1 into st_table inself, so that there is no
need to allocate any "bins" at all.

https://github.com/ruby/ruby/pull/84.patch
https://github.com/ruby/ruby/pull/84

Usage of specialized pool for allocating st_table, st_table_entry structures
and st_table.bins of smallest size (11)

Usage of specialized pool for this allocations give great speedup for hash creation.
Also it gives countable reduction of memory consumption.

https://github.com/ruby/ruby/pull/83.patch
https://github.com/ruby/ruby/pull/83

First patch gives little overhead for creating hashes bigger than 6 entries when applied alone.
But both patches combined are not slower than ruby-trunk for hashes of any size.

Performance testing is here https://gist.github.com/1626602

Related issues 1 (0 open — 1 closed)

Actions

Copy link

#1 [ruby-core:42192]

Updated by funny_falcon (Yura Sokolov) over 13 years ago

I've updated pool_allocation patch to use more efficient pool algorithm.
No separate commit, I just rewrite branch.

Actions

Copy link

#2 [ruby-core:42272]

Updated by matz (Yukihiro Matsumoto) over 13 years ago

I am positive about this patch. Nobu, could you review the patch and check it in unless you find any problem?

Matz.

Actions

Copy link

#3 [ruby-core:42277]

Updated by nobu (Nobuyoshi Nakada) over 13 years ago

What's "_black_magick"?

Actions

Copy link

#4 [ruby-core:42279]

Updated by matz (Yukihiro Matsumoto) over 13 years ago

Hi,

I meant 83.patch. I think 84.patch (pool allocation) requires more
discussion.

						matz.

In message "Re: [ruby-core:42277] [ruby-trunk - Feature #5903] Optimize st_table (take 2)"
on Tue, 31 Jan 2012 07:07:15 +0900, Nobuyoshi Nakada nobu@ruby-lang.org writes:
|
|
|Issue #5903 has been updated by Nobuyoshi Nakada.
|
|
|What's "_black_magick"?

Actions

Copy link

#5 [ruby-core:42280]

Updated by matz (Yukihiro Matsumoto) over 13 years ago

Hi,

In message "Re: [ruby-core:42279] Re: [ruby-trunk - Feature #5903] Optimize st_table (take 2)"
on Tue, 31 Jan 2012 08:59:06 +0900, Yukihiro Matsumoto matz@ruby-lang.org writes:

|I meant 83.patch. I think 84.patch (pool allocation) requires more
|discussion.

Oops, 83.patch was pool allocation. I meant 84.patch.

						matz.

Actions

Copy link

#6 [ruby-core:42282]

Updated by funny_falcon (Yura Sokolov) over 13 years ago

Nobuyoshi Nakada wrote:

What's "_black_magick"?

On my computer, pool allocator work by 1% faster when I keep those two assignment to this "magic" field.

May be it is a case of my computer, cause when I remove heaps_freed from gc.c (objspace->heap.freed http://bugs.ruby-lang.org/projects/ruby-trunk/repository/entry/gc.c#L412 ), it start to work slower too.
But it is not ever assigned (cause it is initialized with zero, and last < heaps_freed is never true http://bugs.ruby-lang.org/projects/ruby-trunk/repository/entry/gc.c#L2170 )

Yura

Actions

Copy link

#7 [ruby-core:42284]

Updated by funny_falcon (Yura Sokolov) over 13 years ago

Hi,

I could work on questions about pool allocation.

Regards,

Yura.

Actions

Copy link

#8 [ruby-core:42300]

Updated by nobu (Nobuyoshi Nakada) over 13 years ago

Another question about packing.
Why are PKEY_POS and PVAL_POS from the tail?

Actions

Copy link

#9 [ruby-core:42304]

Updated by funny_falcon (Yura Sokolov) over 13 years ago

Nobuyoshi Nakada wrote:

Another question about packing.
Why are PKEY_POS and PVAL_POS from the tail?

It allows hash values to be very close to each other, so that while loop in find_packed_index runs through them very fast and does not touch another cache line of cpu.
And only when it found equal hash it jumps to check key. This allows searching in packed hash be even slightly faster than in not packed hash of same size.

Initially I experiment with variable sized packed hashes, so that num_bins is used and they goes from tail to avoid division by 3.
With fixed size this could be simplified.

I pushed a commit which places PKEY_POS and PVAL_POS after hashes, but in forward order.

They could be placed altogether (like i*3, i*3+1, i*3+2). remove_packed_entry should be changed accordantly. I think, this could improve iteration over hash.

Actions

Copy link

#10 [ruby-core:42359]

Updated by funny_falcon (Yura Sokolov) over 13 years ago

Table packing slows down st_foreach a bit, and GC suffers from it.
So that I add a st_foreach_nocheck to fix GC https://github.com/funny-falcon/ruby/commit/84d08af5d2943c3dd1a1d0c361fa22c2c7ae5ca4

Actions

Copy link

#11 [ruby-core:42457]

Updated by funny_falcon (Yura Sokolov) over 13 years ago

I've updated table packing patch to correlate with trunk
https://github.com/ruby/ruby/pull/84
https://github.com/ruby/ruby/pull/84.patch

Also there is additional commit which increases usage of st_foreach_nocheck
https://github.com/funny-falcon/ruby/commit/95d6ddcb2b91
https://github.com/funny-falcon/ruby/commit/95d6ddcb2b91.diff

Actions

Copy link

#12 [ruby-core:43591]

Updated by funny_falcon (Yura Sokolov) over 13 years ago

As far this ticket not closed, I'll post hash related patch here:

https://github.com/ruby/ruby/pull/107
https://github.com/ruby/ruby/pull/107.patch

remove some unused code from st.c and hash.c
change rb_hash_modify to rb_hash_modify_check when st_table allocation is not necessary
move part of safe iteration logic to st.c to make it clearer
This is arguable change, cause it clearly do not have positive impact on performance,
but make check consumes 592.2 second before this change and 595.4 after - less than 1 percent,
so that I suppose, difference is negligible.
introduce st_shift to optimize Hash#shift

Actions

Copy link

#13 [ruby-core:43687]

Updated by funny_falcon (Yura Sokolov) over 13 years ago

Add couple of commits to pull request https://github.com/ruby/ruby/pull/107 :

Removal of ST_CHECK .
ST_CHECK were always returned instead of ST_CONTINUE inside of some st_foreach loops.
Now such calls to st_foreach are converted to calls to st_foreach_check.
So that, there is no reason to differentiate ST_CHECK and ST_CONTINUE, which simplifies calling code a bit.
Also, it allows to simplify st_foreach_check code.
Traditionally, ultrapacking for empty and one element tables

Actions

Copy link

#14 [ruby-core:43874]