Feature #22011
Open hash tables with Swiss tables
Description
This change adds a Swiss-table-inspired probing layer to Ruby's core st_table, and shrinks st_table_entry from 24 B to 16 B by moving the stored hash into a parallel array. It is built and enabled by default; --disable-swiss-st reverts to the original st.c. The public ABI of struct st_table and the iteration-order guarantee are preserved.
Motivation
Hashes are everywhere in Ruby — instance-variable tables, ivar shapes, constant tables, JSON/HTTP/AR rows, every params, every to_h. Profiles of Rails-shaped workloads show a meaningful fraction of CPU spent inside st_lookup and st_insert. Two pain points stood out in the upstream implementation:
- Probe loops are branch-heavy. Every step of the perturb chain loads a bin, fetches the entry it points to, compares the full st_hash_t (8 B), and only then calls eql?. On a miss that is several dependent loads per probe, with no way to fast-reject groups of slots in parallel.
- st_table_entry is 24 B. The (hash, key, record) triple gets one cache line per ~2.5 entries. Iteration and equality scans burn through L1 quickly, and Ruby programs typically hold a lot of small-to-mid-sized hashes (so per-table overhead matters).
The Swiss-table family of designs (Abseil flat_hash_map, Rust hashbrown) addresses (1) with a 1-byte-per-slot control array that lets a single SIMD/SWAR comparison reject or short-list 8 slots at once. We borrow that idea but keep Ruby's two-array layout (so we don't break ABI or insertion order) and add a third, parallel ctrl[] byte array. We then attack (2) by also extracting the hash field out of st_table_entry into its own parallel uint32_t hashes[] array.
Changes
- Adds ST_USE_SWISS_BINS, enabled under RUBY_EXPORT, disabled otherwise so parser_st.c keeps the old-compatible layout.
- Removes the stored hash field from st_table_entry in the Swiss path and stores hashes in a side array after entries, using 32-bit stored hashes.
- Introduces packed Swiss bin groups: 8 control bytes plus 8 bin indexes per group.
- Adds Swiss probing via control-byte matching, using 7-bit h2 fingerprints for candidate filtering.
- Packs bins after entries when possible, reducing separate allocation pressure.
- Updates allocation, copy, free, memsize, rebuild, rehash, insert, delete, shift, lookup, foreach, keys, and values paths to go through hash/bin abstraction macros.
- Adjusts rebuild thresholds for Swiss load factor, using roughly 7/8 bin occupancy.
- Adds tombstone-triggered rebuild behavior when many deleted slots accumulate.
Analysis
Earlier preliminary analysis showed large improvements on microbenchmarks after adopting Swiss-table ideas, but those gains do not fully carry over to real-world benchmarks.
Swiss-table-style probing can look huge in isolated lookup/insert microbenchmarks because the hot loop is tight, the table stays cache-hot, and control-byte filtering avoids many full entry/key comparisons.
In real Ruby workloads, hash-table operations are interleaved with object allocation, method dispatch, GC barriers, string/hash callbacks, comparisons, and branchy VM work. That weakens the “everything is in L1” assumption.
If the old table is kept at < 0.5 load factor, collision chains/probe lengths are already short, so a simpler open-addressing scheme can be very competitive because it has smaller code and fewer moving parts.
The larger practical win of the Swiss-ish design is probably memory density: sustaining a much higher load factor, around 7/8, without probe behavior collapsing, while keeping speed competitive. The memory saved by the denser table more than pays for the introduced ctrl[] bytes.