Feature #11420

Introduce ID key table into MRI

Added by ko1 (Koichi Sasada) over 5 years ago. Updated over 5 years ago.

Target version:


Let's introduce ID key table to optimization.


Now, most of tables are implemented by st_table.
st_table is O(1) hash data structure.

MRI uses ID keys tables for many purpose like:

  • method tables (mtbl) for classes/modules
  • constant table for classes/modules
  • attribute (index) tables for classes/modules/... and so on.

Generally st_table is reasonable performance (speed/memory consumption), but
st_table has several drawbacks for ID key table.

  • Need two words per entry for ordering (st_table_entry::olist). (memory)
  • Need hash value (st_table_entry::hash). (memory)
  • Need to call a function to compare IDs. (speed)

I think introducing ID key table data structure is reasonable.

Already, Yura Sokolov proposed sa_table #6962 for this purpose.
However, we can't introduce it because of several reasons.

(1) expose with st table (public API)
(2) sa_table assume ID as 32bit integer.

I need to explain (2) more. There were two reasons.

Biggest issue was Symbol GC. We planed to introduce Symbol GC.
At that time, we didn't decide we use sequential number for IDs created from dynamic symbols.
Actually, the first implementation of Symbol GC uses corresponding Symbol's object_id to represent ID.

Another issue was the available ID numbers.
ID has several fields (3 bits), so we can make 232-3 IDs == 536,870,912 IDs.
We need at least (8+40)B per ID, so 536,870,912 IDs consumes 171,798,691,840 = 171GB.
It is huge, but we can't decide it is enough for future.

However, now the above issues are solved.

All IDs has sequential number from 1 (0 is reserved as undefined).
So that the current bit pattern of ID is [sequential number (64-3 or 32-3 bits)] [flags (3 bits)].

We can convert ID to [sequential number] and [sequential number] to ID.
So ID key table only keep [sequential number] as a key, instead of ID itself.
With this technique, we can keep 232 IDs.
Maybe it is enough because 232 IDs need at least 1,374,389,534,720 = 1.3TB.
(or not enough in a few years later?)

Anyway, using this technique, we can keep more IDs in 32bit integer.


I want to introduce "struct rb_id_table" data structure for MRI interpreter private data structure.

id_table.h declare only common functions.

Each functions can be implemented any algorithms.

I wrote several variations.

(1) Using st (same as current implementation)
(2) Using simple array
(3) Using simple array with sequential number of ID instead of ID itself. Linear search for look up.
(4) Using simple array with sequential number of ID instead of ID itself, and sorted. Binary search for look up.
(5) Using funny falcon's Coalesced Hashing implementation [Feature #6962]

Generally, array implementation seems bad idea, but my implementation for (3), (4) uses dense array of 32bit integers, so that cache locality is nice when it is enough small table (16 entries == 64B).

Here is a small benchmark to measure memory consumption:

require 'objspace'

def make_class method_num{

C = make_class(10_000)
p ObjectSpace.memsize_of(C)

 # of methods   |     10K     20K
(1) st          : 496,560 992,944
(2) list(ID key): 262,296 524,440
(4) list(sorted): 196,760 393,368
(5) CHashing    : 262,296 524,440

I found that most of method tables are 0 or a few entries.

# of methods

This is a survey of method number on simple rails app (development mode).
X axis shows method number and Y axis shows the number of classes/modules which have method number == X-axis.

This picture shows that 3500 classes/modules has 0 methods.
Only a few classes/modules has over 100 methods.

So combination with array (for small number of entries) and hash (for larger number of entries) will be nice.

Now I replace type of m_tbl to this struct.
See all of implementation.


We can compete algorithms :)


method_number.png (18.2 KB) method_number.png ko1 (Koichi Sasada), 08/06/2015 08:39 AM
measurement2.png (29.4 KB) measurement2.png ko1 (Koichi Sasada), 08/11/2015 01:11 PM
microbenchmark.pdf (124 KB) microbenchmark.pdf ko1 (Koichi Sasada), 08/11/2015 08:11 PM

Related issues

Related to Ruby master - Feature #6962: Use lighter hash structure for methods table, instance variable positions, constantsClosedko1 (Koichi Sasada)Actions
Related to Ruby master - Feature #11414: Relax ID table orderingClosedko1 (Koichi Sasada)Actions
Related to Ruby master - Feature #9638: [PATCH] limit IDs to 32-bits on 64-bit systemsRejected03/14/2014Actions

Also available in: Atom PDF