Feature #9614

ordering of non-Hash items which use st_ internally

Added by Eric Wong about 1 year ago. Updated 7 months ago.

[ruby-core:61383]
Status:Open
Priority:Normal
Assignee:Yukihiro Matsumoto

Description

Hi matz, I would like your permission to remove the order preservation from
any or all of the following currently implemented using st_table:

  • method tables
  • global symbols (Symbol.all_symbols)
  • constant tables
  • instance variable tables
  • global_variables method
  • Thread#keys
  • anything besides the Hash class

I am currently working on a patch series to reduce internal memory usage,
so far I have only converted three pieces:

  1. method tables (~200K reduction)
  2. symbol table (global_symbols.{id_str,sym_id}) (~200K)
  3. frozen_strings (~100K)

n.b. frozen_strings ordering is never exposed to users, so I expect
it to be OK.

Memory reduction is just based on "ruby -e exit" (which loads RubyGems);
bigger programs with more methods/symbols will save more memory.

Work-in-progress patches attached (0002 describes implementation details)

0001-adjust-tests-to-account-for-unsorted-methods.patch Magnifier (1.87 KB) Eric Wong, 03/09/2014 02:22 AM

0002-ihash-initial-implementation-method-table-conversion.patch Magnifier - commit message here has more info (32.8 KB) Eric Wong, 03/09/2014 02:22 AM

0004-ihash-implement-rb_ihash_update-to-support-rb_fstrin.patch Magnifier (7.84 KB) Eric Wong, 03/09/2014 02:22 AM

0003-parse.y-switch-to-ihash-saves-200K-out-of-the-box.patch Magnifier (9.51 KB) Eric Wong, 03/09/2014 02:22 AM

History

#1 Updated by Eric Wong 11 months ago

normalperson@yhbt.net wrote:

Hi matz, I would like your permission to remove the order preservation from
any or all of the following currently implemented using st_table:

  • method tables
  • global symbols (Symbol.all_symbols)
  • constant tables
  • instance variable tables
  • global_variables method
  • Thread#keys
  • anything besides the Hash class

matz: ping?

#2 Updated by Koichi Sasada 11 months ago

Just a comment.

I think it should be another parameter of st_table.

Now all st_tables have order.
But most of case, it doesn't needed.

I think it is fine for me that st_table user can select.

#3 Updated by Nobuyoshi Nakada 11 months ago

  • Description updated (diff)

How is it the rate of reduction?

Since I wasn't very positive to make the ordering a spec, I'm not against it.
But it doesn't feel nice to have many similar mechanisms, too.

#4 Updated by Nobuyoshi Nakada 11 months ago

I suspect that tests for local_variables would need fixing too.

#5 Updated by Eric Wong 11 months ago

Hi, thank you for the comments!

ko1: adding flag for ordering might complicate the st code even more.
I think st should only be for implementing Hash.

My primary goals for ihash are to reduce pointer chasing and malloc use.
ihash uses container_of, like ccan/list:

st (unpacked entries):
st_table -> st_table->bins -> st_table_entry -> data structure

st (packed entries):
st_table -> st_table->packed.entries -> data structure

ihash:
rb_ihash_table -> data structure (via container_of)

Removing ordering requirment allows us to avoid writing/maintaining more
(new) code and also save two pointers (fore/prev).

nobu: reduction depends on what is stored

Best case so far are method table and symbol table:

 For each method entry, we save:

     sizeof(st_table_entry) - sizeof(rb_ihash_node) + malloc_overhead

 On 64-bit with glibc malloc: 48 - 8 + 16 = 56 bytes saved.

 AFAIK, jemalloc has less overhead for small allocations, but even
 without malloc overhead, saving 40 bytes per method entry is great.

Symbol table is a big win, too, one single struct has membership
in both sym_id and id_str tables:

 struct rb_idsym {
     struct rb_ihash_node id_str_node; /* sizeof(void *) */
     ID id;
     struct rb_ihash_node sym_id_node;
     VALUE symstr;
     st_index_t hashval;
 };

 On 64-bit:
 Before: 48 + 48 = 96  (+ 32 bytes on glibc malloc)
 After:  40            (+ 16 bytes on glibc malloc)

 If we add ordering to Symbol.all_symbols, we only need one pair
 of list pointers and not two.  But I prefer to not need any :)

Worst case is only saving two pointers (ivar table); but I may look at
funny-falcon's sparse array for the ivar table. I have not looked at
the local variable table, but it may be in the same class as ivar
tables and be better with a sparse array.

P.S.: I am also considering truncating rb_idsym.hashval to use as ID.
This will:
a) reduce per-symbol overhead by sizeof(st_index_t)
b) (hopefully) improve distribution for global method cach
Measuring the effect of b) may be hard, though.

#6 Updated by Koichi Sasada 11 months ago

ko1: adding flag for ordering might complicate the st code even more.
I think st should only be for implementing Hash.

I agree that st.c is already complicated.
But I think Hash class should use original one.

Adding similar data structure can also increase complexity.
At least, I like same interfaces for such similar data structures.

My primary goals for ihash are to reduce pointer chasing and malloc use.
ihash uses container_of, like ccan/list:

It makes sense.

I think open addressing is better for such internal, small tables such as method table.
It doesn't need chain buckets.

So I wanted to extend st to introduce open addressing mode.

#7 Updated by Eric Wong 11 months ago

ko1@atdot.net wrote:

Adding similar data structure can also increase complexity.
At least, I like same interfaces for such similar data structures.

Right, I try to make ihash API like st (insert/update/foreach/lookup).
However, there must be some differences due to data layout changes.

e.g. delete could be implemented without a callback, so I call the new
operation "unlink" instead. Maybe "delink" is a less ambiguous name...

I think open addressing is better for such internal, small tables such as method table.
It doesn't need chain buckets.

So I wanted to extend st to introduce open addressing mode.

Open addressing will probably work better for instance/local var tables
with simple ID->(VALUE|long|ID) mapping (no struct behind the data).

For larger structures like rb_method_entry/rb_const_entry, I think using
container_of-based storage is best. Open addressing may not make sense
when using container_of, chaining was simpler and easier to match what
st already does, so I chose chaining for ihash.

#8 Updated by Eric Wong 7 months ago

I forget, ordering is easy to add to ihash with ccan/list.

Work-in-progress, this is only for method entries with ihash ordered
doing "ruby -e exit" (loading rubygems)

Numbers from valgrind, so "bytes allocated" does not take into account malloc
overhead of particular malloc implementations:

st (original):
total heap usage: 48,119 allocs, 19,169 frees, 8,106,443 bytes allocated

ihash-unordered
total heap usage: 45,571 allocs, 19,501 frees, 8,038,885 bytes allocated

ihash-ordered:
total heap usage: 45,571 allocs, 19,501 frees, 8,089,941 bytes allocated

The reduction in overall malloc/free calls is nice; but bytes allocated
is small because we're dealing with small elements. This makes method
entries much bigger (1 pointer for hash chaining, 2 pointers for
list_node), but reduces the need to make separate allocations for
st_table_entry.

  1. http://bogomips.org/ruby.git/patch/?id=3930d8172dea41cd ihash: initial implementation
  2. http://bogomips.org/ruby.git/patch/?id=02334f26a0c2a1f5 convert method entries to unordered ihash
  3. http://bogomips.org/ruby.git/patch/?id=fcf8e764bcd5a1c1 preserve ordering in method entries

git clone git://bogomips.org/ruby.git branch=ihash7

(I also started working on constants, but haven't added ordering, yet).

Running benchmark suite now, I expect uncached method entries for larger
classes/modules should be faster due to reduced indirection as described
in

For small (currently <=5) element tables, it is like st-packed and does a
linear search (which will cross cache lines, unfortunately)

Also available in: Atom PDF