Bug #21993
open`rb_gc_update_tbl_refs` is incorrectly documented as the dcompact pair for `rb_mark_tbl_no_pin`, and is unsafe for `st_table`s with non-VALUE keys
Description
Hey!
I work for Datadog on the Ruby profiler and I've been exploring the TypedData GC API to understand the correct patterns for writing compaction-aware extensions.
While reading through the API, I noticed a mismatch between the documented pairing of rb_mark_tbl_no_pin + rb_gc_update_tbl_refs and what the implementation actually does that seems like it can cause crashes and even corruption for extensions.
The problem¶
The public header documents rb_gc_update_tbl_refs as the dcompact counterpart for rb_mark_tbl_no_pin:
* Updates references inside of tables. After you marked values using
* rb_mark_tbl_no_pin(), the objects inside of the table could of course be
* moved. This function is to fixup those references.
However, rb_mark_tbl_no_pin's implementation only marks values, not keys:
// gc.c / gc/gc.h
static int
gc_mark_tbl_no_pin_i(st_data_t key, st_data_t value, st_data_t data)
{
rb_gc_mark_movable((VALUE)value); // key is never touched
return ST_CONTINUE;
}
Thus, tables used with rb_mark_tbl_no_pin can have non-VALUE keys (e.g. IDs, raw integers, C pointers, etc). The problem is that rb_gc_update_tbl_refs (via gc_update_table_refs) calls rb_gc_location on both keys and values:
static int
hash_foreach_replace(st_data_t key, st_data_t value, ...) {
if (rb_gc_location((VALUE)key) != (VALUE)key) // <- called on key
return ST_REPLACE;
if (rb_gc_location((VALUE)value) != (VALUE)value)
return ST_REPLACE;
...
}
Calling rb_gc_location on something that's not a VALUE and even sometimes replacing it is probably going to cause crashes and heap corruption.
Suggested fixes¶
I can think of a few ways to address this:
a) Deprecate rb_gc_update_tbl_refs
Maybe just deprecate it? I'm not even sure if this gets used very often.
b) Introduce rb_mark_hash_no_pin
The mark_keyvalue iterator already exists internally and is used by mark_hash() for Ruby's own Hash objects. Exposing a public rb_mark_hash_no_pin would give rb_gc_update_tbl_refs a safe, correct dmark counterpart for tables where both keys and values are Ruby objects.
c) Fix the docs and document the precondition for rb_gc_update_tbl_refs
Update the docs to make explicit that rb_gc_update_tbl_refs assumes a VALUE => VALUE table, and that all keys must have been marked (e.g. via rb_mark_set or a manual iteration with rb_gc_mark_movable) in addition to calling rb_mark_tbl_no_pin for the values. Weird and ugly, but correct?
Happy to help with a patch if useful.
Updated by luke-gru (Luke Gruber) 23 days ago
· Edited
Nice catch! I think this would only be a problem if the key was a number that was also a valid pointer into the Ruby heap and it happened to point to a T_MOVED object. If it's another pointer from malloc or xmalloc it shouldn't be an issue, but it is still sketchy.
Updated by luke-gru (Luke Gruber) 23 days ago
· Edited
Looking into this further, it looks like gc_update_table_refs is only used once internally, with the finalizer_table. It looks to me like it's used wrong there because the key (finalized object) is not pinned so can be moved, but the st_table is not rehashed afterwards.
Updated by ivoanjo (Ivo Anjo) 22 days ago
I guess objects that get a finalizer don't move -- e.g. gc_is_moveable_obj ==> FALSE so that's what made the finalizer table never be affected by this detail.
Updated by luke-gru (Luke Gruber) 22 days ago
Oh yeah, thanks. There's even a helpful comment for it. I think changing rb_gc_update_tbl_refs to only move values probably makes sense. We could add something like rb_mark_hash_nopin_values where it marks values for moving but pins keys. They could be the mark/compact pair for VALUE=>VALUE st_tables. If the user wants to support moving of a st_table key, they would need to write it themselves.
Updated by ivoanjo (Ivo Anjo) 22 days ago
Yeah I think that'd be fine! To be honest I never ever knew these APIs existed, I've been researching the GC and that's how I found it, so I suspect they might not be very common?