Feature #22008
Open
RUBY_INTERNAL_EVENT_NEWOBJ should run earlier, with fully uninitialized object
Description
RUBY_INTERNAL_EVENT_NEWOBJ is an internal tracepoint event, accessible only to C. I'd consider it an unstable semi-private API, really intended to be used only by ObjectSpace. The documentation states:
- in internal events, you can not use any Ruby APIs (even object creations)
- Limitations are MRI version specific
Basically, it's unsafe to use any Ruby APIs in the hook. An exception is rb_profile_frames, but I believe everything else should not be allowed.
Currently the RUBY_INTERNAL_EVENT_NEWOBJ hook fires after the object has been assigned its klass, flags, and (sometimes) shape_id. In pseudocode the current newobj_of is:
    VALUE rb_gc_impl_new_obj(..., klass, flags, ...) {
        VALUE obj = freelist_pop_or_alloc(size);
        obj->flags = flags;
        obj->klass = klass;
        return obj;
    }

    VALUE newobj_of(klass, flags, shape_id, ...) {
        VALUE obj = rb_gc_impl_new_obj(..., klass, flags, ...);
        obj->shape_id = shape_id;
        if (rb_gc_event_hook_required_p(RUBY_INTERNAL_EVENT_NEWOBJ)) {
            // hook receives partially initialized object (klass, flags, shape_id, but no other fields)
            gc_newobj_hook(obj);
        }
        return obj;
    }
Instead I would like this to look like the following:
    VALUE rb_gc_impl_new_obj(...) {
        VALUE obj = freelist_pop_or_alloc(size);
        if (rb_gc_event_hook_required_p(RUBY_INTERNAL_EVENT_NEWOBJ)) {
            // hook receives uninitialized object, a fully 0-initialized T_NONE
            gc_newobj_hook(obj);
        }
        return obj;
    }

    VALUE newobj_of(klass, flags, shape_id, ...) {
        VALUE obj = rb_gc_impl_new_obj(...);
        obj->flags = flags;
        obj->shape_id = shape_id;
        obj->klass = klass;
        return obj;
    }
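To make the proposed ordering concrete, here is a minimal standalone sketch. It is not CRuby code: the `toy_*` names, the struct layout, and the use of `calloc` in place of the freelist are all assumptions made for illustration. It only demonstrates that a hook fired on the GC side sees a zeroed header, while the caller fills in klass, flags, and shape_id afterwards.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Toy stand-in for an object header; NOT the real struct RBasic layout. */
typedef struct toy_obj {
    uintptr_t flags;
    uintptr_t klass;
    uint32_t shape_id;
} toy_obj;

typedef void (*toy_newobj_hook)(toy_obj *obj);

toy_newobj_hook toy_hook = NULL;          /* NULL means no hook registered */
const toy_obj *toy_last_seen_addr = NULL; /* address the hook was given */
uintptr_t toy_last_seen_flags = 1;        /* flags value the hook observed */

/* Example hook: records the address and the flags word it observed. */
void toy_record_hook(toy_obj *obj) {
    toy_last_seen_addr = obj;
    toy_last_seen_flags = obj->flags;
}

/* GC side: hand out zeroed memory and fire the hook before any field is set. */
toy_obj *toy_gc_new_obj(void) {
    toy_obj *obj = calloc(1, sizeof(*obj)); /* stands in for the freelist pop */
    if (obj == NULL) abort();
    if (toy_hook != NULL) toy_hook(obj);
    return obj;
}

/* Caller side: header assignment now happens outside the GC. */
toy_obj *toy_newobj_of(uintptr_t klass, uintptr_t flags, uint32_t shape_id) {
    toy_obj *obj = toy_gc_new_obj();
    obj->flags = flags;
    obj->shape_id = shape_id;
    obj->klass = klass;
    assert(obj->klass == klass); /* header is complete only at this point */
    return obj;
}
```

With `toy_hook` set to `toy_record_hook`, a call to `toy_newobj_of` leaves `toy_last_seen_flags` at 0, since the hook ran before the header was written.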
Calling the hook with a partially constructed object (klass/shape/flags, but no other attributes set) forces us to include klass/flag/shape_id assignment in the GC API, which prevents several optimizations I'd like to attempt in the next year or so:
- Inlining the klass/flags/shape assignment into caller of NEWOBJ macro
- Inline allocations in ZJIT (@tekknolagi (Maxwell Bernstein) has asked for the GC to be able to support this)
- Eliding reads and modifications to flags/klass/shape, both in C code and ZJIT (should happen automatically from inlining above)
- Low-overhead sampling by keeping a freelist or bump pointer of known size https://pypy.org/posts/2025/02/pypy-gc-sampling.html#sampling-approach
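The last item can be illustrated with a rough sketch of byte-based sampling on a bump pointer, loosely in the spirit of the linked PyPy post. Everything here is hypothetical (`toy_nursery`, its fields, and the sampling policy are not an existing CRuby or PyPy API), and alignment is ignored. The idea is that the fast path compares against a single limit, which is placed at the next sample point rather than at the real end of the buffer, so sampling adds no extra branch to ordinary allocations.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bump allocator with a sampling limit; alignment is ignored. */
typedef struct toy_nursery {
    unsigned char *cur;     /* next free byte */
    unsigned char *limit;   /* fast path succeeds while cur + size <= limit */
    unsigned char *end;     /* real end of the buffer */
    size_t sample_interval; /* bytes between sample points */
    size_t samples;         /* number of samples taken so far */
} toy_nursery;

/* Place the limit at the next sample point, clamped to the real end. */
void toy_nursery_reset_limit(toy_nursery *n) {
    size_t remaining = (size_t)(n->end - n->cur);
    size_t step = remaining < n->sample_interval ? remaining : n->sample_interval;
    n->limit = n->cur + step;
}

void toy_nursery_init(toy_nursery *n, unsigned char *buf, size_t cap, size_t interval) {
    assert(interval > 0);
    n->cur = buf;
    n->end = buf + cap;
    n->sample_interval = interval;
    n->samples = 0;
    toy_nursery_reset_limit(n);
}

void *toy_nursery_alloc(toy_nursery *n, size_t size) {
    unsigned char *p = n->cur;
    if ((size_t)(n->limit - n->cur) >= size) { /* fast path: one comparison */
        n->cur += size;
        return p;
    }
    if ((size_t)(n->end - n->cur) < size) {    /* genuinely out of space */
        return NULL;
    }
    /* Slow path: this allocation crosses a sample point. A profiler hook
     * would be invoked here with the address p only. */
    n->cur += size;
    n->samples++;
    toy_nursery_reset_limit(n);
    return p;
}
```

For example, with a 100-byte interval and twenty 16-byte allocations, this sketch takes two samples (at the allocations that cross byte 100 and byte 212).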
I want to make this change, but it may cause issues with some allocation profiling gems:
- ko1's allocation_tracer reads the value_type and klass when called
- ddtrace (?) previously tried to use object_id on freshly allocated objects (which was always buggy, crashes reliably in Ruby 4.0), not sure what it does now #21710
- ruby-prof checks if newly allocated objects are IMEMOs (some previous discussion in #21854). I'm not sure why; it will probably work with this change and simply not skip imemo.
Stackprof, Vernier (my profiler), and ObjectSpace.trace_allocations will not have issues, as they do not attempt to access the memory of these objects; they only use the address.
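As an illustration of the address-only pattern, here is a small sketch of a hook body that records allocations in a fixed-size table keyed by pointer value. It is not taken from Stackprof, Vernier, or ObjectSpace; the `toy_*` names and the table design are assumptions. The point is that the hook never dereferences the object, so it does not matter whether the header has been initialized yet.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_TABLE_SIZE 64 /* toy capacity; must be a power of two */

typedef struct toy_entry {
    const void *addr;  /* NULL marks an empty slot */
    unsigned long seq; /* allocation sequence number */
} toy_entry;

toy_entry toy_table[TOY_TABLE_SIZE];
unsigned long toy_seq = 0;

/* Hash the pointer value itself; the pointed-to memory is never read. */
size_t toy_slot(const void *addr) {
    return (size_t)(((uintptr_t)addr >> 3) & (TOY_TABLE_SIZE - 1));
}

/* Hook body: record the address. Returns 0 if the table is full. */
int toy_on_newobj(const void *addr) {
    size_t i = toy_slot(addr);
    for (size_t probes = 0; probes < TOY_TABLE_SIZE; probes++) {
        if (toy_table[i].addr == NULL || toy_table[i].addr == addr) {
            toy_table[i].addr = addr;
            toy_table[i].seq = ++toy_seq;
            return 1;
        }
        i = (i + 1) & (TOY_TABLE_SIZE - 1); /* linear probing */
    }
    return 0;
}

/* Returns the recorded sequence number, or 0 if the address is unknown. */
unsigned long toy_lookup(const void *addr) {
    size_t i = toy_slot(addr);
    for (size_t probes = 0; probes < TOY_TABLE_SIZE; probes++) {
        if (toy_table[i].addr == addr) return toy_table[i].seq;
        if (toy_table[i].addr == NULL) return 0;
        i = (i + 1) & (TOY_TABLE_SIZE - 1);
    }
    return 0;
}
```

A real profiler would also need to drop entries when the object is freed; that is omitted here to keep the sketch short.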
I'd like to discuss how to make this change. Possible options:
- Make the change, gems that try to access flags/klass may crash
- Silently disable RUBY_INTERNAL_EVENT_NEWOBJ and introduce a new event or API. This would avoid any crashes, but existing gems would not work
- Skip the C optimizations. Disable ZJIT (with a warning?) when RUBY_INTERNAL_EVENT_NEWOBJ is used
- Add a rb_gc_post_alloc(obj) call to all call sites after the assignment (slow)