Bug #20934
open`UnboundMethod#bind_call` may cause "double free or corruption" with Ractor
Description
When I call UnboundMethod#bind_call
on both main Ractor and child Ractor, probable errors can be encountered.
Here is an issue reproduce script ractor_issue.rb
.
def foo
10000.times do
Object.instance_method(:object_id).bind_call(self)
end
end
Ractor.new { foo }
foo
And there are some examples of execution results.
$ ./miniruby -v ractor_issue.rb
ruby 3.4.0dev (2024-12-06T18:51:08Z :detached: 8ad6860ff7) +PRISM [x86_64-linux]
ractor_issue.rb:7: warning: Ractor is experimental, and the behavior may change in future versions of Ruby! Also there are many implementation issues.
double free or corruption (fasttop)
Aborted (core dumped)
$ ./miniruby -v ractor_issue.rb
ruby 3.4.0dev (2024-12-06T18:51:08Z :detached: 8ad6860ff7) +PRISM [x86_64-linux]
ractor_issue.rb:7: warning: Ractor is experimental, and the behavior may change in future versions of Ruby! Also there are many implementation issues.
#<Thread:0x00007fc6c16bfb18 run> terminated with exception (report_on_exception is true):
ractor_issue.rb:3:in 'UnboundMethod#bind_call': undefined method 'object_id' for main (NoMethodError)
from ractor_issue.rb:3:in 'block in Object#foo'
from <internal:numeric>:257:in 'Integer#times'
from ractor_issue.rb:2:in 'Object#foo'
from ractor_issue.rb:8:in '<main>'
Please try running it several times, as there is a probability of successful completion.
Updated by alanwu (Alan Wu) 11 days ago
Nice find! The repro script tips off ASAN too:
../build-dev/ractor-issue.rb:7: warning: Ractor is experimental, and the behavior may change in future versions of Ruby! Also there are many implementation issues.
=================================================================
==26803==ERROR: AddressSanitizer: heap-use-after-free on address 0x50400001ac50 at pc 0x5d26b2af828b bp 0x7ffcc95d6f00 sp 0x7ffcc95d6ef0
READ of size 1 at 0x50400001ac50 thread T0
#0 0x5d26b2af828a in rb_free_method_entry_vm_weak_references ../vm_method.c:546
#1 0x5d26b261f1c7 in rb_gc_obj_free_vm_weak_references ../gc.c:1204
#2 0x5d26b263dfb6 in gc_sweep_plane ../gc/default/default.c:3545
#3 0x5d26b263dfb6 in gc_sweep_page ../gc/default/default.c:3634
#4 0x5d26b263dfb6 in gc_sweep_step ../gc/default/default.c:3906
#5 0x5d26b2643378 in gc_sweep ../gc/default/default.c:4165
#6 0x5d26b2634115 in gc_start ../gc/default/default.c:6484
#7 0x5d26b264d3d6 in heap_prepare ../gc/default/default.c:2109
#8 0x5d26b264d3d6 in heap_next_free_page ../gc/default/default.c:2315
#9 0x5d26b264d3d6 in newobj_cache_miss ../gc/default/default.c:2423
#10 0x5d26b2651098 in newobj_alloc ../gc/default/default.c:2447
#11 0x5d26b2651098 in rb_gc_impl_new_obj ../gc/default/default.c:2530
#12 0x5d26b2651098 in newobj_of ../gc.c:984
#13 0x5d26b2651098 in rb_wb_protected_newobj_of ../gc.c:1013
#14 0x5d26b2af9b5a in rb_method_entry_alloc ../vm_method.c:776
#15 0x5d26b2af9b5a in rb_method_entry_clone ../vm_method.c:813
#16 0x5d26b2856b69 in convert_umethod_to_method_components ../proc.c:2618
#17 0x5d26b2856b69 in umethod_bind_call ../proc.c:2722
#18 0x5d26b2aea4a7 in vm_call_cfunc_with_frame_ ../vm_insnhelper.c:3801
#19 0x5d26b2aea4a7 in vm_call_cfunc_with_frame ../vm_insnhelper.c:3847
#20 0x5d26b2b4e2f9 in vm_sendish ../vm_insnhelper.c:5968
#21 0x5d26b2b4e2f9 in vm_exec_core ../insns.def:898
#22 0x5d26b2b1bf68 in rb_vm_exec ../vm.c:2586
#23 0x5d26b25d78ce in rb_ec_exec_node ../eval.c:281
#24 0x5d26b25e3688 in ruby_run_node ../eval.c:319
#25 0x5d26b238dc32 in rb_main ../main.c:43
#26 0x5d26b238dc32 in main ../main.c:68
#27 0x786f20a9fe07 (/usr/lib/libc.so.6+0x25e07) (BuildId: 98b3d8e0b8c534c769cb871c438b4f8f3a8e4bf3)
#28 0x786f20a9fecb in __libc_start_main (/usr/lib/libc.so.6+0x25ecb) (BuildId: 98b3d8e0b8c534c769cb871c438b4f8f3a8e4bf3)
#29 0x5d26b238f134 in _start (/home/alan/src/ruby/build-asan/miniruby+0x171134) (BuildId: a3fca37e902e0513094844d929f17725e5aef5b1)
0x50400001ac50 is located 0 bytes inside of 48-byte region [0x50400001ac50,0x50400001ac80)
freed by thread T0 here:
#0 0x786f20efc282 in free /usr/src/debug/gcc/gcc/libsanitizer/asan/asan_malloc_linux.cpp:52
#1 0x5d26b265246f in rb_gc_impl_free ../gc/default/default.c:8184
#2 0x5d26b265246f in rb_gc_impl_free ../gc/default/default.c:8165
#3 0x5d26b265246f in ruby_sized_xfree ../gc.c:4582
#4 0x5d26b265246f in ruby_xfree ../gc.c:4593
previously allocated by thread T0 here:
#0 0x786f20efd1aa in calloc /usr/src/debug/gcc/gcc/libsanitizer/asan/asan_malloc_linux.cpp:77
#1 0x5d26b265195a in calloc1 ../gc/default/default.c:1472
#2 0x5d26b265195a in rb_gc_impl_calloc ../gc/default/default.c:8220
#3 0x5d26b265195a in ruby_xcalloc_body ../gc.c:4517
#4 0x5d26b265195a in ruby_xcalloc ../gc.c:4511
SUMMARY: AddressSanitizer: heap-use-after-free ../vm_method.c:546 in rb_free_method_entry_vm_weak_references
Shadow bytes around the buggy address:
0x50400001a980: fa fa 00 00 00 00 00 00 fa fa 00 00 00 00 00 00
0x50400001aa00: fa fa 00 00 00 00 00 00 fa fa 00 00 00 00 00 00
0x50400001aa80: fa fa 00 00 00 00 00 00 fa fa 00 00 00 00 00 00
0x50400001ab00: fa fa 00 00 00 00 00 00 fa fa 00 00 00 00 00 00
0x50400001ab80: fa fa 00 00 00 00 00 00 fa fa 00 00 00 00 00 00
=>0x50400001ac00: fa fa 00 00 00 00 00 00 fa fa[fd]fd fd fd fd fd
0x50400001ac80: fa fa 00 00 00 00 00 00 fa fa 00 00 00 00 00 00
0x50400001ad00: fa fa 00 00 00 00 00 00 fa fa 00 00 00 00 00 00
0x50400001ad80: fa fa 00 00 00 00 00 00 fa fa 00 00 00 00 00 00
0x50400001ae00: fa fa 00 00 00 00 00 00 fa fa 00 00 00 00 00 00
0x50400001ae80: fa fa 00 00 00 00 00 00 fa fa 00 00 00 00 00 00
Shadow byte legend (one shadow byte represents 8 application bytes):
Addressable: 00
Partially addressable: 01 02 03 04 05 06 07
Heap left redzone: fa
Freed heap region: fd
Stack left redzone: f1
Stack mid redzone: f2
Stack right redzone: f3
Stack after return: f5
Stack use after scope: f8
Global redzone: f9
Global init order: f6
Poisoned by user: f7
Container overflow: fc
Array cookie: ac
Intra object redzone: bb
ASan internal: fe
Left alloca redzone: ca
Right alloca redzone: cb
==26803==ABORTING
It's use-after-free with a rb_method_definition_t
. Judging solely by the trace, the underlying issue might not be ractor specific.
Updated by wanabe (_ wanabe) 11 days ago
Thanks for your comment. It looks like a race condition between rb_method_definition_release()
and method_definition_addref()
.
Applying the following patch suppressed the problem.
diff --git a/vm_method.c b/vm_method.c
index 69a533b182..725b2a99ac 100644
--- a/vm_method.c
+++ b/vm_method.c
@@ -518,6 +518,7 @@ static void
rb_method_definition_release(rb_method_definition_t *def)
{
if (def != NULL) {
+ RB_VM_LOCK_ENTER();
const int reference_count = def->reference_count;
def->reference_count--;
@@ -535,6 +536,7 @@ rb_method_definition_release(rb_method_definition_t *def)
if (METHOD_DEBUG) fprintf(stderr, "-%p-%s:%d->%d (dec)\n", (void *)def, rb_id2name(def->original_id),
reference_count, def->reference_count);
}
+ RB_VM_LOCK_LEAVE();
}
}
@@ -621,9 +623,11 @@ setup_method_cfunc_struct(rb_method_cfunc_t *cfunc, VALUE (*func)(ANYARGS), int
static rb_method_definition_t *
method_definition_addref(rb_method_definition_t *def, bool complemented)
{
+ RB_VM_LOCK_ENTER();
if (!complemented && def->reference_count > 0) def->aliased = true;
def->reference_count++;
if (METHOD_DEBUG) fprintf(stderr, "+%p-%s:%d\n", (void *)def, rb_id2name(def->original_id), def->reference_count);
+ RB_VM_LOCK_LEAVE();
return def;
}
But I don't think it is a good idea because the cost would be high.
Hopefully there is a better solution.