Project

General

Profile

Actions

Bug #20934

open

`UnboundMethod#bind_call` may cause "double free or corruption" with Ractor

Added by wanabe (_ wanabe) about 1 month ago. Updated 25 days ago.

Status:
Open
Assignee:
-
Target version:
-
ruby -v:
ruby 3.4.0dev (2024-12-06T18:51:08Z :detached: 8ad6860ff7) +PRISM [x86_64-linux]
[ruby-core:120128]

Description

When I call UnboundMethod#bind_call on both main Ractor and child Ractor, probable errors can be encountered.
Here is an issue reproduce script ractor_issue.rb.

def foo
  10000.times do
    Object.instance_method(:object_id).bind_call(self)
  end
end

Ractor.new { foo }
foo

And there are some examples of execution results.

$ ./miniruby -v ractor_issue.rb 
ruby 3.4.0dev (2024-12-06T18:51:08Z :detached: 8ad6860ff7) +PRISM [x86_64-linux]
ractor_issue.rb:7: warning: Ractor is experimental, and the behavior may change in future versions of Ruby! Also there are many implementation issues.
double free or corruption (fasttop)
Aborted (core dumped)
$ ./miniruby -v ractor_issue.rb 
ruby 3.4.0dev (2024-12-06T18:51:08Z :detached: 8ad6860ff7) +PRISM [x86_64-linux]
ractor_issue.rb:7: warning: Ractor is experimental, and the behavior may change in future versions of Ruby! Also there are many implementation issues.
#<Thread:0x00007fc6c16bfb18 run> terminated with exception (report_on_exception is true):
ractor_issue.rb:3:in 'UnboundMethod#bind_call': undefined method 'object_id' for main (NoMethodError)
        from ractor_issue.rb:3:in 'block in Object#foo'
        from <internal:numeric>:257:in 'Integer#times'
        from ractor_issue.rb:2:in 'Object#foo'
        from ractor_issue.rb:8:in '<main>'

Please try running it several times, as there is a probability of successful completion.

Updated by alanwu (Alan Wu) about 1 month ago

Nice find! The repro script tips off ASAN too:

../build-dev/ractor-issue.rb:7: warning: Ractor is experimental, and the behavior may change in future versions of Ruby! Also there are many implementation issues.
=================================================================
==26803==ERROR: AddressSanitizer: heap-use-after-free on address 0x50400001ac50 at pc 0x5d26b2af828b bp 0x7ffcc95d6f00 sp 0x7ffcc95d6ef0
READ of size 1 at 0x50400001ac50 thread T0
    #0 0x5d26b2af828a in rb_free_method_entry_vm_weak_references ../vm_method.c:546
    #1 0x5d26b261f1c7 in rb_gc_obj_free_vm_weak_references ../gc.c:1204
    #2 0x5d26b263dfb6 in gc_sweep_plane ../gc/default/default.c:3545
    #3 0x5d26b263dfb6 in gc_sweep_page ../gc/default/default.c:3634
    #4 0x5d26b263dfb6 in gc_sweep_step ../gc/default/default.c:3906
    #5 0x5d26b2643378 in gc_sweep ../gc/default/default.c:4165
    #6 0x5d26b2634115 in gc_start ../gc/default/default.c:6484
    #7 0x5d26b264d3d6 in heap_prepare ../gc/default/default.c:2109
    #8 0x5d26b264d3d6 in heap_next_free_page ../gc/default/default.c:2315
    #9 0x5d26b264d3d6 in newobj_cache_miss ../gc/default/default.c:2423
    #10 0x5d26b2651098 in newobj_alloc ../gc/default/default.c:2447
    #11 0x5d26b2651098 in rb_gc_impl_new_obj ../gc/default/default.c:2530
    #12 0x5d26b2651098 in newobj_of ../gc.c:984
    #13 0x5d26b2651098 in rb_wb_protected_newobj_of ../gc.c:1013
    #14 0x5d26b2af9b5a in rb_method_entry_alloc ../vm_method.c:776
    #15 0x5d26b2af9b5a in rb_method_entry_clone ../vm_method.c:813
    #16 0x5d26b2856b69 in convert_umethod_to_method_components ../proc.c:2618
    #17 0x5d26b2856b69 in umethod_bind_call ../proc.c:2722
    #18 0x5d26b2aea4a7 in vm_call_cfunc_with_frame_ ../vm_insnhelper.c:3801
    #19 0x5d26b2aea4a7 in vm_call_cfunc_with_frame ../vm_insnhelper.c:3847
    #20 0x5d26b2b4e2f9 in vm_sendish ../vm_insnhelper.c:5968
    #21 0x5d26b2b4e2f9 in vm_exec_core ../insns.def:898
    #22 0x5d26b2b1bf68 in rb_vm_exec ../vm.c:2586
    #23 0x5d26b25d78ce in rb_ec_exec_node ../eval.c:281
    #24 0x5d26b25e3688 in ruby_run_node ../eval.c:319
    #25 0x5d26b238dc32 in rb_main ../main.c:43
    #26 0x5d26b238dc32 in main ../main.c:68
    #27 0x786f20a9fe07  (/usr/lib/libc.so.6+0x25e07) (BuildId: 98b3d8e0b8c534c769cb871c438b4f8f3a8e4bf3)
    #28 0x786f20a9fecb in __libc_start_main (/usr/lib/libc.so.6+0x25ecb) (BuildId: 98b3d8e0b8c534c769cb871c438b4f8f3a8e4bf3)
    #29 0x5d26b238f134 in _start (/home/alan/src/ruby/build-asan/miniruby+0x171134) (BuildId: a3fca37e902e0513094844d929f17725e5aef5b1)

0x50400001ac50 is located 0 bytes inside of 48-byte region [0x50400001ac50,0x50400001ac80)
freed by thread T0 here:
    #0 0x786f20efc282 in free /usr/src/debug/gcc/gcc/libsanitizer/asan/asan_malloc_linux.cpp:52
    #1 0x5d26b265246f in rb_gc_impl_free ../gc/default/default.c:8184
    #2 0x5d26b265246f in rb_gc_impl_free ../gc/default/default.c:8165
    #3 0x5d26b265246f in ruby_sized_xfree ../gc.c:4582
    #4 0x5d26b265246f in ruby_xfree ../gc.c:4593

previously allocated by thread T0 here:
    #0 0x786f20efd1aa in calloc /usr/src/debug/gcc/gcc/libsanitizer/asan/asan_malloc_linux.cpp:77
    #1 0x5d26b265195a in calloc1 ../gc/default/default.c:1472
    #2 0x5d26b265195a in rb_gc_impl_calloc ../gc/default/default.c:8220
    #3 0x5d26b265195a in ruby_xcalloc_body ../gc.c:4517
    #4 0x5d26b265195a in ruby_xcalloc ../gc.c:4511

SUMMARY: AddressSanitizer: heap-use-after-free ../vm_method.c:546 in rb_free_method_entry_vm_weak_references
Shadow bytes around the buggy address:
  0x50400001a980: fa fa 00 00 00 00 00 00 fa fa 00 00 00 00 00 00
  0x50400001aa00: fa fa 00 00 00 00 00 00 fa fa 00 00 00 00 00 00
  0x50400001aa80: fa fa 00 00 00 00 00 00 fa fa 00 00 00 00 00 00
  0x50400001ab00: fa fa 00 00 00 00 00 00 fa fa 00 00 00 00 00 00
  0x50400001ab80: fa fa 00 00 00 00 00 00 fa fa 00 00 00 00 00 00
=>0x50400001ac00: fa fa 00 00 00 00 00 00 fa fa[fd]fd fd fd fd fd
  0x50400001ac80: fa fa 00 00 00 00 00 00 fa fa 00 00 00 00 00 00
  0x50400001ad00: fa fa 00 00 00 00 00 00 fa fa 00 00 00 00 00 00
  0x50400001ad80: fa fa 00 00 00 00 00 00 fa fa 00 00 00 00 00 00
  0x50400001ae00: fa fa 00 00 00 00 00 00 fa fa 00 00 00 00 00 00
  0x50400001ae80: fa fa 00 00 00 00 00 00 fa fa 00 00 00 00 00 00
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==26803==ABORTING

It's use-after-free with a rb_method_definition_t. Judging solely by the trace, the underlying issue might not be ractor specific.

Updated by wanabe (_ wanabe) about 1 month ago

Thanks for your comment. It looks like a race condition between rb_method_definition_release() and method_definition_addref().
Applying the following patch suppressed the problem.

diff --git a/vm_method.c b/vm_method.c
index 69a533b182..725b2a99ac 100644
--- a/vm_method.c
+++ b/vm_method.c
@@ -518,6 +518,7 @@ static void
 rb_method_definition_release(rb_method_definition_t *def)
 {
     if (def != NULL) {
+        RB_VM_LOCK_ENTER();
         const int reference_count = def->reference_count;
         def->reference_count--;
 
@@ -535,6 +536,7 @@ rb_method_definition_release(rb_method_definition_t *def)
             if (METHOD_DEBUG) fprintf(stderr, "-%p-%s:%d->%d (dec)\n", (void *)def, rb_id2name(def->original_id),
                                       reference_count, def->reference_count);
         }
+        RB_VM_LOCK_LEAVE();
     }
 }
 
@@ -621,9 +623,11 @@ setup_method_cfunc_struct(rb_method_cfunc_t *cfunc, VALUE (*func)(ANYARGS), int
 static rb_method_definition_t *
 method_definition_addref(rb_method_definition_t *def, bool complemented)
 {
+    RB_VM_LOCK_ENTER();
     if (!complemented &&  def->reference_count > 0) def->aliased = true;
     def->reference_count++;
     if (METHOD_DEBUG) fprintf(stderr, "+%p-%s:%d\n", (void *)def, rb_id2name(def->original_id), def->reference_count);
+    RB_VM_LOCK_LEAVE();
     return def;
 }
 

But I don't think it is a good idea because the cost would be high.
Hopefully there is a better solution.

Actions

Also available in: Atom PDF

Like0
Like0Like1