Project

General

Profile

Actions

Bug #20934

closed

`UnboundMethod#bind_call` may cause "double free or corruption" with Ractor

Added by wanabe (_ wanabe) 4 months ago. Updated 10 days ago.

Status:
Closed
Assignee:
Target version:
-
ruby -v:
ruby 3.4.0dev (2024-12-06T18:51:08Z :detached: 8ad6860ff7) +PRISM [x86_64-linux]
[ruby-core:120128]

Description

When I call UnboundMethod#bind_call on both main Ractor and child Ractor, probable errors can be encountered.
Here is an issue reproduce script ractor_issue.rb.

def foo
  10000.times do
    Object.instance_method(:object_id).bind_call(self)
  end
end

Ractor.new { foo }
foo

And there are some examples of execution results.

$ ./miniruby -v ractor_issue.rb 
ruby 3.4.0dev (2024-12-06T18:51:08Z :detached: 8ad6860ff7) +PRISM [x86_64-linux]
ractor_issue.rb:7: warning: Ractor is experimental, and the behavior may change in future versions of Ruby! Also there are many implementation issues.
double free or corruption (fasttop)
Aborted (core dumped)
$ ./miniruby -v ractor_issue.rb 
ruby 3.4.0dev (2024-12-06T18:51:08Z :detached: 8ad6860ff7) +PRISM [x86_64-linux]
ractor_issue.rb:7: warning: Ractor is experimental, and the behavior may change in future versions of Ruby! Also there are many implementation issues.
#<Thread:0x00007fc6c16bfb18 run> terminated with exception (report_on_exception is true):
ractor_issue.rb:3:in 'UnboundMethod#bind_call': undefined method 'object_id' for main (NoMethodError)
        from ractor_issue.rb:3:in 'block in Object#foo'
        from <internal:numeric>:257:in 'Integer#times'
        from ractor_issue.rb:2:in 'Object#foo'
        from ractor_issue.rb:8:in '<main>'

Please try running it several times, as there is a probability of successful completion.

Updated by alanwu (Alan Wu) 4 months ago

Nice find! The repro script tips off ASAN too:

../build-dev/ractor-issue.rb:7: warning: Ractor is experimental, and the behavior may change in future versions of Ruby! Also there are many implementation issues.
=================================================================
==26803==ERROR: AddressSanitizer: heap-use-after-free on address 0x50400001ac50 at pc 0x5d26b2af828b bp 0x7ffcc95d6f00 sp 0x7ffcc95d6ef0
READ of size 1 at 0x50400001ac50 thread T0
    #0 0x5d26b2af828a in rb_free_method_entry_vm_weak_references ../vm_method.c:546
    #1 0x5d26b261f1c7 in rb_gc_obj_free_vm_weak_references ../gc.c:1204
    #2 0x5d26b263dfb6 in gc_sweep_plane ../gc/default/default.c:3545
    #3 0x5d26b263dfb6 in gc_sweep_page ../gc/default/default.c:3634
    #4 0x5d26b263dfb6 in gc_sweep_step ../gc/default/default.c:3906
    #5 0x5d26b2643378 in gc_sweep ../gc/default/default.c:4165
    #6 0x5d26b2634115 in gc_start ../gc/default/default.c:6484
    #7 0x5d26b264d3d6 in heap_prepare ../gc/default/default.c:2109
    #8 0x5d26b264d3d6 in heap_next_free_page ../gc/default/default.c:2315
    #9 0x5d26b264d3d6 in newobj_cache_miss ../gc/default/default.c:2423
    #10 0x5d26b2651098 in newobj_alloc ../gc/default/default.c:2447
    #11 0x5d26b2651098 in rb_gc_impl_new_obj ../gc/default/default.c:2530
    #12 0x5d26b2651098 in newobj_of ../gc.c:984
    #13 0x5d26b2651098 in rb_wb_protected_newobj_of ../gc.c:1013
    #14 0x5d26b2af9b5a in rb_method_entry_alloc ../vm_method.c:776
    #15 0x5d26b2af9b5a in rb_method_entry_clone ../vm_method.c:813
    #16 0x5d26b2856b69 in convert_umethod_to_method_components ../proc.c:2618
    #17 0x5d26b2856b69 in umethod_bind_call ../proc.c:2722
    #18 0x5d26b2aea4a7 in vm_call_cfunc_with_frame_ ../vm_insnhelper.c:3801
    #19 0x5d26b2aea4a7 in vm_call_cfunc_with_frame ../vm_insnhelper.c:3847
    #20 0x5d26b2b4e2f9 in vm_sendish ../vm_insnhelper.c:5968
    #21 0x5d26b2b4e2f9 in vm_exec_core ../insns.def:898
    #22 0x5d26b2b1bf68 in rb_vm_exec ../vm.c:2586
    #23 0x5d26b25d78ce in rb_ec_exec_node ../eval.c:281
    #24 0x5d26b25e3688 in ruby_run_node ../eval.c:319
    #25 0x5d26b238dc32 in rb_main ../main.c:43
    #26 0x5d26b238dc32 in main ../main.c:68
    #27 0x786f20a9fe07  (/usr/lib/libc.so.6+0x25e07) (BuildId: 98b3d8e0b8c534c769cb871c438b4f8f3a8e4bf3)
    #28 0x786f20a9fecb in __libc_start_main (/usr/lib/libc.so.6+0x25ecb) (BuildId: 98b3d8e0b8c534c769cb871c438b4f8f3a8e4bf3)
    #29 0x5d26b238f134 in _start (/home/alan/src/ruby/build-asan/miniruby+0x171134) (BuildId: a3fca37e902e0513094844d929f17725e5aef5b1)

0x50400001ac50 is located 0 bytes inside of 48-byte region [0x50400001ac50,0x50400001ac80)
freed by thread T0 here:
    #0 0x786f20efc282 in free /usr/src/debug/gcc/gcc/libsanitizer/asan/asan_malloc_linux.cpp:52
    #1 0x5d26b265246f in rb_gc_impl_free ../gc/default/default.c:8184
    #2 0x5d26b265246f in rb_gc_impl_free ../gc/default/default.c:8165
    #3 0x5d26b265246f in ruby_sized_xfree ../gc.c:4582
    #4 0x5d26b265246f in ruby_xfree ../gc.c:4593

previously allocated by thread T0 here:
    #0 0x786f20efd1aa in calloc /usr/src/debug/gcc/gcc/libsanitizer/asan/asan_malloc_linux.cpp:77
    #1 0x5d26b265195a in calloc1 ../gc/default/default.c:1472
    #2 0x5d26b265195a in rb_gc_impl_calloc ../gc/default/default.c:8220
    #3 0x5d26b265195a in ruby_xcalloc_body ../gc.c:4517
    #4 0x5d26b265195a in ruby_xcalloc ../gc.c:4511

SUMMARY: AddressSanitizer: heap-use-after-free ../vm_method.c:546 in rb_free_method_entry_vm_weak_references
Shadow bytes around the buggy address:
  0x50400001a980: fa fa 00 00 00 00 00 00 fa fa 00 00 00 00 00 00
  0x50400001aa00: fa fa 00 00 00 00 00 00 fa fa 00 00 00 00 00 00
  0x50400001aa80: fa fa 00 00 00 00 00 00 fa fa 00 00 00 00 00 00
  0x50400001ab00: fa fa 00 00 00 00 00 00 fa fa 00 00 00 00 00 00
  0x50400001ab80: fa fa 00 00 00 00 00 00 fa fa 00 00 00 00 00 00
=>0x50400001ac00: fa fa 00 00 00 00 00 00 fa fa[fd]fd fd fd fd fd
  0x50400001ac80: fa fa 00 00 00 00 00 00 fa fa 00 00 00 00 00 00
  0x50400001ad00: fa fa 00 00 00 00 00 00 fa fa 00 00 00 00 00 00
  0x50400001ad80: fa fa 00 00 00 00 00 00 fa fa 00 00 00 00 00 00
  0x50400001ae00: fa fa 00 00 00 00 00 00 fa fa 00 00 00 00 00 00
  0x50400001ae80: fa fa 00 00 00 00 00 00 fa fa 00 00 00 00 00 00
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==26803==ABORTING

It's use-after-free with a rb_method_definition_t. Judging solely by the trace, the underlying issue might not be ractor specific.

Updated by wanabe (_ wanabe) 4 months ago

Thanks for your comment. It looks like a race condition between rb_method_definition_release() and method_definition_addref().
Applying the following patch suppressed the problem.

diff --git a/vm_method.c b/vm_method.c
index 69a533b182..725b2a99ac 100644
--- a/vm_method.c
+++ b/vm_method.c
@@ -518,6 +518,7 @@ static void
 rb_method_definition_release(rb_method_definition_t *def)
 {
     if (def != NULL) {
+        RB_VM_LOCK_ENTER();
         const int reference_count = def->reference_count;
         def->reference_count--;
 
@@ -535,6 +536,7 @@ rb_method_definition_release(rb_method_definition_t *def)
             if (METHOD_DEBUG) fprintf(stderr, "-%p-%s:%d->%d (dec)\n", (void *)def, rb_id2name(def->original_id),
                                       reference_count, def->reference_count);
         }
+        RB_VM_LOCK_LEAVE();
     }
 }
 
@@ -621,9 +623,11 @@ setup_method_cfunc_struct(rb_method_cfunc_t *cfunc, VALUE (*func)(ANYARGS), int
 static rb_method_definition_t *
 method_definition_addref(rb_method_definition_t *def, bool complemented)
 {
+    RB_VM_LOCK_ENTER();
     if (!complemented &&  def->reference_count > 0) def->aliased = true;
     def->reference_count++;
     if (METHOD_DEBUG) fprintf(stderr, "+%p-%s:%d\n", (void *)def, rb_id2name(def->original_id), def->reference_count);
+    RB_VM_LOCK_LEAVE();
     return def;
 }
 

But I don't think it is a good idea because the cost would be high.
Hopefully there is a better solution.

Actions #3

Updated by jhawthorn (John Hawthorn) 10 days ago

  • Status changed from Open to Closed

Applied in changeset git|bfe6068417ca41a6b88a1ba5fcde04f9a76718a7.


Use atomic for method reference count [Bug #20934]

This changes reference_count on rb_method_definition_struct into an
atomic.

Ractors can create additional references as part of bind_call or
(presumably) similar. Because this can be done inside Ractors, we should
use a lock or atomics so that we don't race and avoid incrementing.

Co-authored-by: wanabe

Updated by jhawthorn (John Hawthorn) 10 days ago

  • Assignee set to ractor

This should be fixed by bfe6068417ca41a6b88a1ba5fcde04f9a76718a7. I have some quick benchmark results in https://github.com/ruby/ruby/pull/12951, using the global lock wasn't too bad for performance and had basically no penalty single threaded, but it was easy enough to convert this reference count to use atomics.

Actions

Also available in: Atom PDF

Like0
Like0Like1Like0Like0