Project

General

Profile

Actions

Feature #19554

open

Invalid memory access detected by Valgrind when using Fibers

Added by peterzhu2118 (Peter Zhu) about 1 year ago. Updated about 1 year ago.

Status:
Open
Assignee:
-
Target version:
-
[ruby-core:113032]

Description

This issue was originally reported here: https://github.com/Shopify/ruby_memcheck/issues/14

Running the script shown below using valgrind --trace-children=yes --num-callers=500 ruby test.rb outputs a lot of invalid memory access errors. I've shown a few sample errors below.

I am able to reproduce this issue on the master branch (commit 1e9a218ade), 3.2, 3.1, 3.0.

require "bundler/inline"
gemfile do
  source "https://rubygems.org"
  gem "graphql"
end

module Example
  class FooType < GraphQL::Schema::Object
    field :id, ID, null: false
  end

  class FooSource < GraphQL::Dataloader::Source
    def fetch(ids)
      ids
    end
  end

  class QueryType < GraphQL::Schema::Object
    field :foo, Example::FooType do
      argument :foo_id, GraphQL::Types::ID, required: false, loads: Example::FooType
    end

    def foo(foo: nil); end
  end

  class Schema < GraphQL::Schema
    query Example::QueryType
    use GraphQL::Dataloader

    def self.object_from_id(id, ctx)
      ctx.dataloader.with(Example::FooSource).request(id)
    end
  end
end

Example::Schema.execute(<<-GRAPHQL)
{
  foo(fooId: "Other") {
    id
  }
}
GRAPHQL
==203957== Use of uninitialised value of size 8
==203957==    at 0x3453FD: vm_exec_core (vm.inc:4411)
==203957==    by 0x357EFB: rb_vm_exec (vm.c:2366)
==203957==    by 0x354E44: invoke_block (vm.c:1384)
==203957==    by 0x355759: invoke_iseq_block_from_c (vm.c:1440)
==203957==    by 0x355759: invoke_block_from_c_proc (vm.c:1538)
==203957==    by 0x355759: vm_invoke_proc (vm.c:1568)
==203957==    by 0x355DF4: rb_vm_invoke_proc (vm.c:1589)
==203957==    by 0x48F695: rb_fiber_start (cont.c:2513)
==203957==    by 0x48CCF8: fiber_entry (cont.c:831)
==203957==
==203957== Invalid write of size 8
==203957==    at 0x48C407: fiber_pool_stack_reset (cont.c:325)
==203957==    by 0x48C4E9: fiber_pool_vacancy_reset (cont.c:364)
==203957==    by 0x48CBB0: fiber_pool_stack_release (cont.c:752)
==203957==    by 0x48CECF: fiber_stack_release (cont.c:874)
==203957==    by 0x48FC9F: fiber_switch (cont.c:2726)
==203957==    by 0x4901F9: fiber_resume_kw (cont.c:2906)
==203957==    by 0x490235: rb_fiber_resume_kw (cont.c:2912)
==203957==    by 0x4903B7: rb_fiber_m_resume (cont.c:2973)
==203957==    by 0x3337D6: ractor_safe_call_cfunc_m1 (vm_insnhelper.c:3166)
==203957==    by 0x33440A: vm_call_cfunc_with_frame_ (vm_insnhelper.c:3357)
==203957==    by 0x3345E1: vm_call_cfunc_with_frame (vm_insnhelper.c:3385)
==203957==    by 0x3398E5: vm_sendish (vm_insnhelper.c:5225)
==203957==    by 0x341203: vm_exec_core (insns.def:835)
==203957==    by 0x357EFB: rb_vm_exec (vm.c:2366)
==203957==    by 0x354E44: invoke_block (vm.c:1384)
==203957==    by 0x355759: invoke_iseq_block_from_c (vm.c:1440)
==203957==    by 0x355759: invoke_block_from_c_proc (vm.c:1538)
==203957==    by 0x355759: vm_invoke_proc (vm.c:1568)
==203957==    by 0x355DF4: rb_vm_invoke_proc (vm.c:1589)
==203957==    by 0x48F695: rb_fiber_start (cont.c:2513)
==203957==    by 0x48CCF8: fiber_entry (cont.c:831)
==203957==  Address 0x9bad008 is in a rw- anonymous segment

Updated by ioquatix (Samuel Williams) about 1 year ago

Supporting Valgrind is a good idea. However, I actually prefer ASAN. But both are good tools.

I recently added ASAN support for fibers/coroutines: https://github.com/ioquatix/ruby/commit/901525b1079ac02da0122a76d8e4c3546a7f80f6

I also proposed to add it to the build matrix: https://github.com/ruby/ruby/pull/5939

You will probably find that without Valgrind knowing about the stack swapping, it won't work, you'll need to inform it when the stack is swapped.

I don't really want to maintain two different implementations of "stack swapping instrumentation". I suppose we could abstract it out, however, it's fairly non-trivial/invasive.

Therefore, it might be better to use ASAN over Valgrind to keep the code simpler. Or, if we decided Valgrind was better, we should remove ASAN, for example.

==203957== Invalid write of size 8
==203957==    at 0x48C407: fiber_pool_stack_reset (cont.c:325)
==203957==    by 0x48C4E9: fiber_pool_vacancy_reset (cont.c:364)
==203957==    by 0x48CBB0: fiber_pool_stack_release (cont.c:752)
==203957==    by 0x48CECF: fiber_stack_release (cont.c:874)

Regarding this specific write, this isn't an invalid write. It's written into a memory mapped area. I wonder why it's being flagged. "Address 0x9bad008 is in a rw- anonymous segment" is a valid area IMHO. The fiber's stack, if it's unused, is used for free list management. If you can tell me how we should inform Valgrind that this is a valid area to write into, I can adjust the code (similar to poison/unpoison memory).

Updated by peterzhu2118 (Peter Zhu) about 1 year ago

Thanks for the reply @ioquatix (Samuel Williams)!

I haven't looked too much into ASAN, I'll look into it further. My main concern with ASAN is the need for Ruby to be compiled with it turned on and that additional code may need to be added into native extensions to support it. Both of these are barriers to entry compared to Valgrind.

In the Valgrind manual, it looks like there's a VALGRIND_MAKE_MEM_DEFINED function. Maybe that can be used to mark the regions as valid.

Updated by ioquatix (Samuel Williams) about 1 year ago

I haven't looked too much into ASAN, I'll look into it further. My main concern with ASAN is the need for Ruby to be compiled with it turned on and that additional code may need to be added into native extensions to support it. Both of these are barriers to entry compared to Valgrind.

It's unlikely that Valgrind will work with CRuby out of the box without Valgrind specific code e.g. for marking stacks, poisoning "freed" memory, etc. I am okay to help you try it. Just don't underestimate how complicated it can be to get all the right feedback. e.g. ASAN poison is used for GC to mark regions of the memory that should not be accessed. The same is true for fibers. It already all works.

Updated by peterzhu2118 (Peter Zhu) about 1 year ago

It's unlikely that Valgrind will work with CRuby out of the box without Valgrind specific code e.g. for marking stacks, poisoning "freed" memory, etc.

Valgrind (without the memory leak detection feature, so only detecting invalid memory access) actually works very well with CRuby (excluding Fibers), which is why I opened this ticket.

For most C extensions, they are able to use Valgrind with memory leak detection using ruby_memcheck, which applies heuristics to remove false-positive memory leaks in Ruby.

In fact, issues like #18264 and #18936 were found by Valgrind.

Updated by mdalessio (Mike Dalessio) about 1 year ago

I'm just here to support Peter's take: Valgrind is a very useful tool for C extension maintainers to find and prevent illegal memory access. It's been an incredibly valuable tool for me over the ~15 years I've been a maintainer, and is a key tool in Nokogiri's CI pipelines.

ASan is also useful! But because it's a build-time option, it's slightly less useful for debugging production builds. If it's not too difficult, I think Ruby should support both tools.

Actions #6

Updated by jeremyevans0 (Jeremy Evans) about 1 year ago

  • Tracker changed from Bug to Feature
  • Backport deleted (2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN)
Actions

Also available in: Atom PDF

Like0
Like0Like0Like0Like0Like1Like0