Project

General

Profile

Actions

Bug #20001

closed

Make Ruby work properly with ASAN enabled

Added by kjtsanaktsidis (KJ Tsanaktsidis) 3 months ago. Updated about 1 month ago.

Status:
Closed
Assignee:
-
Target version:
-
[ruby-core:115346]

Description

This ticket covers some work I want to do to get the ASAN infrastructure working in Ruby working again. I don't know if it ever worked well, but if it did, it appears to have bitrotted. Here are a few of its current problems:

Stack size calculations are wrong

Ruby takes the address of a local variable as part of the process of working out the top/bottom of the native stack in native_thread_init_stack. Because ASAN can end up putting some local variables on a "fake stack", this calculation can wind up producing the wrong result and setting th->ec->machine.stack_start incorrectly. This then leads to stack_check thinking that the machine stack has overflowed all the time, and thus, leading to programs like the following to fail:

ASAN_OPTIONS=use_sigaltstack=0:detect_leaks=0 ./miniruby -e 'Thread.new { puts "hi" }.join'
#<Thread:0x00007fb5d79f3f28 -e:1 run> terminated with exception (report_on_exception is true):
SystemStackError
-e: stack level too deep (SystemStackError)

Another consequence of stack size detection not working properly is that the machine stack is not properly marked during GC, so things on the stack which should be considered live get prematurely collected.

ASAN provides the __asan_addr_is_in_fake_stack function which can be used to get the address of a local variable on the real stack; I think Ruby's various stack-detecting macros could then make use of this to make it work.

VALUEs in fake stacks are not marked

Another consequece of ASAN storing local variables in fake stacks is that we don't see them when doing the machine stack mark. Again, the __asan_addr_is_in_fake_stack function can help us here. ASAN leaves a pointer to the fake stack on the real stack in every frame. When marking the machine stack, we can check each word to see if it's a pointer to a fake stack frame, and then use __asan_addr_is_in_fake_stack to get the extents of the fake frame and scan that too.

This seems to be e.g. how V8 does it

Doesn't work with GCC

Our ASAN implementation doesn't work with GCC, even though GCC supports ASAN. This is because we use the __has_feature(address_sanitizer) macro in sanitizers.h, which is a clang-ism. The equivalent GCCism is __SANITIZE_ADDRESS__ and we should check that too.

Plan of attack

At the moment, I can't even run a full build of ruby to run, because miniruby crashees during the build process. My plan of attack here is to:

  • Address those known problems I've already identified above
  • Get make to actually work with asan
  • Try running the test suite through ASAN, and fix any issues that turns up
  • I'm thinking we should add an --enable-asan or --enable-address-sanitizer or some such to our configure script, to make it easy to build Ruby with ASAN without having to poke around with individual CFLAGS/LDFLAGS
  • Eventually, it would be great to actually run the tests under ASAN in CI

This is probably a medium term body of work, but I'll try and tackle it in bits.

Also: @HParker (Adam Hess) and @peterzhu2118 (Peter Zhu) - I know you folks have been working on getting Valgrind to work better with Ruby, for leak detection. I think I see my efforts here as complementary to yours, rather than duplicative. The ASAN infrastructure for poisoning/unpoisoning stuff in the GC already exists and is close to working properly, and it really did help me solve a bug yesterday (https://bugs.ruby-lang.org/issues/19994), so it seems useful and should be made to work. Your work on freeing memory on shutdown (https://bugs.ruby-lang.org/issues/19993) should actually help ASAN usefully detect leaks as well. I think ASAN might be better for eventually running CI checks against the full Ruby test suite, since allegedly it's faster. However, if you think solving these issues with ASAN is a waste of time and Valgrind can catch the same bugs already, please chime in!

Updated by kjtsanaktsidis (KJ Tsanaktsidis) 3 months ago

This is the first part of this work: https://github.com/ruby/ruby/pull/8903

With this PR, i can at least run make with asan enabled on both macos & linux with clang. GCC still doesn't work and i haven't tried actually running the test suite yet!

Updated by HParker (Adam Hess) 3 months ago

I think I see my efforts here as complementary to yours, rather than duplicative

💯 I agree. I believe having both ASAN and Valgrind available is ideal. Valgind, I tend to be very confident in the results, but ASAN will be easier to integrate with CI and run consistently. The ability to run a bigger suite in CI might make ASAN a better long term solution if it can find the same problems.

Maybe we can share the environment variable since i assume we will end up needing to manage/free the same things?

Updated by kjtsanaktsidis (KJ Tsanaktsidis) 3 months ago

Valgind, I tend to be very confident in the results, but ASAN will be easier to integrate with CI and run consistently

That's how I see the two tools too

Maybe we can share the environment variable since i assume we will end up needing to manage/free the same things?

I don't think the ASAN stuff needs anything specific for free-on-exit above what you're working on; once you have that merged it should Just Work (tm) and make ASAN report fewer leaks at termination.

Actions #4

Updated by Anonymous about 1 month ago

  • Status changed from Open to Closed

Applied in changeset git|4ba8f0dc993953d3ddda6328e3ef17a2fc2cbde5.


Pass down "stack start" variables from closer to the top of the stack

The implementation of native_thread_init_stack for the various
threading models can use the address of a local variable as part of the
calculation of the machine stack extents:

  • pthreads uses it as a lower-bound on the start of the stack, because
    glibc (and maybe other libcs) can store its own data on the stack
    before calling into user code on thread creation.
  • win32 uses it as an argument to VirtualQuery, which gets the extent of
    the memory mapping which contains the variable

However, the local being used for this is actually allocated inside
the native_thread_init_stack frame; that means the caller might
allocate a VALUE on the stack that actually lies outside the bounds
stored in machine.stack_{start,end}.

A local variable from one level above the topmost frame that stores
VALUEs on the stack must be drilled down into the call to
native_thread_init_stack to be used in the calculation. This probably
doesn't really matter for the win32 case (they'll be in the same
memory mapping so VirtualQuery should return the same thing), but
definitely could matter for the pthreads case.

[Bug #20001]

Actions #5

Updated by kjtsanaktsidis (KJ Tsanaktsidis) about 1 month ago

  • Status changed from Closed to Open
Actions #6

Updated by Anonymous about 1 month ago

  • Status changed from Open to Closed

Applied in changeset git|807714447ef02c77bb0e17fe27d96ee2692264f8.


Pass down "stack start" variables from closer to the top of the stack

This commit changes how stack extents are calculated for both the main
thread and other threads. Ruby uses the address of a local variable as
part of the calculation for machine stack extents:

  • pthreads uses it as a lower-bound on the start of the stack, because
    glibc (and maybe other libcs) can store its own data on the stack
    before calling into user code on thread creation.
  • win32 uses it as an argument to VirtualQuery, which gets the extent of
    the memory mapping which contains the variable

However, the local being used for this is actually too low (too close to
the leaf function call) in both the main thread case and the new thread
case.

In the main thread case, we have the INIT_STACK macro, which is used
for pthreads to set the native_main_thread->stack_start value. This
value is correctly captured at the very top level of the program (in
main.c). However, this is not what's used to set the execution context
machine stack (th->ec->machine_stack.stack_start); that gets set as
part of a call to ruby_thread_init_stack in Init_BareVM, using the
address of a local variable allocated inside Init_BareVM. This is
too low; we need to use a local allocated closer to the top of the
program.

In the new thread case, the lolcal is allocated inside
native_thread_init_stack, which is, again, too low.

In both cases, this means that we might have VALUEs lying outside the
bounds of th->ec->machine.stack_{start,end}, which won't be marked
correctly by the GC machinery.

To fix this,

  • In the main thread case: We already have INIT_STACK at the right
    level, so just pass that local var to ruby_thread_init_stack.
  • In the new thread case: Allocate the local one level above the call to
    native_thread_init_stack in call_thread_start_func2.

[Bug #20001]

fix

Actions

Also available in: Atom PDF

Like0
Like0Like0Like0Like0Like0Like0