Bug #20001
closedMake Ruby work properly with ASAN enabled
Description
This ticket covers some work I want to do to get the ASAN infrastructure working in Ruby working again. I don't know if it ever worked well, but if it did, it appears to have bitrotted. Here are a few of its current problems:
Stack size calculations are wrong¶
Ruby takes the address of a local variable as part of the process of working out the top/bottom of the native stack in native_thread_init_stack
. Because ASAN can end up putting some local variables on a "fake stack", this calculation can wind up producing the wrong result and setting th->ec->machine.stack_start
incorrectly. This then leads to stack_check
thinking that the machine stack has overflowed all the time, and thus, leading to programs like the following to fail:
ASAN_OPTIONS=use_sigaltstack=0:detect_leaks=0 ./miniruby -e 'Thread.new { puts "hi" }.join'
#<Thread:0x00007fb5d79f3f28 -e:1 run> terminated with exception (report_on_exception is true):
SystemStackError
-e: stack level too deep (SystemStackError)
Another consequence of stack size detection not working properly is that the machine stack is not properly marked during GC, so things on the stack which should be considered live get prematurely collected.
ASAN provides the __asan_addr_is_in_fake_stack
function which can be used to get the address of a local variable on the real stack; I think Ruby's various stack-detecting macros could then make use of this to make it work.
VALUEs in fake stacks are not marked¶
Another consequece of ASAN storing local variables in fake stacks is that we don't see them when doing the machine stack mark. Again, the __asan_addr_is_in_fake_stack
function can help us here. ASAN leaves a pointer to the fake stack on the real stack in every frame. When marking the machine stack, we can check each word to see if it's a pointer to a fake stack frame, and then use __asan_addr_is_in_fake_stack
to get the extents of the fake frame and scan that too.
This seems to be e.g. how V8 does it
Doesn't work with GCC¶
Our ASAN implementation doesn't work with GCC, even though GCC supports ASAN. This is because we use the __has_feature(address_sanitizer)
macro in sanitizers.h, which is a clang-ism. The equivalent GCCism is __SANITIZE_ADDRESS__
and we should check that too.
Plan of attack¶
At the moment, I can't even run a full build of ruby to run, because miniruby crashees during the build process. My plan of attack here is to:
- Address those known problems I've already identified above
- Get
make
to actually work with asan - Try running the test suite through ASAN, and fix any issues that turns up
- I'm thinking we should add an
--enable-asan
or--enable-address-sanitizer
or some such to our configure script, to make it easy to build Ruby with ASAN without having to poke around with individual CFLAGS/LDFLAGS - Eventually, it would be great to actually run the tests under ASAN in CI
This is probably a medium term body of work, but I'll try and tackle it in bits.
Also: @HParker (Adam Hess) and @peterzhu2118 (Peter Zhu) - I know you folks have been working on getting Valgrind to work better with Ruby, for leak detection. I think I see my efforts here as complementary to yours, rather than duplicative. The ASAN infrastructure for poisoning/unpoisoning stuff in the GC already exists and is close to working properly, and it really did help me solve a bug yesterday (https://bugs.ruby-lang.org/issues/19994), so it seems useful and should be made to work. Your work on freeing memory on shutdown (https://bugs.ruby-lang.org/issues/19993) should actually help ASAN usefully detect leaks as well. I think ASAN might be better for eventually running CI checks against the full Ruby test suite, since allegedly it's faster. However, if you think solving these issues with ASAN is a waste of time and Valgrind can catch the same bugs already, please chime in!
Updated by kjtsanaktsidis (KJ Tsanaktsidis) about 1 year ago
This is the first part of this work: https://github.com/ruby/ruby/pull/8903
With this PR, i can at least run make
with asan enabled on both macos & linux with clang. GCC still doesn't work and i haven't tried actually running the test suite yet!
Updated by HParker (Adam Hess) about 1 year ago
I think I see my efforts here as complementary to yours, rather than duplicative
💯 I agree. I believe having both ASAN and Valgrind available is ideal. Valgind, I tend to be very confident in the results, but ASAN will be easier to integrate with CI and run consistently. The ability to run a bigger suite in CI might make ASAN a better long term solution if it can find the same problems.
Maybe we can share the environment variable since i assume we will end up needing to manage/free the same things?
Updated by kjtsanaktsidis (KJ Tsanaktsidis) 12 months ago
Valgind, I tend to be very confident in the results, but ASAN will be easier to integrate with CI and run consistently
That's how I see the two tools too
Maybe we can share the environment variable since i assume we will end up needing to manage/free the same things?
I don't think the ASAN stuff needs anything specific for free-on-exit above what you're working on; once you have that merged it should Just Work (tm) and make ASAN report fewer leaks at termination.
Updated by Anonymous 10 months ago
- Status changed from Open to Closed
Applied in changeset git|4ba8f0dc993953d3ddda6328e3ef17a2fc2cbde5.
Pass down "stack start" variables from closer to the top of the stack
The implementation of native_thread_init_stack
for the various
threading models can use the address of a local variable as part of the
calculation of the machine stack extents:
- pthreads uses it as a lower-bound on the start of the stack, because
glibc (and maybe other libcs) can store its own data on the stack
before calling into user code on thread creation. - win32 uses it as an argument to VirtualQuery, which gets the extent of
the memory mapping which contains the variable
However, the local being used for this is actually allocated inside
the native_thread_init_stack
frame; that means the caller might
allocate a VALUE on the stack that actually lies outside the bounds
stored in machine.stack_{start,end}.
A local variable from one level above the topmost frame that stores
VALUEs on the stack must be drilled down into the call to
native_thread_init_stack
to be used in the calculation. This probably
doesn't really matter for the win32 case (they'll be in the same
memory mapping so VirtualQuery should return the same thing), but
definitely could matter for the pthreads case.
[Bug #20001]
Updated by kjtsanaktsidis (KJ Tsanaktsidis) 10 months ago
- Status changed from Closed to Open
Updated by Anonymous 10 months ago
- Status changed from Open to Closed
Applied in changeset git|807714447ef02c77bb0e17fe27d96ee2692264f8.
Pass down "stack start" variables from closer to the top of the stack
This commit changes how stack extents are calculated for both the main
thread and other threads. Ruby uses the address of a local variable as
part of the calculation for machine stack extents:
- pthreads uses it as a lower-bound on the start of the stack, because
glibc (and maybe other libcs) can store its own data on the stack
before calling into user code on thread creation. - win32 uses it as an argument to VirtualQuery, which gets the extent of
the memory mapping which contains the variable
However, the local being used for this is actually too low (too close to
the leaf function call) in both the main thread case and the new thread
case.
In the main thread case, we have the INIT_STACK
macro, which is used
for pthreads to set the native_main_thread->stack_start
value. This
value is correctly captured at the very top level of the program (in
main.c). However, this is not what's used to set the execution context
machine stack (th->ec->machine_stack.stack_start
); that gets set as
part of a call to ruby_thread_init_stack
in Init_BareVM
, using the
address of a local variable allocated inside Init_BareVM
. This is
too low; we need to use a local allocated closer to the top of the
program.
In the new thread case, the lolcal is allocated inside
native_thread_init_stack
, which is, again, too low.
In both cases, this means that we might have VALUEs lying outside the
bounds of th->ec->machine.stack_{start,end}
, which won't be marked
correctly by the GC machinery.
To fix this,
- In the main thread case: We already have
INIT_STACK
at the right
level, so just pass that local var toruby_thread_init_stack
. - In the new thread case: Allocate the local one level above the call to
native_thread_init_stack
incall_thread_start_func2
.
[Bug #20001]
fix
Updated by kjtsanaktsidis (KJ Tsanaktsidis) 8 months ago
- Related to Misc #20387: Meta-ticket for ASAN support added