Bug #15362
closed
[PATCH] Avoid GCing dead stack after switching away from a fiber
Description
Hello! I have a patch that fixes Bug #14561. It's not a platform-specific issue, but
it affects the default build configuration for MacOS and is causing segfaults on 2.5.x.
I've put the test for this in a separate patch because I'm not sure if we want to have
a 5 second test that only matters for non-default build configs and doesn't catch things reliably on Linux.
I tested this on both trunk and ruby_2_5, on MacOS and on Linux, on various build configs.
Please let me know if anything in my understanding is wrong. I've pasted my commit message below.
Fibers save execution contexts, and execution contexts include a native
stack pointer. It may happen that a Fiber outlives the native thread
it executed on. Consider the following code adapted from Bug #14561:
    enum = Enumerator.new { |y| y << 1 }
    thread = Thread.new { enum.peek } # fiber constructed inside the
                                      # block and saved inside `enum`
    thread.join
    sleep 5  # thread finishes and thread cache wait time runs out.
             # Native thread exits, possibly freeing its stack.
    GC.start # segfault because GC tries to mark the dangling stack pointer
             # inside `enum`'s fiber
The problem is masked by FIBER_USE_COROUTINE and FIBER_USE_NATIVE,
as those implementations already do what this commit does.
Generally on Linux systems, FIBER_USE_NATIVE is 1 even when
one uses ./configure --disable-fiber-coroutine, since most
Linux systems have getcontext() and setcontext(), which
turn on FIBER_USE_NATIVE. (Compile with
`make DEFS="-DFIBER_USE_NATIVE=0"` to explicitly disable it.)
Furthermore, when both FIBER_USE_COROUTINE and FIBER_USE_NATIVE
are off, and the GC reads from the stack of a dead native
thread, MRI does not segfault on Linux. This is probably due to
libpthread not marking the page where the dead stack lives as
unreadable. Nevertheless, this use-after-free is visible through
Valgrind.
On ruby_2_5, this is an acute problem, since it doesn't have FIBER_USE_COROUTINE.
Thread cache is also unavailable for 2.5.x, triggering this issue
more often. (thread cache gives this bug a grace period since
it makes native threads wait a little before exiting)
This issue is very visible on MacOS on 2.5.x since libpthread marks
the dead stack as unreadable, consistently turning this use-after-free
into a segfault.
Fixes Bug #14561
- cont.c: Set saved_ec.machine.stack_end to NULL when switching away from a
  fiber to keep the GC from marking it. saved_ec gets rehydrated with a
  stack pointer if/when the fiber runs again.
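
For readers who don't have cont.c open, here is a rough sketch of the idea behind the change. This is not the actual patch: every `toy_*` name is hypothetical, the struct is a heavily simplified stand-in for the saved execution context, and it assumes the marking code skips a context whose stack_end is NULL (which is what the commit message relies on). It also assumes an upward-growing stack to keep the loop simple.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the machine-stack fields of a
 * saved execution context. The real struct in MRI has many more fields. */
typedef struct {
    const void **stack_start;
    const void **stack_end;   /* NULL: no native stack is safe to scan */
} toy_machine_t;

typedef struct {
    toy_machine_t machine;
} toy_ec_t;

/* Switching away from a fiber: forget the native stack range, because the
 * native thread that owns it may exit and free it before the next GC. */
static void toy_fiber_switch_away(toy_ec_t *saved_ec)
{
    saved_ec->machine.stack_end = NULL;
}

/* Fiber runs again: "rehydrate" the context with a live stack range. */
static void toy_fiber_resume(toy_ec_t *saved_ec,
                             const void **start, const void **end)
{
    assert(start <= end);
    saved_ec->machine.stack_start = start;
    saved_ec->machine.stack_end = end;
}

/* Toy conservative marking: returns the number of stack words scanned.
 * A context with a NULL stack_end is skipped entirely, so a dangling
 * stack pointer is never dereferenced. */
static size_t toy_gc_mark_machine_stack(const toy_ec_t *ec)
{
    size_t scanned = 0;
    const void **p;

    if (ec->machine.stack_end == NULL) {
        return 0;
    }
    for (p = ec->machine.stack_start; p < ec->machine.stack_end; p++) {
        (void)*p; /* a real GC would test whether *p looks like an object */
        scanned++;
    }
    return scanned;
}
```

In this model, calling toy_gc_mark_machine_stack() on a context after toy_fiber_switch_away() scans nothing, and scanning resumes only after toy_fiber_resume() supplies a fresh range. The reproduction above hits the opposite case: the range was still set when GC.start ran.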