Bug #15362

Updated by alanwu (Alan Wu) about 1 year ago

Hello! I have a patch that fixes Bug #14561. The issue is not platform-specific, but
it affects the default build configuration on macOS and causes segfaults on 2.5.x.
I've put the test in a separate patch because I'm not sure we want
a 5-second test that only matters for non-default build configs and doesn't reliably catch the bug on Linux.
I tested this on both trunk and ruby_2_5, on macOS and on Linux, with various build configs.



Please let me know if anything in my understanding is wrong. I've pasted my commit message below.

----

Fibers save execution contexts, and execution contexts include a native
stack pointer. A Fiber may outlive the native thread
it executed on. Consider the following code adapted from Bug #14561:

```ruby
enum = Enumerator.new { |y| y << 1 }
thread = Thread.new { enum.peek } # fiber constructed inside the block
                                  # and saved inside `enum`
thread.join
sleep 5  # thread finishes and the thread cache wait time runs out;
         # the native thread exits, possibly freeing its stack
GC.start # segfault because the GC tries to mark the dangling stack
         # pointer inside `enum`'s fiber
```

The problem is masked by FIBER_USE_COROUTINE and FIBER_USE_NATIVE,
as those implementations already do what this commit does.
On Linux systems, FIBER_USE_NATIVE is generally 1 even when
one uses `./configure --disable-fiber-coroutine`, since most
Linux systems have getcontext() and setcontext(), which
turn on FIBER_USE_NATIVE. (Compile with
`make DEFS="-DFIBER_USE_NATIVE=0"` to explicitly disable it.)

Furthermore, when both FIBER_USE_COROUTINE and FIBER_USE_NATIVE
are off, and the GC reads from the stack of a dead native
thread, MRI does not segfault on Linux. This is probably due to
libpthread not marking the page where the dead stack lives as
unreadable. Nevertheless, this use-after-free is visible through
Valgrind.

On ruby_2_5, this is an acute problem, since it doesn't have FIBER_USE_COROUTINE.
The thread cache is also unavailable on 2.5.x, so the issue triggers
more often. (The thread cache gives this bug a grace period, since
it makes native threads wait a little before exiting.)

This issue is very visible on macOS on 2.5.x, since libpthread marks
the dead stack as unreadable, consistently turning this use-after-free
into a segfault.

Fixes Bug #14561

* cont.c: Set saved_ec.machine.stack_end to NULL when switching away from a
fiber to keep the GC from marking the stale native stack. `saved_ec` gets
rehydrated with a stack pointer if/when the fiber runs again.
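
To illustrate the idea behind the cont.c change, here is a rough toy model in Ruby. It is only a sketch: `ToyContext`, `ToyFiber`, and `toy_mark` are hypothetical names invented for this example and do not correspond to MRI's C structs or GC code. It assumes that a marker which sees a nil `stack_end` simply has no native stack to scan.

```ruby
# Toy model only: these names are hypothetical and do not mirror MRI's C code.
ToyContext = Struct.new(:stack_start, :stack_end)

class ToyFiber
  attr_reader :saved_ec

  def initialize
    @saved_ec = ToyContext.new(nil, nil)
  end

  # Running on a native thread "rehydrates" the saved stack bounds.
  def run_on(stack)
    @saved_ec.stack_start = stack.begin
    @saved_ec.stack_end = stack.end
  end

  # Switching away clears stack_end so a marker has nothing to scan.
  def switch_away
    @saved_ec.stack_end = nil
  end
end

# Stand-in for GC marking: a saved range that is no longer live models
# the use-after-free; a nil stack_end means "no native stack to scan".
def toy_mark(fiber, live_stacks)
  ec = fiber.saved_ec
  return :skipped if ec.stack_end.nil?
  range = ec.stack_start...ec.stack_end
  live_stacks.include?(range) ? :marked : :dangling
end

thread_stack = 0x1000...0x2000
fiber = ToyFiber.new

# Without the fix: the fiber keeps stale bounds after its thread exits.
fiber.run_on(thread_stack)
while_alive = toy_mark(fiber, [thread_stack])
without_fix = toy_mark(fiber, []) # thread exited, stack no longer live

# With the fix: switching away clears stack_end before the thread exits.
fiber.switch_away
with_fix = toy_mark(fiber, [])
```

In this model the dangling case only arises when the saved bounds survive the thread that owned them, which is the situation the commit avoids by clearing `stack_end` on every switch away from the fiber.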
