Bug #15362

Updated by alanwu (Alan Wu) over 3 years ago

Hello! I have a patch that fixes Bug #14561. The issue is not platform-specific, but
 it affects the default build configuration on MacOS and causes segfaults on 2.5.x.
 I've put the test for this in a separate patch because I'm not sure we want
 a 5-second test that only matters for non-default build configs and doesn't reliably catch the problem on Linux.
 I tested this on both trunk and ruby_2_5, on MacOS and on Linux, with various build configs.


 Please let me know if anything in my understanding is wrong. I've pasted my commit message below. 


 Fibers save execution contexts, and execution contexts include a native
 stack pointer. It may happen that a Fiber outlives the native thread
 it executed on. Consider the following code adapted from Bug #14561:

 enum = Enumerator.new { |y| y << 1 }
 thread = Thread.new { enum.peek }   # fiber constructed inside the
                                     # block and saved inside `enum`
 sleep 5        # thread finishes and thread cache wait time runs out.
                # Native thread exits, possibly freeing its stack.
 GC.start       # segfault because GC tries to mark the dangling stack pointer
                # inside `enum`'s fiber
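
 The first half of that sequence can be observed without the sleep or the GC run. The following is only a minimal sketch, not part of the patch or its test; the variable names are illustrative. It shows that the enumerator, and therefore the fiber that `peek` created on the worker thread, stays reachable after the Ruby-level thread has finished.

```ruby
# Sketch: the enumerator's fiber is created on a worker thread, and the
# enumerator stays reachable after that thread has finished.
enum = Enumerator.new { |y| y << 1 << 2 }

worker = Thread.new { enum.peek }  # first external-iteration call creates the fiber here
first = worker.value               # joins the worker and returns peek's result

worker_alive = worker.alive?       # the Ruby-level thread is finished at this point
puts "peek returned #{first}, worker alive: #{worker_alive}"
```

 Whether the native thread behind `worker` has also exited at this point depends on the build and on the thread cache, which is why the original reproduction waits before calling GC.start.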


 The problem is masked by FIBER_USE_COROUTINE and FIBER_USE_NATIVE, 
 as those implementations already do what this commit does. 
 Generally on Linux systems, FIBER_USE_NATIVE is 1 even when
 one uses `./configure --disable-fiber-coroutine`, since most
 Linux systems have getcontext() and setcontext(), which
 turn on FIBER_USE_NATIVE. (Compile with
 `make DEFS="-DFIBER_USE_NATIVE=0"` to explicitly disable it.)

 Furthermore, when both FIBER_USE_COROUTINE and FIBER_USE_NATIVE 
 are off, and the GC reads from the stack of a dead native 
 thread, MRI does not segfault on Linux. This is probably due to 
 libpthread not marking the page where the dead stack lives as 
 unreadable. Nevertheless, this use-after-free is visible through 

 On ruby_2_5, this is an acute problem, since it doesn't have FIBER_USE_COROUTINE.
 Thread cache is also unavailable on 2.5.x, so the issue triggers
 more often. (Thread cache gives this bug a grace period, since
 it makes native threads wait a little before exiting.)

 This issue is very visible on MacOS on 2.5.x since libpthread marks 
 the dead stack as unreadable, consistently turning this use-after-free 
 into a segfault. 

 Fixes Bug #14561 

  * cont.c: Set saved_ec.machine.stack_end to NULL when switching away from a
            fiber to keep the GC from marking the stale stack. `saved_ec` gets
            rehydrated with a stack pointer if/when the fiber runs again.
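
 The idea behind the cont.c change can be pictured with a toy model. This is not MRI's code: `ToyContext`, `ToyFiber` and `markable_range` are hypothetical names, plain integers stand in for native stack addresses, and the model assumes the marking step skips a saved context that has no stack end.

```ruby
# Toy model only. ToyContext, ToyFiber and markable_range are hypothetical
# names; integers stand in for native stack addresses.
ToyContext = Struct.new(:stack_start, :stack_end)

class ToyFiber
  attr_reader :saved_ec

  def initialize
    @saved_ec = ToyContext.new(nil, nil)
  end

  # Switching to the fiber: record the stack bounds it is running on.
  def switch_to(stack_start, stack_end)
    @saved_ec.stack_start = stack_start
    @saved_ec.stack_end = stack_end
  end

  # Switching away: forget the stack end so a later marking pass has
  # nothing to follow, even if the native thread has exited meanwhile.
  def switch_away
    @saved_ec.stack_end = nil
  end
end

# Stand-in for the GC's stack-marking step: returns the bounds it would scan,
# or nil when the saved context carries no stack end.
def markable_range(fiber)
  ec = fiber.saved_ec
  return nil if ec.stack_start.nil? || ec.stack_end.nil?
  [ec.stack_start, ec.stack_end]
end

fiber = ToyFiber.new
fiber.switch_to(0x7000, 0x6f00)
running = markable_range(fiber)     # bounds present while the fiber runs
fiber.switch_away
suspended = markable_range(fiber)   # nil: nothing to scan
fiber.switch_to(0x9000, 0x8f80)
resumed = markable_range(fiber)     # bounds restored from the new run
```

 In this model a suspended fiber never exposes a stack range to the marker, so it does not matter whether the memory behind the old addresses is still mapped.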