Bug #16809

Fiber crashes with --with-coroutine=copy

Added by ncopa (Natanael Copa) 7 months ago. Updated 15 days ago.

Target version:
ruby -v:
ruby 2.7.1p83 (2020-03-31 revision a0c7c23c9c) [s390x-linux-musl]


./revision.h unchanged                                                                                                  
#190 test_fiber.rb:15:in `<top (required)>':
  #=> "" (expected "ok")                                                                                                
#192 test_fiber.rb:26:in `<top (required)>':                                                                            
     fibers = 100.times.collect{Fiber.new{Fiber.yield}}
  #=> "" (expected "ok")     
#193 test_fiber.rb:33:in `<top (required)>':                                                                            
     at_exit { Fiber.new{}.resume }
  #=> killed by SIGFPE (signal 8)                           
#194 test_fiber.rb:37:in `<top (required)>':
  #=> killed by SIGFPE (signal 8)  [ruby-dev:34128]                                                                     
test_fiber.rb           FAIL 4/5                                                                                        
#934 test_massign.rb:165:in `<top (required)>':                                                                         
     300.times { a<<s; s=s.succ }                                                                                       
     eval <<-END__                                                                                                      
     GC.stress=true
       #{ a.join(",") },*zzz=1                                                                                          
  #=> "" (expected "ok")  [ruby-dev:32581]                                                                              
test_massign.rb         FAIL 1/34                                                                                       
#1391 test_thread.rb:310:in `<top (required)>':                                                                         
     g = enum_for(:local_variables)                         
     loop { }                                                                                                    
  #=> killed by SIGFPE (signal 8)  [ruby-dev:34128]                                                                     
#1392 test_thread.rb:315:in `<top (required)>':                                                                         
     g = enum_for(:block_given?)                                                                                        
     loop { }                                                                                                    
  #=> killed by SIGFPE (signal 8)  [ruby-dev:34128]                                                                     
#1393 test_thread.rb:320:in `<top (required)>':                                                                         
     g = enum_for(:binding)                                                                                             
     loop { }                                                                                                    
  #=> killed by SIGFPE (signal 8)  [ruby-dev:34128]                                                                     
#1394 test_thread.rb:325:in `<top (required)>':                                                                         
     g = "abc".enum_for(:scan, /./)                                                                                     
     loop { }                                                                                                    
  #=> killed by SIGFPE (signal 8)  [ruby-dev:34128]                                                                     
#1395 test_thread.rb:330:in `<top (required)>':                                                                         
     g = Module.enum_for(:new)                              
     loop { }                                                                                                    
  #=> killed by SIGFPE (signal 8)  [ruby-dev:34128]                                                                     
test_thread.rb          FAIL 5/48                                                                                       

Thread count: 10000 (skipping)                              
FAIL 10/1409 tests failed                                                                                               
make: *** [ yes-btest-ruby] Error 1

May be related to this warning:

compiling coroutine/copy/Context.c                          
coroutine/copy/Context.c: In function 'coroutine_restore_stack_padded':                                                 
coroutine/copy/Context.c:87:34: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
   87 |     _longjmp(context->state, 1 | (int)buffer);      
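
For readers unfamiliar with this warning, here is a minimal sketch of what it is pointing at. It is not the code from coroutine/copy/Context.c; the helper names `through_int` and `survives_int_round_trip` are hypothetical and exist only for illustration. The second argument of `longjmp`/`_longjmp` is an `int`, so on a platform where pointers are wider than `int` (such as s390x or sparc64) only the low bits of `buffer` can be carried through it. Whether this truncation is what actually causes the crashes is not established here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: narrow a pointer to the width of int and widen it
 * again, roughly what happens to `buffer` in `1 | (int)buffer`.
 * unsigned int is used so the narrowing is well-defined (modular);
 * the bits that are lost are the same as with a cast to int. */
static uintptr_t through_int(void *p) {
    unsigned int narrowed = (unsigned int)(uintptr_t)p;
    return (uintptr_t)narrowed;
}

/* Hypothetical helper: returns 1 if the pointer value is unchanged after
 * the round trip, 0 if the upper bits were dropped. */
static int survives_int_round_trip(void *p) {
    return through_int(p) == (uintptr_t)p;
}
```

On a 64-bit target, an address below 4 GiB survives the round trip, while any address above it does not, which would make a failure depend on where the stack happens to be mapped.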

Updated by puchuu (Andrew Aladjev) 2 months ago

I've tested the copy coroutine. Unfortunately, it is currently completely broken: it hangs, segfaults, etc.

Updated by jeremyevans0 (Jeremy Evans) 2 months ago

  • Assignee set to ioquatix (Samuel Williams)
  • Status changed from Open to Assigned
  • Subject changed from ruby testsuite fails on s390x alpine (musl) with --with-coroutine=copy to Fiber crashes with --with-coroutine=copy

OpenBSD/sparc64 (which uses the copy coroutine) is similarly broken with regard to fibers. Even something as simple as ruby27 -e 'Fiber.new{Fiber.yield}.resume' crashes (ruby26 works fine for this). Changing the title to be more general, since this does not just affect s390x alpine (musl).

Updated by jeremyevans0 (Jeremy Evans) 2 months ago

It looks like sometimes the copy coroutine implementation can segfault even on x86_64:

Updated by ioquatix (Samuel Williams) about 2 months ago

This might be a pointer alignment issue / problem with the alloca elision.

After playing around with godbolt compiler explorer, I think this might be one option:

However, I wouldn't be surprised if it doesn't solve the issue.

Updated by jeremyevans0 (Jeremy Evans) 25 days ago

I tried pull request #3624 on OpenBSD/sparc64 and it still crashed.

I was able to come up with a fix that works on OpenBSD/sparc64, as long as a couple files are compiled without optimization:

Updated by ioquatix (Samuel Williams) 15 days ago

I think we found the root cause of this, and it should be addressed by:

However, jeremyevans0 (Jeremy Evans) is still testing it.
