Bug #17304
Closed
Ruby stuck calling sched_yield on fork
Description
We have been encountering an intermittent bug when using fork: the interpreter process gets stuck in a loop that keeps calling sched_yield. This happens seemingly at random every few days, while everything works correctly most of the time. (Summed over all machines, we are talking about one successful invocation every second and one occurrence of this error every two to three days, so roughly one failure in 170,000 to 260,000 invocations.) If I did my search right, the loop is this code from thread_pthread.c, in native_stop_timer_thread(void):
while (ATOMIC_CAS(timer_thread_pipe.writing, (rb_atomic_t)0, 0)) {
    native_thread_yield();
}
If I remember correctly, the first time we were hit by this was with version 2.1.x; it was definitely happening with 2.3.x. (Ruby as distributed by Debian in all cases.) I found a similar issue at https://bugs.ruby-lang.org/issues/13794, but that is supposed to be fixed. The patch from revision 60079 mentioned there is applied in the Debian sources.
The Ruby backtrace from the affected process (obtained via gdb) is:
from /usr/bin/app:102:in `<main>'
from /usr/bin/app:102:in `new'
from /usr/lib/app/klass.rb:119:in `initialize'
from /usr/lib/app/klass.rb:119:in `new'
from /usr/lib/app/instance/instance.rb:81:in `initialize'
from /usr/lib/app/instance/instance.rb:81:in `fork'
from /usr/lib/app/instance/instance.rb:103:in `block in initialize'
from /usr/lib/app/instance/instance.rb:103:in `exec'
Attached are a backtrace from GDB (thread apply all bt) and a simple reproducing program, which is a simplified version of what our app does, but with which I wasn't able to actually reproduce the bug. (As I said, it only happens randomly and somewhat rarely.)
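For readers without access to the attachments, the call pattern visible in the Ruby backtrace (a constructor that calls fork, with exec inside the fork block) can be sketched roughly as follows. This is not the attached program: the class name Instance, the wait helper, and the command being run are all hypothetical and chosen only for illustration. Given how rarely the problem occurs, it is unlikely to reproduce the hang.

```ruby
require 'rbconfig'

# Hypothetical stand-in for the application's class; the name and the
# helper method are illustrative, not taken from the real app.
class Instance
  attr_reader :pid

  def initialize(*command)
    # Parent: fork a child. The block runs only in the child process.
    @pid = fork do
      # Child: replace the process image with the given command.
      # The backtrace in this report ends in exec inside such a block.
      exec(*command)
    end
  end

  # Block until the child exits and return its Process::Status.
  def wait
    _, status = Process.wait2(@pid)
    status
  end
end

if __FILE__ == $PROGRAM_NAME
  # Run the current Ruby interpreter as a trivial child command.
  status = Instance.new(RbConfig.ruby, '-e', 'exit 0').wait
  puts "child exited with status #{status.exitstatus}"
end
```

A child that hangs the way this report describes would never reach the new process image, so the parent's wait would block indefinitely while the child spins in sched_yield.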
I realize this might be pretty difficult to hunt down, so if you need any other information, let me know and I will try to obtain it the next time the bug is hit.
Files