Bug #17666
closedThread#join hangs when Fiber.set_scheduler is set
Description
class MockScheduler
def block(blocker, timeout = nil)
byebug
end
def close
byebug
end
def fiber(&block)
byebug
Fiber.new(blocking: false, &block).tap(&:resume)
end
def io_wait(io, events, timeout)
byebug
end
def kernel_sleep(duration=nil)
byebug
Fiber.yield
end
def process_wait(pid, flags)
byebug
end
def unblock(blocker, fiber)
byebug
end
end
Fiber.set_scheduler(MockScheduler.new)
t1 = Thread.new do
p 'before'
sleep 1
p 'after'
end
t1.join
Code hangs at this point.
Updated by arjunmdas (arjun das) about 4 years ago
- Subject changed from Sleep in a thread hangs when Fiber.set_scheduler is set to Thread#join hangs when Fiber.set_scheduler is set
class MockScheduler
def block(blocker, timeout = nil)
byebug
end
def close
byebug
end
def fiber(&block)
byebug
Fiber.new(blocking: false, &block).tap(&:resume)
end
def io_wait(io, events, timeout)
byebug
end
def kernel_sleep(duration=nil)
byebug
Fiber.yield
end
def process_wait(pid, flags)
byebug
end
def unblock(blocker, fiber)
byebug
end
end
Fiber.set_scheduler(MockScheduler.new)
t1 = Thread.new do
p 'before'
p 'after'
end
t1.join
Code hangs a this point.
Updated by xtkoba (Tee KOBAYASHI) about 4 years ago
This issue seems already resolved in the latest development version. (I confirmed that it reproduces in 3.0.0p0 release version on x86_64-linux.)
Updated by arjunmdas (arjun das) about 4 years ago
Thanks a lot for confirming.
I will check the latest development version as well. Are you aware of any other tracking bug for this?
Updated by xtkoba (Tee KOBAYASHI) about 4 years ago
This issue seems fixed in 5f69a7f60467fa58c2f998daffab43e118bff36c. I'm not sure if there is a ticket here. There is a related PR in GitHub: https://github.com/ruby/ruby/pull/3945
Updated by arjunmdas (arjun das) about 4 years ago
Thanks again.
Just curious, how did you identify the right PR that fixed this?
Updated by xtkoba (Tee KOBAYASHI) about 4 years ago
user:arjunmdas I just searched the git log with the keyword "fiber". The PR number comes from the commit message.
Updated by jeremyevans0 (Jeremy Evans) about 4 years ago
- Status changed from Open to Closed
Updated by arjunmdas (arjun das) about 4 years ago
xtkoba (Tee KOBAYASHI) - Re-verified in the latest development. It's fixed. Thanks for the update.
.
Updated by ioquatix (Samuel Williams) almost 4 years ago
I think we should consider isolating this for backport to 3.0.3 if possible.
This three lines:
https://github.com/ruby/ruby/blob/v3_0_2/thread.c#L547-L549
if (target_thread->scheduler != Qnil) {
rb_scheduler_unblock(target_thread->scheduler, target_thread->self, rb_fiberptr_self(join_list->fiber));
} else {
should become:
if (target_thread->scheduler != Qnil && rb_fiberptr_blocking(join_list->fiber) == 0) {
rb_scheduler_unblock(target_thread->scheduler, target_thread->self, rb_fiberptr_self(join_list->fiber));
}
We would need to backport rb_fiberptr_blocking
from cont.c
too:
unsigned int rb_fiberptr_blocking(struct rb_fiber_struct *fiber)
{
return fiber->blocking;
}
There are some other usage in thread_sync.c
which might be candidate for backport. Check https://github.com/ruby/ruby/blob/master/thread_sync.c for usage of rb_fiberptr_blocking
. These are all bugs because it can cause the fiber scheduler to be invoked when it shouldn't causing indefinite hang.
Updated by nagachika (Tomoyuki Chikanaga) almost 4 years ago
- Backport changed from 2.5: UNKNOWN, 2.6: UNKNOWN, 2.7: UNKNOWN, 3.0: UNKNOWN to 2.7: DONTNEED, 3.0: REQUIRED
I don't have enough time to see details of the request just now. I filled the Backport field to recall myself to this ticket later.
Updated by nagachika (Tomoyuki Chikanaga) almost 4 years ago
Hello Samuel,
I have created a backport patch according to your suggestions.
https://github.com/nagachika/ruby/commit/c2697018d4d8cad7ea80ca6aa57f65d76072053c
Are there anything else should be included?
Would you review my patch please?
Updated by ioquatix (Samuel Williams) almost 4 years ago
@nagachika (Tomoyuki Chikanaga) thanks so much for your effort here. I have a local test case which can easily fail without this fix, so I'll try your PR to confirm it fixes the issue.
Updated by ioquatix (Samuel Williams) almost 4 years ago
I have made a PR to your PR.
https://github.com/nagachika/ruby/pull/1
This adds a test case which fails without this backport.
Updated by ioquatix (Samuel Williams) almost 4 years ago
I also tested it against async
main
branch which can hang on 3.0.2, but passes on 3.0.2 + your PR. So, I can confirm this fixes the issue that was discussed here.
Updated by nagachika (Tomoyuki Chikanaga) almost 4 years ago
Thank you for your confirmation and adding a test!
I will create a patch for ruby_3_0 branch based on your pull request.
Updated by nagachika (Tomoyuki Chikanaga) almost 4 years ago
- Backport changed from 2.7: DONTNEED, 3.0: REQUIRED to 2.7: DONTNEED, 3.0: DONE
ruby_3_0 95dc88c88869541dd0eccafd14924d78c8d7f427 merged revision(s) 5f69a7f60467fa58c2f998daffab43e118bff36c.