Project

General

Profile

Actions

Bug #17666

closed

Thread#join hangs when Fiber.set_scheduler is set

Added by arjunmdas (arjun das) 11 months ago. Updated 6 months ago.

Status:
Closed
Priority:
Normal
Assignee:
-
Target version:
-
ruby -v:
ruby 3.0.0p0 (2020-12-25 revision 95aff21468) [x86_64-darwin19]
[ruby-core:102687]

Description

class MockScheduler
  def block(blocker, timeout = nil)
    byebug
  end

  def close
    byebug
  end

  def fiber(&block)
    byebug
    Fiber.new(blocking: false, &block).tap(&:resume)
  end

  def io_wait(io, events, timeout)
    byebug
  end

  def kernel_sleep(duration=nil)
    byebug
    Fiber.yield
  end

  def process_wait(pid, flags)
    byebug
  end

  def unblock(blocker, fiber)
    byebug
  end
end

Fiber.set_scheduler(MockScheduler.new)

t1 = Thread.new do
       p 'before'
       sleep 1
       p 'after'
     end

t1.join

Code hangs at this point.

Updated by arjunmdas (arjun das) 11 months ago

  • Subject changed from Sleep in a thread hangs when Fiber.set_scheduler is set to Thread#join hangs when Fiber.set_scheduler is set
class MockScheduler
  def block(blocker, timeout = nil)
    byebug
  end

  def close
    byebug
  end

  def fiber(&block)
    byebug
    Fiber.new(blocking: false, &block).tap(&:resume)
  end

  def io_wait(io, events, timeout)
    byebug
  end

  def kernel_sleep(duration=nil)
    byebug
    Fiber.yield
  end

  def process_wait(pid, flags)
    byebug
  end

  def unblock(blocker, fiber)
    byebug
  end
end

Fiber.set_scheduler(MockScheduler.new)

t1 = Thread.new do
       p 'before'
       p 'after'
     end

t1.join

Code hangs a this point.

Updated by xtkoba (Tee KOBAYASHI) 11 months ago

This issue seems already resolved in the latest development version. (I confirmed that it reproduces in 3.0.0p0 release version on x86_64-linux.)

Updated by arjunmdas (arjun das) 11 months ago

Thanks a lot for confirming.
I will check the latest development version as well. Are you aware of any other tracking bug for this?

Updated by xtkoba (Tee KOBAYASHI) 11 months ago

This issue seems fixed in 5f69a7f60467fa58c2f998daffab43e118bff36c. I'm not sure if there is a ticket here. There is a related PR in GitHub: https://github.com/ruby/ruby/pull/3945

Updated by arjunmdas (arjun das) 11 months ago

Thanks again.

Just curious, how did you identify the right PR that fixed this?

Updated by xtkoba (Tee KOBAYASHI) 11 months ago

arjunmdas (arjun das) I just searched the git log with the keyword "fiber". The PR number comes from the commit message.

Actions #7

Updated by jeremyevans0 (Jeremy Evans) 11 months ago

  • Status changed from Open to Closed

Updated by arjunmdas (arjun das) 11 months ago

xtkoba (Tee KOBAYASHI) - Re-verified in the latest development. It's fixed. Thanks for the update.
.

Updated by ioquatix (Samuel Williams) 6 months ago

I think we should consider isolating this for backport to 3.0.3 if possible.

This three lines:

https://github.com/ruby/ruby/blob/v3_0_2/thread.c#L547-L549

        if (target_thread->scheduler != Qnil) {
            rb_scheduler_unblock(target_thread->scheduler, target_thread->self, rb_fiberptr_self(join_list->fiber));
        } else {

should become:

        if (target_thread->scheduler != Qnil && rb_fiberptr_blocking(join_list->fiber) == 0) {
            rb_scheduler_unblock(target_thread->scheduler, target_thread->self, rb_fiberptr_self(join_list->fiber));
        }

We would need to backport rb_fiberptr_blocking from cont.c too:

unsigned int rb_fiberptr_blocking(struct rb_fiber_struct *fiber)
{
    return fiber->blocking;
}

There are some other usage in thread_sync.c which might be candidate for backport. Check https://github.com/ruby/ruby/blob/master/thread_sync.c for usage of rb_fiberptr_blocking. These are all bugs because it can cause the fiber scheduler to be invoked when it shouldn't causing indefinite hang.

cc nagachika (Tomoyuki Chikanaga)

Updated by nagachika (Tomoyuki Chikanaga) 6 months ago

  • Backport changed from 2.5: UNKNOWN, 2.6: UNKNOWN, 2.7: UNKNOWN, 3.0: UNKNOWN to 2.7: DONTNEED, 3.0: REQUIRED

I don't have enough time to see details of the request just now. I filled the Backport field to recall myself to this ticket later.

Updated by nagachika (Tomoyuki Chikanaga) 6 months ago

Hello Samuel,
I have created a backport patch according to your suggestions.

https://github.com/nagachika/ruby/commit/c2697018d4d8cad7ea80ca6aa57f65d76072053c

Are there anything else should be included?
Would you review my patch please?

Updated by ioquatix (Samuel Williams) 6 months ago

nagachika (Tomoyuki Chikanaga) thanks so much for your effort here. I have a local test case which can easily fail without this fix, so I'll try your PR to confirm it fixes the issue.

Updated by ioquatix (Samuel Williams) 6 months ago

I have made a PR to your PR.

https://github.com/nagachika/ruby/pull/1

This adds a test case which fails without this backport.

Updated by ioquatix (Samuel Williams) 6 months ago

I also tested it against async main branch which can hang on 3.0.2, but passes on 3.0.2 + your PR. So, I can confirm this fixes the issue that was discussed here.

Updated by nagachika (Tomoyuki Chikanaga) 6 months ago

Thank you for your confirmation and adding a test!
I will create a patch for ruby_3_0 branch based on your pull request.

Updated by nagachika (Tomoyuki Chikanaga) 6 months ago

  • Backport changed from 2.7: DONTNEED, 3.0: REQUIRED to 2.7: DONTNEED, 3.0: DONE

ruby_3_0 95dc88c88869541dd0eccafd14924d78c8d7f427 merged revision(s) 5f69a7f60467fa58c2f998daffab43e118bff36c.

Actions

Also available in: Atom PDF