Project

General

Profile

Actions

Bug #17666

closed

Thread#join hangs when Fiber.set_scheduler is set

Added by arjunmdas (arjun das) about 3 years ago. Updated over 2 years ago.

Status:
Closed
Assignee:
-
Target version:
-
ruby -v:
ruby 3.0.0p0 (2020-12-25 revision 95aff21468) [x86_64-darwin19]
[ruby-core:102687]

Description

class MockScheduler
  def block(blocker, timeout = nil)
    byebug
  end

  def close
    byebug
  end

  def fiber(&block)
    byebug
    Fiber.new(blocking: false, &block).tap(&:resume)
  end

  def io_wait(io, events, timeout)
    byebug
  end

  def kernel_sleep(duration=nil)
    byebug
    Fiber.yield
  end

  def process_wait(pid, flags)
    byebug
  end

  def unblock(blocker, fiber)
    byebug
  end
end
  
Fiber.set_scheduler(MockScheduler.new)

t1 = Thread.new do
       p 'before'
       sleep 1
       p 'after'
     end

t1.join

Code hangs at this point.

Updated by arjunmdas (arjun das) about 3 years ago

  • Subject changed from Sleep in a thread hangs when Fiber.set_scheduler is set to Thread#join hangs when Fiber.set_scheduler is set
class MockScheduler
  def block(blocker, timeout = nil)
    byebug
  end

  def close
    byebug
  end

  def fiber(&block)
    byebug
    Fiber.new(blocking: false, &block).tap(&:resume)
  end

  def io_wait(io, events, timeout)
    byebug
  end

  def kernel_sleep(duration=nil)
    byebug
    Fiber.yield
  end

  def process_wait(pid, flags)
    byebug
  end

  def unblock(blocker, fiber)
    byebug
  end
end

Fiber.set_scheduler(MockScheduler.new)

t1 = Thread.new do
       p 'before'
       p 'after'
     end

t1.join

Code hangs a this point.

Updated by xtkoba (Tee KOBAYASHI) about 3 years ago

This issue seems already resolved in the latest development version. (I confirmed that it reproduces in 3.0.0p0 release version on x86_64-linux.)

Updated by arjunmdas (arjun das) about 3 years ago

Thanks a lot for confirming.
I will check the latest development version as well. Are you aware of any other tracking bug for this?

Updated by xtkoba (Tee KOBAYASHI) about 3 years ago

This issue seems fixed in 5f69a7f60467fa58c2f998daffab43e118bff36c. I'm not sure if there is a ticket here. There is a related PR in GitHub: https://github.com/ruby/ruby/pull/3945

Updated by arjunmdas (arjun das) about 3 years ago

Thanks again.

Just curious, how did you identify the right PR that fixed this?

Updated by xtkoba (Tee KOBAYASHI) about 3 years ago

arjunmdas (arjun das) I just searched the git log with the keyword "fiber". The PR number comes from the commit message.

Actions #7

Updated by jeremyevans0 (Jeremy Evans) about 3 years ago

  • Status changed from Open to Closed

Updated by arjunmdas (arjun das) about 3 years ago

xtkoba (Tee KOBAYASHI) - Re-verified in the latest development. It's fixed. Thanks for the update.
.

Updated by ioquatix (Samuel Williams) almost 3 years ago

I think we should consider isolating this for backport to 3.0.3 if possible.

This three lines:

https://github.com/ruby/ruby/blob/v3_0_2/thread.c#L547-L549

        if (target_thread->scheduler != Qnil) {
            rb_scheduler_unblock(target_thread->scheduler, target_thread->self, rb_fiberptr_self(join_list->fiber));
        } else {

should become:

        if (target_thread->scheduler != Qnil && rb_fiberptr_blocking(join_list->fiber) == 0) {
            rb_scheduler_unblock(target_thread->scheduler, target_thread->self, rb_fiberptr_self(join_list->fiber));
        }

We would need to backport rb_fiberptr_blocking from cont.c too:

unsigned int rb_fiberptr_blocking(struct rb_fiber_struct *fiber)
{
    return fiber->blocking;
}

There are some other usage in thread_sync.c which might be candidate for backport. Check https://github.com/ruby/ruby/blob/master/thread_sync.c for usage of rb_fiberptr_blocking. These are all bugs because it can cause the fiber scheduler to be invoked when it shouldn't causing indefinite hang.

cc @nagachika (Tomoyuki Chikanaga)

Updated by nagachika (Tomoyuki Chikanaga) almost 3 years ago

  • Backport changed from 2.5: UNKNOWN, 2.6: UNKNOWN, 2.7: UNKNOWN, 3.0: UNKNOWN to 2.7: DONTNEED, 3.0: REQUIRED

I don't have enough time to see details of the request just now. I filled the Backport field to recall myself to this ticket later.

Updated by nagachika (Tomoyuki Chikanaga) almost 3 years ago

Hello Samuel,
I have created a backport patch according to your suggestions.

https://github.com/nagachika/ruby/commit/c2697018d4d8cad7ea80ca6aa57f65d76072053c

Are there anything else should be included?
Would you review my patch please?

Updated by ioquatix (Samuel Williams) over 2 years ago

@nagachika (Tomoyuki Chikanaga) thanks so much for your effort here. I have a local test case which can easily fail without this fix, so I'll try your PR to confirm it fixes the issue.

Updated by ioquatix (Samuel Williams) over 2 years ago

I have made a PR to your PR.

https://github.com/nagachika/ruby/pull/1

This adds a test case which fails without this backport.

Updated by ioquatix (Samuel Williams) over 2 years ago

I also tested it against async main branch which can hang on 3.0.2, but passes on 3.0.2 + your PR. So, I can confirm this fixes the issue that was discussed here.

Updated by nagachika (Tomoyuki Chikanaga) over 2 years ago

Thank you for your confirmation and adding a test!
I will create a patch for ruby_3_0 branch based on your pull request.

Updated by nagachika (Tomoyuki Chikanaga) over 2 years ago

  • Backport changed from 2.7: DONTNEED, 3.0: REQUIRED to 2.7: DONTNEED, 3.0: DONE

ruby_3_0 95dc88c88869541dd0eccafd14924d78c8d7f427 merged revision(s) 5f69a7f60467fa58c2f998daffab43e118bff36c.

Actions

Also available in: Atom PDF

Like0
Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0