Bug #18048

Thread#join can break when fiber scheduler unblock fails or blocks.

Added by ioquatix (Samuel Williams) about 2 months ago. Updated about 1 month ago.

Status:
Closed
Priority:
Normal
Assignee:
-
Target version:
-
[ruby-core:104692]

Description

In addition to https://bugs.ruby-lang.org/issues/17666 we found several more cases that need to be addressed.

Fix potential hang when joining threads.

If the thread termination invokes user code after th->status becomes
THREAD_KILLED, and the user unblock function causes that th->status to
become something else (e.g. THREAD_RUNNING), threads waiting in
thread_join_sleep will hang forever. We move the unblock function call
to before the thread status is updated, and allow threads to join as soon
as th->value becomes defined.
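
The ordering problem described above can be illustrated with a toy model in plain Ruby. This is only a sketch: `ToyThread`, `terminate_old`, `terminate_new`, and the two `joinable_by_*?` predicates are hypothetical names invented for illustration and do not correspond to the actual C code in thread.c. It assumes a user-supplied unblock hook that resets the status to running.

```ruby
# Toy model only. ToyThread and all of its methods are hypothetical and
# are not part of Ruby or of the interpreter's C implementation.
class ToyThread
  UNDEFINED = Object.new.freeze # stands in for an undefined th->value

  attr_accessor :status
  attr_reader :value

  def initialize(&unblock)
    @status = :running
    @value = UNDEFINED
    @unblock = unblock # user code that may change the status
  end

  # Old ordering: status becomes :killed first, then user code runs.
  def terminate_old(result)
    @value = result
    @status = :killed
    @unblock.call(self)
  end

  # New ordering: user code runs first, then the status is updated.
  def terminate_new(result)
    @value = result
    @unblock.call(self)
    @status = :killed
  end

  # Old join condition: wait until the status is :killed.
  def joinable_by_status?
    @status == :killed
  end

  # New join condition: the thread's value has been defined.
  def joinable_by_value?
    !@value.equal?(UNDEFINED)
  end
end

# A misbehaving unblock hook that flips the status back to :running.
misbehaving = ->(thread) { thread.status = :running }

legacy = ToyThread.new(&misbehaving)
legacy.terminate_old(42)
puts legacy.joinable_by_status? # prints false: a joiner would wait forever

fixed = ToyThread.new(&misbehaving)
fixed.terminate_new(42)
puts fixed.joinable_by_value? # prints true: a joiner can proceed
```

In the model, the old ordering leaves the status at running after termination, so a joiner that polls the status never finishes, while checking the value is unaffected by what the hook does.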

Wake up join list within thread EC context. (#4471)

If rb_fiber_scheduler_unblock raises an exception, it can result in a
segfault if rb_threadptr_join_list_wakeup is not within a valid EC. This
change moves rb_threadptr_join_list_wakeup into the thread's top level EC,
which initially caused an infinite loop because the wakeup is retried when an
exception is raised. We explicitly remove items from the thread's join list to
avoid this situation.
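
The retry loop and its fix can be sketched in plain Ruby as follows. This is an illustrative model under stated assumptions, not the interpreter's code: `drain_join_list`, its `remove_first` and `max_retries` parameters, and the lambdas standing in for waiters are all hypothetical. The retry cap exists only so the model terminates; the situation described above had no such cap.

```ruby
# Toy model only. drain_join_list and its parameters are hypothetical.
# Each waiter is a callable standing in for one unblock call.
def drain_join_list(join_list, remove_first:, max_retries: 10)
  retries = 0
  begin
    until join_list.empty?
      # With remove_first, the entry leaves the list before user code runs,
      # so a raising waiter is not seen again after a retry.
      waiter = remove_first ? join_list.shift : join_list.first
      waiter.call
      join_list.shift unless remove_first
    end
  rescue StandardError
    retries += 1
    retry if retries < max_retries
    return :gave_up # cap added for the model only
  end
  :drained
end

failing = -> { raise "unblock failed" }
woken = []
healthy = -> { woken << :ok }

# Without removal, the failing waiter stays at the head and raises each time.
puts drain_join_list([failing, healthy], remove_first: false) # prints gave_up

# With removal, the failing waiter is dropped and the rest are woken.
puts drain_join_list([failing, healthy], remove_first: true) # prints drained
```

Removing the entry before invoking it means a failure costs at most one retry per bad waiter, and the remaining waiters are still woken.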

These are already fixed on master branch. Here is a PR for backport: https://github.com/ruby/ruby/pull/4686

Updated by nagachika (Tomoyuki Chikanaga) about 2 months ago

Thank you for creating the patch for the backport. I see the PR is basically a partial backport of 050a89543952a2c9e7c9bc938f4fdb538f6c9278. I will try to merge it.

Updated by nagachika (Tomoyuki Chikanaga) about 2 months ago

  • Status changed from Open to Closed

Updated by ioquatix (Samuel Williams) about 2 months ago

The PR is 050a89543952a2c9e7c9bc938f4fdb538f6c9278 followed by 13f8521c630a15c87398dee0763e95f59c032a94

Updated by nagachika (Tomoyuki Chikanaga) about 2 months ago

I see that git:2d4f29e77e883c29e35417799f8001b8046cde03 was pushed as a retry of 13f8521c630a15c87398dee0763e95f59c032a94.
I will keep an eye on RubyCI for a while.

Updated by nagachika (Tomoyuki Chikanaga) about 1 month ago

I created a backport patch that includes 050a89543952a2c9e7c9bc938f4fdb538f6c9278 and 13f8521c630a15c87398dee0763e95f59c032a94 and pushed it to my branch. See https://github.com/ruby/ruby/pull/4686/files.

However, on that branch, make btest hangs in bootstraptest/test_ractor.rb.

% make btest
2021-08-14 16:57:56 +0900
Driver is ruby 3.0.3p123 (2021-08-08 revision 3922394c85) [x86_64-darwin19]
Target is ruby 3.0.3p124 (2021-08-14 revision 720d9c0803) [x86_64-darwin19]

test_attr.rb            PASS 2
test_autoload.rb        PASS 8
test_block.rb           PASS 58
test_class.rb           PASS 48
test_env.rb             PASS 2
test_eval.rb            PASS 37
test_exception.rb       PASS 34
test_fiber.rb           PASS 5
test_finalizer.rb       PASS 1
test_flip.rb            PASS 1
test_flow.rb            PASS 62
test_fork.rb            PASS 4
test_gc.rb              PASS 2
test_insns.rb           PASS 383
test_io.rb              PASS 9
test_jump.rb            PASS 29
test_literal.rb         PASS 156
test_literal_suffix.rb  PASS 48
test_load.rb            PASS 2
test_marshal.rb         PASS 1
test_massign.rb         PASS 34
test_method.rb          PASS 223
test_objectspace.rb     PASS 6
test_proc.rb            PASS 37
test_ractor.rb          \
↑ hangs up here

Samuel, would you mind reviewing my backport candidate branch?
