Bug #21719: Thread deadlock with explicit require of a base class in Linux Ruby 3.4

Added by jcuello@fu.do (Juan Manuel Cuello) 24 days ago. Updated 14 days ago.

Status:
Open
Assignee:
-
Target version:
-
ruby -v:
ruby 3.4.7 (2025-10-08 revision 7a5688e2a2) +PRISM [x86_64-linux]
[ruby-core:123951]

Description

I originally reported the issue in Zeitwerk, but we then figured out that it seems to be related to Ruby.

Basically, I'm getting a thread deadlock when combining explicit requires with autoloadable classes:

# jobs/base.rb
#
module Jobs
  class Base
  end
end

# jobs/a.rb
#
require './jobs/base'

module Jobs
  class A < Base
    def perform
      puts self.class.name
    end
  end
end

# jobs/b.rb
#
module Jobs
  class B < Base
    def perform
      puts self.class.name
    end
  end
end

# start.rb
#
module Jobs
  autoload :Base, './jobs/base'
  autoload :A, './jobs/a'
  autoload :B, './jobs/b'
end

a = Thread.new { Jobs::A.new.perform }
b = Thread.new { Jobs::B.new.perform }

a.join
b.join

$ ruby --version && ruby start.rb
ruby 3.4.7 (2025-10-08 revision 7a5688e2a2) +PRISM [x86_64-linux]
start.rb:12:in 'Thread#join': No live threads left. Deadlock? (fatal)
3 threads, 3 sleeps current:0x00005ca30dfbc500 main thread:0x00005ca30dd30330
* #<Thread:0x000076b1a7d6a658 sleep_forever>
   rb_thread_t:0x00005ca30dd30330 native:0x000076b1a81b87c0 int:0
   
* #<Thread:0x000076b1a7d2e928 start.rb:9 sleep_forever>
   rb_thread_t:0x00005ca30dfbc500 native:0x000076b18c74d6c0 int:0
    depended by: tb_thread_id:0x00005ca30dd30330
   
* #<Thread:0x000076b1a7d2e3b0 start.rb:10 sleep_forever>
   rb_thread_t:0x00005ca30dfb0fd0 native:0x000076b18c54b6c0 int:0 mutex:0x00005ca30dfe2b60 cond:1
   

        from start.rb:12:in '<main>'

Note the require './jobs/base' in jobs/a.rb. If I remove it, everything works. The same happens if I add the same explicit require in jobs/b.rb.

It seems to have been fixed in Ruby 3.4 by ea2af5782df63266577ba08a4ef4c30b6d63e564, but the fix was not apparent on Linux (which is my case) until 6fbc32b5d0da31535cccc0eca1853273313a0b52.

I'm not familiar with the Ruby codebase, so it's not clear to me why the change to Prism fixed the thread issue, or why it had no effect on Linux until the other fix. That is what I found, though, by bisecting the source code and running each revision against the code above.

I can create a PR to backport the Linux fix to the ruby_3_4 branch, since everything works as expected on master.
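
For contrast, here is a minimal single-threaded sketch of the same layout: a base class registered with autoload that is also explicitly required by a subclass file. The `SketchJobs` namespace and the temporary files are hypothetical stand-ins for the files above, not part of the original report. With only one thread touching the constants, the explicit require and the autoload entry should coexist without trouble.

```ruby
require "tmpdir"

# Hypothetical namespace standing in for Jobs in the report above.
module SketchJobs
end

loaded_superclass = nil
base_autoload_after = :unset

Dir.mktmpdir do |dir|
  base_path = File.join(dir, "base.rb")
  a_path    = File.join(dir, "a.rb")

  # base.rb: the autoloadable base class.
  File.write(base_path, <<~RUBY)
    module SketchJobs
      class Base
      end
    end
  RUBY

  # a.rb: explicitly requires the base file, as jobs/a.rb does in the report.
  File.write(a_path, <<~RUBY)
    require #{base_path.inspect}

    module SketchJobs
      class A < Base
      end
    end
  RUBY

  SketchJobs.autoload(:Base, base_path)
  SketchJobs.autoload(:A, a_path)

  # Only the current thread resolves the constants here.
  loaded_superclass   = SketchJobs::A.superclass.name
  base_autoload_after = SketchJobs.autoload?(:Base)
end

puts loaded_superclass
```

The failing case in this report differs only in that two threads resolve the constants concurrently.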

Updated by mame (Yusuke Endoh) 14 days ago Actions #1 [ruby-core:124111]

It is not reliably reproducible because the code relies heavily on a race condition.
Here is a more reproducible and simplified version.

# start.rb
#
autoload :Target, "./target"

# a hack to trigger context switch after Kernel#require
TracePoint.new(:script_compiled) { sleep 2 }.enable

# just for debug print
Thread.current.name = "main"
TracePoint.new(:line) { p [Thread.current.name, it] }.enable

Thread.new do
  sleep 1
  Target
end.name = "sub"

require "./target"


# target.rb
#
class Target
end

The deadlock reproduces on both Ruby 3.4.7 and master.

$ ruby --disable-gems start.rb
["main", #<TracePoint:line start.rb:10>]
["main", #<TracePoint:line start.rb:15>]
["sub", #<TracePoint:line start.rb:11>]
["sub", #<TracePoint:line start.rb:12>]
["main", #<TracePoint:line /home/mame/work/ruby/target.rb:1>]
/home/mame/work/ruby/target.rb:1:in '<top (required)>': No live threads left. Deadlock? (fatal)
2 threads, 2 sleeps current:0x000058ecbdbd5330 main thread:0x000058ecbdbd5330
* #<Thread:0x00007a1d1a7f8a08@main sleep_forever>
   rb_thread_t:0x000058ecbdbd5330 native:0x00007a1d34cecc00 int:0
   /home/mame/work/ruby/target.rb:1:in '<top (required)>'
   start.rb:15:in 'Kernel#require'
   start.rb:15:in '<main>'
* #<Thread:0x00007a1d18b4f190@sub start.rb:10 sleep_forever>
   rb_thread_t:0x000058ecbdd74e70 native:0x00007a1d18a3e6c0 int:0 mutex:1 cond:1
   start.rb:12:in 'Kernel#require'
   start.rb:12:in 'block in <main>'

        from start.rb:15:in 'Kernel#require'
        from start.rb:15:in '<main>'

@akr (Akira Tanaka) @nobu (Nobuyoshi Nakada) I suspect the hack to hide constant definitions when requiring via autoload isn't working properly.

# target.rb
#

# autoload of Target is not hidden here. Is this correct?
p [Thread.current.name, autoload?(:Target)]
  #=> actual: ["main", "./target"], expected: ["main", nil]

class Target # autoload is fired here and attempts to load target.rb recursively, which leads to the deadlock
end

Do you understand what's happening?
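
As a point of reference for the `autoload?` check above, here is a small sketch of what `autoload?` reports in the ordinary single-threaded case: the registered path before the constant is first referenced, and nil once the autoload has completed. The `SketchTarget` constant and the temporary file are hypothetical, and the sketch does not reproduce the race described in this ticket.

```ruby
require "tmpdir"

before_access = nil
after_access  = :unset

Dir.mktmpdir do |dir|
  # Hypothetical file standing in for target.rb.
  path = File.join(dir, "sketch_target.rb")
  File.write(path, "class SketchTarget\nend\n")

  Object.autoload(:SketchTarget, path)

  # Before the first reference, autoload? returns the registered path.
  before_access = Object.autoload?(:SketchTarget)

  # Referencing the constant fires the autoload and loads the file.
  SketchTarget.name

  # After the autoload has completed, autoload? returns nil.
  after_access = Object.autoload?(:SketchTarget)
end

p [before_access, after_access]
```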
