Bug #18663
closedAutoload doesn't work with fiber context switch.
Description
As discussed most recently here: https://github.com/ruby/debug/issues/580
The following program appears to work:
#!/usr/bin/env ruby
require 'tempfile'
Tempfile.create(['foo', '.rb']) do |file|
file.write(<<~RUBY)
#
$stderr.puts 1; q = Queue.new
$stderr.puts 2; t = Thread.new{q.pop}
$stderr.puts 3; q << :sig
$stderr.puts 4; t.join
sleep 1
class C
end
RUBY
file.close
autoload :C, file.path
Thread.new do
threads = 3.times.map do |i|
Thread.new do
$stderr.puts "LOADING C"
$stderr.puts C
end
end
threads.each(&:join)
end.join
end
This one doesn't:
#!/usr/bin/env ruby
require 'tempfile'
require_relative 'lib/async'
Tempfile.create(['foo', '.rb']) do |file|
file.write(<<~RUBY)
#
$stderr.puts 1; q = Queue.new
$stderr.puts 2; t = Thread.new{q.pop}
$stderr.puts 3; q << :sig
$stderr.puts 4; t.join
class C
end
RUBY
file.close
autoload :C, file.path
Async do |task|
3.times do |i|
task.async do
$stderr.puts "LOADING C"
$stderr.puts C
end
end
end.wait
end
Semantically, they should be very similar. It feels like someone is checking the current thread rather than the current fiber or there is a poor implementation of locking somewhere, however I don't actually know for sure yet, investigation is required.
Updated by ioquatix (Samuel Williams) about 2 years ago
Pure fiber repro.
#!/usr/bin/env ruby
require 'tempfile'
Tempfile.create(['foo', '.rb']) do |file|
file.write(<<~RUBY)
Fiber.yield
class C
end
RUBY
file.close
autoload :C, file.path
3.times do |i|
fiber = Fiber.new do
$stderr.puts "#{i} LOADING C"
$stderr.puts C
rescue NameError
$stderr.puts "NameError: #{i}"
raise
end
fiber.resume
end
end
Updated by ioquatix (Samuel Williams) about 2 years ago
/* always on stack, no need to mark */
struct autoload_state {
struct autoload_const *ac;
VALUE result;
VALUE thread;
struct list_head waitq;
};
I'm suspicious of VALUE thread
.
Updated by ioquatix (Samuel Williams) about 2 years ago
Autoload uses a spin lock:
static VALUE
autoload_sleep(VALUE arg)
{
struct autoload_state *state = (struct autoload_state *)arg;
/*
* autoload_reset in other thread will resume us and remove us
* from the waitq list
*/
do {
rb_thread_sleep_deadly();
} while (state->thread != Qfalse);
return Qfalse;
}
It basically spins on state->thread
.
I hesitate to call it poorly implemented, but it doesn’t look great. I don’t fully understand the implementation yet.
Updated by ioquatix (Samuel Williams) about 2 years ago
@fxn (Xavier Noria) If I understand correctly, autoload is mostly a feature of a development environment, so if we made this a little bit slower in order to improve accuracy (i.e. using a proper mutex), it wouldn't be huge loss right? Ruby's mutex isn't that slow but I realise if we took this route, it would introduce some overhead.
Updated by fxn (Xavier Noria) about 2 years ago
@ioquatix (Samuel Williams) hmmm, let me explain.
The feature in development for a web application is reloading. Ordinary gems using Zeitwerk may autoload in any project they are used, and when you eager load Rails in production you may need to autoload top-level constants like superclasses or mixins while eager loading, for example.
However, as the say goes, I can be fast if I can be wrong :). I'd go with that mutex, probably not a big deal, and fibers are key for the future of Ruby, so better be compatible in my view.
Updated by ioquatix (Samuel Williams) about 2 years ago
I've tracked down the root of this bug, being that it's not yielding to the fiber scheduler and implements it's own condition variable like semantics. I'll propose a fix.
Updated by Eregon (Benoit Daloze) about 2 years ago
- Has duplicate Misc #18662: Fiber scheduling and Module#autoload added
Updated by ioquatix (Samuel Williams) almost 2 years ago
https://github.com/ruby/ruby/pull/5788 fixes this issue.
I've confirmed that my PR fixes the given examples here.
There is a tiny bit of extra overhead; using a mutex has an object allocation, mutex lock and unlock, etc.
A light weight concurrency construct rb_condition
or rb_fiber_condition
that behaves like a "pessimistic mutex", i.e. avoids the allocation until it's actually needed might be a better solution but let's get it correct first, and we can optimise it later.
Just a few notes:
struct rb_condition {
... waitq;
}
VALUE rb_condition_wait(struct rb_condition *condition, VALUE timeout);
rb_condition_signal(struct rb_condition *condition, VALUE result); // wake up all waiters
Updated by nagachika (Tomoyuki Chikanaga) almost 2 years ago
- Backport changed from 2.6: UNKNOWN, 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN to 2.7: DONTNEED, 3.0: REQUIRED, 3.1: REQUIRED
Updated by ioquatix (Samuel Williams) almost 2 years ago
- Status changed from Open to Closed
I've merged the fix.