Bug #18476
openCall to require stuck forever after receiving EAGAIN on writev when running with zeus
Description
Environment¶
I'm using Ubuntu 18.04 running on Windows using WSL2:
$ uname -a
Linux myhostname 5.10.60.1-microsoft-standard-WSL2 #1 SMP Wed Aug 25 23:20:18 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 18.04.6 LTS
Release: 18.04
Codename: bionic
Problem description¶
When using zeus
to run a ruby project in an effort to cache dependency loading it gets stuck when calling require 'rubocop'
on Ruby 3.0 and 3.1. Everything works fine on Ruby 2.6.
When looking at strace output of the stuck process it shows a writev
call that receives an EAGAIN
right before the process gets to its stuck state:
[pid 24872] writev(14, [{iov_base="/home/jelte/.rbenv/versions/3.1.0/lib/ruby/gems/3.1.0/gems/rubocop-1.23.0/lib/rubocop/cop/lint/number_conversion.rb", iov_len=115}, {iov_base="\n", iov_len=1}], 2) = 116
[pid 24872] writev(14, [{iov_base="/home/jelte/.rbenv/versions/3.1.0/lib/ruby/gems/3.1.0/gems/rubocop-1.23.0/lib/rubocop/cop/lint/numbered_parameter_assignment.rb", iov_len=127}, {iov_base="\n", iov_len=1}], 2) = 128
[pid 24872] writev(14, [{iov_base="/home/jelte/.rbenv/versions/3.1.0/lib/ruby/gems/3.1.0/gems/rubocop-1.23.0/lib/rubocop/cop/lint/or_assignment_to_constant.rb", iov_len=123}, {iov_base="\n", iov_len=1}], 2) = 124
[pid 24872] writev(14, [{iov_base="/home/jelte/.rbenv/versions/3.1.0/lib/ruby/gems/3.1.0/gems/rubocop-1.23.0/lib/rubocop/cop/lint/ordered_magic_comments.rb", iov_len=120}, {iov_base="\n", iov_len=1}], 2) = 121
[pid 24872] writev(14, [{iov_base="/home/jelte/.rbenv/versions/3.1.0/lib/ruby/gems/3.1.0/gems/rubocop-1.23.0/lib/rubocop/cop/lint/out_of_range_regexp_ref.rb", iov_len=121}, {iov_base="\n", iov_len=1}], 2) = 122
[pid 24872] writev(14, [{iov_base="/home/jelte/.rbenv/versions/3.1.0/lib/ruby/gems/3.1.0/gems/rubocop-1.23.0/lib/rubocop/cop/lint/parentheses_as_grouped_expression"..., iov_len=131}, {iov_base="\n", iov_len=1}], 2) = 132
[pid 24872] writev(14, [{iov_base="/home/jelte/.rbenv/versions/3.1.0/lib/ruby/gems/3.1.0/gems/rubocop-1.23.0/lib/rubocop/cop/lint/percent_string_array.rb", iov_len=118}, {iov_base="\n", iov_len=1}], 2) = 119
[pid 24872] writev(14, [{iov_base="/home/jelte/.rbenv/versions/3.1.0/lib/ruby/gems/3.1.0/gems/rubocop-1.23.0/lib/rubocop/cop/lint/percent_symbol_array.rb", iov_len=118}, {iov_base="\n", iov_len=1}], 2) = 119
[pid 24872] writev(14, [{iov_base="/home/jelte/.rbenv/versions/3.1.0/lib/ruby/gems/3.1.0/gems/rubocop-1.23.0/lib/rubocop/cop/lint/raise_exception.rb", iov_len=113}, {iov_base="\n", iov_len=1}], 2) = -1 EAGAIN (Resource temporarily unavailable)
[pid 24872] getpid() = 24872
[pid 24872] ppoll([{fd=14, events=POLLOUT}, {fd=7, events=POLLIN}], 2, NULL, NULL, 8 <unfinished ...>
[pid 24859] <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out)
[pid 24859] select(0, NULL, NULL, NULL, {tv_sec=0, tv_usec=20}) = 0 (Timeout)
[pid 24859] futex(0x605f38, FUTEX_WAIT, 0, {tv_sec=60, tv_nsec=0}
The exact file at which the process receives the EAGAIN and is stuck is consistent across runs with the same Ruby version (at least on my machine). For 3.1.0 it's raise_exception.rb
and for 3.0.3 it's redundant_dir_glob_sort.rb
For reference this is the link to the repo of zeus
: https://github.com/burke/zeus
How to reproduce¶
See this example repo for minimal steps on how to reproduce this: https://github.com/JelteF/ruby-zeus-bug
Expectation¶
For Ruby to not get stuck after receiving EAGAIN
on the writev
call.
Updated by JelteF (Jelte Fennema) almost 3 years ago
I looked into this a bit more. And I'm pretty sure this is related to the introduction of the fiber scheduler.