Bug #4173
closed
TestProcess#test_wait_and_sigchild occasionally fails
Description
=begin
Very occasionally, TestProcess#test_wait_and_sigchild in ruby/test_process.rb fails.
- Failure:
test_wait_and_sigchild(TestProcess) [/export/home/ksmakoto/ruby-git/test/ruby/test_process.rb:1194]:
[ruby-core:19744].
<[true]> expected but was
<[]>.
It also happens when only this test is run, e.g. with TESTS='-n test_wait_and_sigchild ruby/test_process.rb'. With cpuset -l 0, it seems to almost never happen.
It also occurred on r21509, the oldest trunk revision I can build locally.
=end
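For readers unfamiliar with what the assertion above checks: the failure output suggests the test expects a SIGCHLD trap handler to have recorded exactly one event (`[true]`) after a forked child has been waited for, and the failure is the case where nothing was recorded (`[]`). The following is only a rough sketch of that kind of check, not the actual test body; the helper name `sigchld_events_after_wait` and the self-pipe wait are assumptions made for illustration.

```ruby
# Hypothetical helper for illustration; this is NOT the body of
# TestProcess#test_wait_and_sigchild.
def sigchld_events_after_wait(timeout: 5)
  received = []
  sig_r, sig_w = IO.pipe

  # Record each SIGCHLD and poke the self-pipe so the waiter below wakes up.
  old_handler = Signal.trap(:CHLD) do
    received << true
    sig_w.write("!")
  end

  pid = fork { exit!(0) }   # child exits immediately
  Process.wait(pid)         # reap the child

  # The handler may run slightly after Process.wait returns, so wait on
  # the self-pipe (up to `timeout` seconds) instead of sleeping.
  IO.select([sig_r], nil, nil, timeout)
  received.dup
ensure
  # Restore the previous handler and release the pipe.
  Signal.trap(:CHLD, old_handler || "DEFAULT")
  sig_r&.close
  sig_w&.close
end

events = sigchld_events_after_wait
puts events.inspect
```

If the handler never runs, the returned array is empty, which corresponds to the `<[]>` shown in the failure above.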
Updated by naruse (Yui NARUSE) over 13 years ago
- Status changed from Open to Assigned
- Assignee set to akr (Akira Tanaka)
For now, the test is skipped on FreeBSD.
Updated by naruse (Yui NARUSE) almost 7 years ago
- Assignee deleted (akr (Akira Tanaka))
Updated by naruse (Yui NARUSE) almost 7 years ago
- Status changed from Assigned to Open
Updated by kjtsanaktsidis (KJ Tsanaktsidis) over 1 year ago
I had a bit of a look at this and I don't think it's a problem anymore.
This test was skipped 12 years ago because it was flaky on FreeBSD and OpenBSD. Since then, Ruby's SIGCHLD handling has been substantially re-written (mostly by Eric Wong (@normalperson) in 44fc3d0).
These tests now in fact pass reliably on Ruby master on FreeBSD 13.2 and OpenBSD 7.3. I stress-tested the test_wait_and_sigchild test on my laptop by running four copies of the test in a loop on an 8-core VM, both by itself and as part of the whole test_process.rb file. I did not see any failures.
I think we should merge this PR I opened to un-skip this test (https://github.com/ruby/ruby/pull/7809) and then close this bug. I'll keep an eye on Ruby CI and see if this causes any flakiness in the process tests, but fingers crossed I think these tests are fine on BSD these days.
I think now is actually a good time to make this change - I noticed that @ioquatix (Samuel Williams) tried to undo some of @normalperson's changes around SIGCHLD/waitpid (since they're no longer needed for MJIT) in https://github.com/ruby/ruby/pull/7482 & https://github.com/ruby/ruby/pull/7476, but they had to be reverted in https://github.com/ruby/ruby/pull/7517. If someone makes another attempt at this, it would be good to know that this SIGCHLD test continues to work on FreeBSD/OpenBSD.
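The stress test described above (several copies of the test running in a loop at the same time) could be approximated with a small driver like the one below. This is a sketch only: the helper name `stress`, the default counts, and the use of `make test-all` in the usage comment are assumptions, not the commands that were actually run.

```ruby
require "rbconfig"

# Hypothetical helper: run `copies` concurrent loops of `command`,
# `iterations` times each, and return how many runs failed.
def stress(command, copies: 4, iterations: 100)
  threads = Array.new(copies) do
    Thread.new do
      # system returns true on exit status 0, false or nil otherwise.
      iterations.times.count do
        !system(*command, out: File::NULL, err: File::NULL)
      end
    end
  end
  threads.sum(&:value)
end

# Self-check with a trivial command that always succeeds.
failures = stress([RbConfig.ruby, "-e", "exit 0"], copies: 2, iterations: 3)
puts "failures: #{failures}"

# Assumed usage inside a built Ruby tree (the make target is an assumption):
#   stress(["make", "test-all",
#           "TESTS=-n test_wait_and_sigchild ruby/test_process.rb"])
```

A non-zero return value would indicate at least one flaky run; a zero result only says no failure was observed in that many runs.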
Updated by ioquatix (Samuel Williams) over 1 year ago
Just to clarify, there was an unexpected issue with my PR, but only a small part of it was reverted. I'll try to complete the removal of that functionality soon.
Updated by Anonymous over 1 year ago
- Status changed from Open to Closed
Applied in changeset git|8bd4d8867a0222a3c30a0c7ee1f69b06baa8e91a.
Unskip the test skipped in #4173 (#7809)
This test was skipped 12 years ago because it was flaky on FreeBSD and
OpenBSD. Since then, Ruby's SIGCHLD handling has been substantially
re-written (mostly by Eric Wong (@normalperson) in 44fc3d08).
These tests now in fact pass reliably on Ruby master on FreeBSD 13.2 and
OpenBSD 7.3. I stress-tested the test_wait_and_sigchild test on my
laptop by running four copies of the test in a loop on an 8-core VM; both
by itself and also as part of the whole test_process.rb file. I did not
see any failures.
Let's unskip the test and close [#4173] out. I'll keep an eye out on Ruby
CI for any flakes in this file on BSD after this gets merged, but if we
don't see any I'm going to assume 44fc3d08 or related changes around
that time accidentally fixed this bug.
It's also probably important to unskip this test so that if another
attempt at removing the special SIGCHLD handling is made (like was
reverted in https://github.com/ruby/ruby/pull/7517), we get signal if
that breaks on FreeBSD/OpenBSD.
[Fixes #4173]
Updated by kjtsanaktsidis (KJ Tsanaktsidis) over 1 year ago
So I noticed the FreeBSD process tests failed a few times - looking at it today/tomorrow. I also actually managed to reproduce this one on my machine once - http://rubyci.s3.amazonaws.com/freebsd13/ruby-master/log/20230515T183001Z.fail.html.gz