Bug #18731
closedParallel test-all sometimes does not run at all some tests
Description
In TruffleRuby I've noticed that some CRuby tests sometimes run or not, non-deterministically.
The TruffleRuby CI currently runs CRuby tests with -j4
.
Today I investigated and I think I've found the reason for it.
One occurrence of this bug is for:
- https://github.com/ruby/ruby/blob/master/test/ruby/test_method.rb
- https://github.com/ruby/ruby/blob/master/test/ruby/test_inlinecache.rb
Both define a test class TestMethod
.
The parallel runner distributes files across processes.
If the same worker process gets both files, then all tests in the second file are not run at all! (the file is still loaded).
The reason seems this line:
https://github.com/ruby/ruby/blob/8751c5c2672d1391c73d9dec590063d27bed7e4c/tool/lib/test/unit/parallel.rb#L128
(it's also possible to reproduce with just these 2 files, -j2
and not the full test suite, and having one worker start very slowly so the same worker gets both files)
Test::Unit::TestCase.test_suites
is an array of classes, and so here because TestMethod
was already defined by the first file loaded, the result of Test::Unit::TestCase.test_suites-suites
is empty, and nothing gets run.
This doesn't seem an issue when not running in parallel/using -j
.
But that's rare/impractical because test-all is very slow without -j
.
For this specific case it seems we should rename the class in test_inlinecache.rb
, but I suspect there are more name conflicts and the parallel runner should be fixed to handle this.
<rant>
FWIW, this test/unit code seems pretty messy, very long lines, duplication, hard to follow with the mix of minitest/test-unit, etc. I don't understand how this code is any better than mspec
. It seems far more complex and hacky.
And that's probably why Ruby implementations except CRuby run MRI tests only when they have no choice, it's so annoying to work with, so many unnecessary subprocesses (very slow to run, annoying to debug), many CRuby-specific tests, so many unreliable tests, many metaprogramming-defined tests which are very brittle, lots of global state and coupling, etc.
</rant>
Updated by Eregon (Benoit Daloze) almost 3 years ago
- Subject changed from Parallel test-all skips some tests to Parallel test-all sometimes does not run at all some tests
Updated by Eregon (Benoit Daloze) almost 3 years ago
Here is a quick search of test classes with the same name in different places:
class Test_Bignum
class TestDateParse
class TestEmojiBreaks
class TestFileUtils
class TestFuncall
class TestGraphemeBreaksFromFile
class TestMethod
class TestMkmf
class TestOptionParser
From ack '^class \w+ < Test::Unit::TestCase' test/mri/tests | grep -o -P 'class \w+' | sort > test.txt
+
puts File.readlines("tests.txt", chomp: true).select { |t| tests.count(t) > 1 }.uniq
Updated by headius (Charles Nutter) almost 3 years ago
Ruby implementations except CRuby run MRI tests only when they have no choice
JRuby has run CRuby tests as part of our regular suite since the mid 2000s and will continue to do so as long as new tests and assertions continue to be added there. We don't really have a choice in the matter if we want to keep up with changes.
That said, I'd definitely love to see the tests cleaned up or migrated into ruby/spec.
Updated by jeremyevans0 (Jeremy Evans) over 2 years ago
I submitted a pull request to make it so the same test class is not used in multiple files, for the classes identified as problems: https://github.com/ruby/ruby/pull/5839
Updated by Eregon (Benoit Daloze) over 2 years ago
Thank you.
I think we should also fix the underlying issue of the parallel runner though.
This is otherwise bound to happen again in the future.
Some ideas:
- Raise if the check
Test::Unit::TestCase.test_suites-suites
(which is the bug) returns an empty Array. It won't catch all issues though when e.g. 2nd test class is defined in the same file. - Use the :class TracePoint instead of the inherited hook which does not trigger for reopening a class.
- Run whatever new methods were added via the
method_added
hook.
Finally there is the question of what could we backport, so that running tests on the 2.6/2.7/3.0/3.1 branch with -j
doesn't run into this bug and actually skips some tests?
Updated by nagachika (Tomoyuki Chikanaga) about 2 years ago
- Status changed from Open to Closed
Applied in changeset git|5d4bfaccabe66d89460739ee682f1f78698c93a3.
merge revision(s) ab3cb29bd9bff9c16cfb9d19cc02026998282c12:
Avoid defining the same test class in multiple files
Should fix issues with parallel testing sometimes not running all
tests.
This should be viewed skipping whitespace changes.
Fixes [Bug #18731]
---
test/-ext-/bignum/test_big2str.rb | 38 +-
test/-ext-/bignum/test_bigzero.rb | 20 +-
test/-ext-/bignum/test_div.rb | 38 +-
test/-ext-/bignum/test_mul.rb | 260 ++++++------
test/-ext-/bignum/test_pack.rb | 653 +++++++++++++++----------------
test/-ext-/bignum/test_str2big.rb | 52 ++-
test/-ext-/funcall/test_funcall.rb | 11 -
test/-ext-/funcall/test_passing_block.rb | 5 +
test/date/test_date_ractor.rb | 2 +-
test/fileutils/clobber.rb | 5 +-
test/fileutils/test_dryrun.rb | 2 +-
test/fileutils/test_nowrite.rb | 2 +-
test/fileutils/test_verbose.rb | 2 +-
test/fileutils/visibility_tests.rb | 5 +-
test/mkmf/base.rb | 225 ++++++-----
test/mkmf/test_config.rb | 16 +-
test/mkmf/test_constant.rb | 56 ++-
test/mkmf/test_convertible.rb | 48 ++-
test/mkmf/test_egrep_cpp.rb | 14 +-
test/mkmf/test_find_executable.rb | 82 ++--
test/mkmf/test_flags.rb | 92 +++--
test/mkmf/test_framework.rb | 70 ++--
test/mkmf/test_have_func.rb | 18 +-
test/mkmf/test_have_library.rb | 84 ++--
test/mkmf/test_have_macro.rb | 46 ++-
test/mkmf/test_install.rb | 38 +-
test/mkmf/test_libs.rb | 156 ++++----
test/mkmf/test_mkmf.rb | 14 +-
test/mkmf/test_pkg_config.rb | 98 +++--
test/mkmf/test_signedness.rb | 38 +-
test/mkmf/test_sizeof.rb | 74 ++--
test/optparse/test_acceptable.rb | 2 +-
test/optparse/test_autoconf.rb | 4 +-
test/optparse/test_bash_completion.rb | 4 +-
test/optparse/test_cclass.rb | 2 +-
test/optparse/test_did_you_mean.rb | 2 +-
test/optparse/test_getopts.rb | 4 +-
test/optparse/test_kwargs.rb | 4 +-
test/optparse/test_noarg.rb | 6 +-
test/optparse/test_optarg.rb | 2 +-
test/optparse/test_placearg.rb | 2 +-
test/optparse/test_reqarg.rb | 10 +-
test/optparse/test_summary.rb | 2 +-
test/optparse/test_zsh_completion.rb | 4 +-
test/ruby/enc/test_emoji_breaks.rb | 207 +++++-----
test/ruby/enc/test_grapheme_breaks.rb | 115 +++---
test/ruby/test_inlinecache.rb | 2 +-
47 files changed, 1280 insertions(+), 1356 deletions(-)
delete mode 100644 test/-ext-/funcall/test_funcall.rb
Updated by luke-gru (Luke Gruber) almost 2 years ago
I've been working a bit on the parallel runner and I've noticed this problem too.
Raise if the check Test::Unit::TestCase.test_suites-suites (which is the bug) returns an empty Array. It won't catch all issues though when e.g. 2nd test class is defined in the same file.
Unfortunately we can't do this because some test classes are guarded by if statements, and the array can be empty in those cases too.
I'll take a look at the method_added
solution, incrementing a number on the class and checking it matches the previous one.
Edit: PR here: https://github.com/ruby/ruby/pull/7395