Bug #18731

Parallel test-all sometimes does not run at all some tests

Added by Eregon (Benoit Daloze) about 1 month ago. Updated about 1 month ago.

Status: Open
Priority: Normal
Assignee: -
Target version: -
[ruby-core:<unknown>]

Description

In TruffleRuby I've noticed that some CRuby tests are sometimes run and sometimes not, non-deterministically.
The TruffleRuby CI currently runs CRuby tests with -j4.

Today I investigated and I think I've found the reason for it.

One occurrence of this bug is for:

Both define a test class TestMethod.

The parallel runner distributes files across processes.

If the same worker process gets both files, then none of the tests in the second file are run (the file is still loaded).
The reason seems to be this line:
https://github.com/ruby/ruby/blob/8751c5c2672d1391c73d9dec590063d27bed7e4c/tool/lib/test/unit/parallel.rb#L128
(It's also possible to reproduce this without the full test suite, with just these 2 files and -j2, by having one worker start very slowly so that the same worker gets both files.)

Test::Unit::TestCase.test_suites is an array of classes. Because TestMethod was already defined by the first file loaded, loading the second file only reopens that class, so the result of Test::Unit::TestCase.test_suites - suites is empty and nothing gets run.
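The effect described above can be reproduced in isolation. The following is only a minimal sketch, not the actual runner code: FakeTestCase, SUITES and load_and_diff are hypothetical names standing in for Test::Unit::TestCase, its suite registry and the worker's "load a file, then diff the suites" step, and the two strings stand in for two test files that define the same class name.

```ruby
# Hypothetical stand-in for Test::Unit::TestCase: it records every new
# subclass through the `inherited` hook, like a suite registry would.
class FakeTestCase
  SUITES = []

  def self.test_suites
    SUITES
  end

  def self.inherited(subclass)
    super
    SUITES << subclass
  end
end

# Hypothetical helper: "loads" a file (here, a source string) and returns
# the suites that appeared since the snapshot taken before loading.
def load_and_diff(source)
  suites_before = FakeTestCase.test_suites.dup
  Object.class_eval(source)
  FakeTestCase.test_suites - suites_before
end

# Two "files" that both define a class named TestMethod.
FIRST_FILE  = "class TestMethod < FakeTestCase; def test_a; end; end"
SECOND_FILE = "class TestMethod < FakeTestCase; def test_b; end; end"

first_diff  = load_and_diff(FIRST_FILE)   # class is created, `inherited` fires
second_diff = load_and_diff(SECOND_FILE)  # class is reopened, `inherited` does not fire

p first_diff
p second_diff
p TestMethod.instance_methods(false).sort
```

In this sketch the second diff is empty even though test_b was added to TestMethod, so a runner that only executes the suites in the diff would never run it.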

This doesn't seem to be an issue when not running in parallel (i.e. without -j).
But that's rare and impractical, because test-all is very slow without -j.

For this specific case it seems we should rename the class in test_inlinecache.rb, but I suspect there are more name conflicts and the parallel runner should be fixed to handle this.


<rant>
FWIW, this test/unit code seems pretty messy, very long lines, duplication, hard to follow with the mix of minitest/test-unit, etc. I don't understand how this code is any better than mspec. It seems far more complex and hacky.
And that's probably why Ruby implementations except CRuby run MRI tests only when they have no choice, it's so annoying to work with, so many unnecessary subprocesses (very slow to run, annoying to debug), many CRuby-specific tests, so many unreliable tests, many metaprogramming-defined tests which are very brittle, lots of global state and coupling, etc.
</rant>

Updated by Eregon (Benoit Daloze) about 1 month ago

  • Subject changed from Parallel test-all skips some tests to Parallel test-all sometimes does not run at all some tests

Updated by Eregon (Benoit Daloze) about 1 month ago

Here is a quick search of test classes with the same name in different places:

class Test_Bignum
class TestDateParse
class TestEmojiBreaks
class TestFileUtils
class TestFuncall
class TestGraphemeBreaksFromFile
class TestMethod
class TestMkmf
class TestOptionParser

From ack '^class \w+ < Test::Unit::TestCase' test/mri/tests | grep -o -P 'class \w+' | sort > tests.txt, followed by:

tests = File.readlines("tests.txt", chomp: true)
puts tests.select { |t| tests.count(t) > 1 }.uniq

Updated by headius (Charles Nutter) about 1 month ago

"Ruby implementations except CRuby run MRI tests only when they have no choice"

JRuby has run CRuby tests as part of our regular suite since the mid 2000s and will continue to do so as long as new tests and assertions continue to be added there. We don't really have a choice in the matter if we want to keep up with changes.

That said, I'd definitely love to see the tests cleaned up or migrated into ruby/spec.

Updated by jeremyevans0 (Jeremy Evans) about 1 month ago

I submitted a pull request to make it so the same test class is not used in multiple files, for the classes identified as problems: https://github.com/ruby/ruby/pull/5839

Updated by Eregon (Benoit Daloze) about 1 month ago

Thank you.

I think we should also fix the underlying issue of the parallel runner though.
This is otherwise bound to happen again in the future.
Some ideas:

  • Raise if the check Test::Unit::TestCase.test_suites - suites (which is where the bug is) returns an empty Array. It won't catch all issues though, e.g. when a 2nd test class is defined in the same file.
  • Use the :class TracePoint instead of the inherited hook, which does not trigger when a class is reopened.
  • Run whatever new methods were added via the method_added hook.
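As a rough illustration of the second idea, here is a minimal sketch of how a :class TracePoint also sees a reopened class. It is not a patch for the parallel runner: TracedCase, TestExample and suites_touched_by are hypothetical names used only for this example.

```ruby
# Hypothetical stand-in for Test::Unit::TestCase.
class TracedCase
end

# Hypothetical helper: evaluates a "file" (a source string) and returns
# every TracedCase subclass whose body was entered while loading it,
# whether the class was newly created or merely reopened.
def suites_touched_by(source)
  touched = []
  trace = TracePoint.new(:class) do |tp|
    klass = tp.self
    # `Class#<` is true only for real subclasses (nil/false otherwise).
    touched << klass if klass.is_a?(Class) && klass < TracedCase
  end
  # The trace is active only while the block runs.
  trace.enable { Object.class_eval(source) }
  touched.uniq
end

first  = suites_touched_by("class TestExample < TracedCase; def test_a; end; end")
second = suites_touched_by("class TestExample < TracedCase; def test_b; end; end")

p first
p second
```

Unlike a diff based on the inherited hook, the second call still reports TestExample here, because the :class event fires each time a class body is entered. A real fix would still have to decide which test methods of a reopened class to run.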

Finally, there is the question of what we could backport, so that running tests on the 2.6/2.7/3.0/3.1 branches with -j doesn't run into this bug and silently skip some tests.
