Bug #8523

intermittent unit test failure in test_timeout.rb results in build failures

Added by Cyle Riggs 10 months ago.

[ruby-core:<unknown>]
Status:Open
Priority:High
Assignee:-
Category:test
Target version:-
ruby -v:ruby 1.9.3p392 (2013-02-22 revision 39386) [x86_64-linux] Backport:1.9.3: UNKNOWN, 2.0.0: UNKNOWN

Description

This bug was found in 1.9.3-p392 from fedora. About 20% of our koji builds failed with unit test failures originating in test_timeout.rb.

The following build log shows this problem:

54) Failure:
testtimeout(TestTimeout) [/builddir/ruby19/srpm/ruby-1.9.3-p392/test/testtimeout.rb:19]:
.
Exception raised:
<#>.
10359 tests, 2235326 assertions, 1 failures, 0 errors, 53 skips
make[1]: *** [yes-test-all] Error 1
make[1]: Leaving directory `/builddir/ruby19/srpm/ruby-1.9.3-p392'
error: Bad exit status from /var/tmp/rpm-tmp.C2OW6z (%check)
RPM build errors:
Bad exit status from /var/tmp/rpm-tmp.C2OW6z (%check)

make: *** [rpmbuild] Error 1

I expected the unit test to pass consistently, but found that it intermittently failed, causing builds to fail. The build failures that occur due to this unit test failure within koji were reproducible on my local machine as well by performing an rpmbuild.

I have identified the problem in the unit test TestTimeout and supplied a patch.

A description of the flawed unit test logic follows:
1. A flag, @flag is set to true.
2. A background thread is started which waits 0.1 seconds and then sets @flag to false.
3. A block of code which busy waits for @flag to become false is executed within the timeout module, which mandates that @flag becomes false in 1 second or less.
4. An assertion is made that the timeout module will not raise an exception of any kind and that the background thread stops the code block before the timeout module does.

The unit test fails when the background thread, which is responsible for setting @flag to false does not set the value within the expected amount of time and the timeout module, working properly, raises a Timeout::Error due to the code block monitoring the @flag value runing for more than 1 second. This design of the unit test is too sensitive to thread scheduling issues, and routinely fails despite the timeout module working properly.

My supplied patch inverts the design of the unit test, such that the background thread operates merely as a secondary method of preventing the unit test from running forever in the case that the timeout module does stop working properly. With my patch the flow will now be:
1. A flag, @flag is set to true.
2. A background thread is started which waits 2 seconds and then sets @flag to false.
3. A code block of "nil while @flag" is ran within the timeout module, with a specified maximum execution time of 0.1 seconds.
4. An assertion is made that the timeout module must stop the code block with a Timeout::Error before the background thread does.

After applying my patch I was able to execute 30 builds sequentially and experienced no failures.

fix-test_timeout.patch Magnifier (660 Bytes) Cyle Riggs, 06/13/2013 08:21 AM

Also available in: Atom PDF