Bug #19976
test/fiber/test_queue.rb stuck tests in Ubuntu ppc64le
Description
I have seen test_pop_with_timeout and test_pop_with_timeout_and_value get stuck (hang) when Ruby's master branch is built with GCC. This happens on the RubyCI ppc64le server (Ubuntu Focal/Jammy) and on Travis CI ppc64le.
This ticket tracks the issue.
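Both tests exercise popping from a queue with a timeout. The sketch below illustrates the behavior they check, using Thread::Queue#pop(timeout:) (available since Ruby 3.2) on a plain thread. It is a simplification under stated assumptions: the real tests in test/fiber/test_queue.rb run inside a test fiber scheduler, which this sketch omits, and the helper names pop_empty_with_timeout and pop_value_with_timeout are hypothetical.

```ruby
# Hypothetical helpers illustrating Thread::Queue#pop(timeout:) (Ruby 3.2+).
# Unlike the real tests, no fiber scheduler is installed here.

# Empty queue: pop waits for up to `seconds`, then returns nil.
def pop_empty_with_timeout(seconds)
  queue = Thread::Queue.new
  queue.pop(timeout: seconds)
end

# Non-empty queue: pop returns the queued value without waiting.
def pop_value_with_timeout(value, seconds)
  queue = Thread::Queue.new
  queue.push(value)
  queue.pop(timeout: seconds)
end

p pop_empty_with_timeout(0.01)      # prints nil
p pop_value_with_timeout(:ok, 0.01) # prints :ok
```

In a correct build, neither call can block for longer than its timeout, which is why a hang of hundreds of seconds points at the runtime or the build rather than at the tests.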
On August 27, 2023, we saw the following hang on Travis CI ppc64le (Ubuntu Focal). The GCC version used was 10.5.0.
https://app.travis-ci.com/github/ruby/ruby/jobs/608696247#L2355
Retrying hung up testcases...
[1/2] TestFiberQueue#test_pop_with_timeout_and_value = 0.00 s
[2/2] TestFiberQueue#test_pop_with_timeout
====[ 540 seconds still running ]====
====[ 1080 seconds still running ]====
====[ 1620 seconds still running ]====
====[ 2160 seconds still running ]====
We upgraded RubyCI's ppc64le server from Focal to Jammy and started using the newer GCC 11.4.0 (gcc-11) on it.
https://packages.ubuntu.com/jammy-updates/gcc-11 - 11.4.0
We have not seen the issue on that server since switching to GCC 11.4.0.
http://rubyci.s3.amazonaws.com/ppc64le/ruby-master/recent.html
However, on October 27, 2023, I saw the issue again on Travis CI ppc64le (Ubuntu Jammy) while trying to upgrade Travis ppc64le from Focal to Jammy. I did not see the issue on Travis ppc64le Focal. The GCC version used there is also 11.4.0.
https://github.com/ruby/ruby/pull/8739
https://app.travis-ci.com/github/junaruga/ruby/jobs/612361931#L2930
[1/2] TestFiberQueue#test_pop_with_timeout
====[ 1080 seconds still running ]====
====[ 1620 seconds still running ]====
====[ 2160 seconds still running ]====
This suggests that something differs between the RubyCI ppc64le server and the Travis ppc64le environment in how the tests are built or run.
I was able to reproduce the hang on RubyCI's ppc64le Ubuntu Jammy server with the reproduction script below.
https://github.com/junaruga/report-ruby-fiber-hung_up-tests
The differences that may cause the issue are parallel execution (make -jN), the compiler flag -O1, or -ggdb3.
https://github.com/junaruga/report-ruby-fiber-hung_up-tests/blob/d94205d9d7ff6c437d5ab531c1cfb0c3d523d5d2/test.sh#L5-L12
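To compare builds across the two environments, a small diagnostic like the one below could print the compiler and flags that each ruby binary records in RbConfig. This is an assumed helper (build_settings and BUILD_KEYS are hypothetical names), not part of the linked test.sh. Note that RbConfig records configure-time values such as optflags and debugflags, but not the make -jN parallelism used for the build.

```ruby
require "rbconfig"

# Hypothetical list of build settings relevant to this report.
BUILD_KEYS = %w[CC optflags debugflags CFLAGS].freeze

# Returns the selected keys from an RbConfig-style hash (nil if absent).
def build_settings(config = RbConfig::CONFIG)
  BUILD_KEYS.to_h { |key| [key, config[key]] }
end

build_settings.each { |key, value| puts "#{key}: #{value.inspect}" }
puts "platform: #{RUBY_PLATFORM}"
```

Running it with the ruby built on each CI host and diffing the output would show at a glance whether optflags (for example -O1) or debugflags (for example -ggdb3) differ.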
I also sent a PR, since merged, to make the hanging tests fail instead. A test that fails immediately is better than one that hangs.
https://github.com/ruby/ruby/pull/8791
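The general idea of "fail instead of hang" can be sketched as follows: run the test body in a worker thread, wait for it with a deadline, and raise if the deadline passes. This is an illustration only, not the change merged in the PR above; run_with_deadline and DeadlineExceeded are hypothetical names.

```ruby
# Hypothetical helper illustrating "fail instead of hang";
# not the code merged in https://github.com/ruby/ruby/pull/8791.
class DeadlineExceeded < StandardError; end

# Runs the block in a worker thread and waits at most `seconds` for it.
# Returns the block's value, or kills the worker and raises if it is stuck.
def run_with_deadline(seconds)
  worker = Thread.new { yield }
  if worker.join(seconds).nil? # Thread#join returns nil when the limit expires
    worker.kill
    raise DeadlineExceeded, "did not finish within #{seconds}s"
  end
  worker.value
end

p run_with_deadline(1) { 21 * 2 } # prints 42

begin
  run_with_deadline(0.05) { sleep 10 } # stands in for a hung test body
rescue DeadlineExceeded => e
  puts e.message # prints: did not finish within 0.05s
end
```

A deadline like this turns an indefinite CI stall into an ordinary test failure with a clear message, at the cost of choosing a limit generous enough for slow machines.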
I hope we can find the cause and fix this hang with GCC 11.4.0 on Ubuntu Jammy.
Updated by jaruga (Jun Aruga) 6 months ago
It seems that this issue went away after removing optflags=-O1 for Travis ppc64le in this PR's 2nd commit: https://github.com/ruby/ruby/commit/ca7296767b5db9a401bc64738984f35880061a73
Updated by jeremyevans0 (Jeremy Evans) 6 months ago
- Status changed from Open to Closed