Bug #19976

test/fiber/test_queue.rb stuck tests in Ubuntu ppc64le

Added by jaruga (Jun Aruga) 6 months ago. Updated 6 months ago.

Status: Closed
Assignee: -
Target version: -
[ruby-core:115187]

Description

I have seen test_pop_with_timeout and test_pop_with_timeout_and_value get stuck (hang) when Ruby's master branch is built with GCC on the RubyCI ppc64le server (Ubuntu focal/jammy) and on Travis CI ppc64le.
This ticket tracks the issue.
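
For readers unfamiliar with the behavior these tests cover, here is a minimal sketch of popping from a queue with a timeout. It assumes Ruby 3.2 or later, where Queue#pop accepts a timeout: keyword, and it does not install a fiber scheduler, so it only approximates what test/fiber/test_queue.rb exercises. The variable names are illustrative.

```ruby
# Sketch only: plain Thread::Queue, no fiber scheduler (unlike the real tests).
queue = Thread::Queue.new

# Popping from an empty queue with a short timeout is expected to return nil
# once the timeout elapses, rather than blocking forever.
empty_result = queue.pop(timeout: 0.01)

# When a value is already queued, pop returns it without waiting.
queue.push(:something)
value_result = queue.pop(timeout: 0.01)

p empty_result
p value_result
```

In the hanging case reported here, the equivalent of the first pop never returns.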

On August 27, 2023, we saw the following hang on Travis ppc64le Ubuntu focal. The GCC version used was 10.5.0.
https://app.travis-ci.com/github/ruby/ruby/jobs/608696247#L2355

Retrying hung up testcases...
[1/2] TestFiberQueue#test_pop_with_timeout_and_value = 0.00 s
[2/2] TestFiberQueue#test_pop_with_timeout
====[ 540 seconds still running ]====
====[ 1080 seconds still running ]====
====[ 1620 seconds still running ]====
====[ 2160 seconds still running ]====

We upgraded RubyCI's ppc64le server from focal to jammy and started using the newer GCC 11.4.0 (gcc-11) on the server.
https://packages.ubuntu.com/jammy-updates/gcc-11 - 11.4.0

We have not seen the issue on the server since we started using GCC 11.4.0.
http://rubyci.s3.amazonaws.com/ppc64le/ruby-master/recent.html

However, I saw this issue again on October 27, 2023 on Travis Ubuntu ppc64le jammy when I tried to upgrade Travis ppc64le from focal to jammy. I didn't see the issue on Travis ppc64le focal. The GCC version used is also 11.4.0.

https://github.com/ruby/ruby/pull/8739
https://app.travis-ci.com/github/junaruga/ruby/jobs/612361931#L2930

[1/2] TestFiberQueue#test_pop_with_timeout====[ 1080 seconds still running ]====
====[ 1620 seconds still running ]====
====[ 2160 seconds still running ]====

This means something differs between the RubyCI ppc64le server and the Travis ppc64le environment when running the tests.

I was able to reproduce the hang on RubyCI's ppc64le Ubuntu jammy server with the reproduction script below.
https://github.com/junaruga/report-ruby-fiber-hung_up-tests

The possible differences that may cause the issue are parallel execution (make -jN), the compiler flag -O1, or -ggdb3.
https://github.com/junaruga/report-ruby-fiber-hung_up-tests/blob/d94205d9d7ff6c437d5ab531c1cfb0c3d523d5d2/test.sh#L5-L12
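
One way to narrow down which of these factors matters is to try each combination in turn. The sketch below only prints candidate command lines; it is not the linked test.sh. It assumes -O1 is passed through configure's optflags variable and -ggdb3 through debugflags, and it uses -j4 as an arbitrary stand-in for -jN. FACTORS and build_commands are hypothetical names.

```ruby
# Suspected factors, each either absent (nil) or present.
FACTORS = {
  jobs:       [nil, "-j4"],    # -j4 is an arbitrary choice for -jN
  optflags:   [nil, "-O1"],
  debugflags: [nil, "-ggdb3"], # assumption: -ggdb3 is set via debugflags
}.freeze

# Build one shell command line per combination of factors (2 * 2 * 2 = 8).
def build_commands
  FACTORS[:jobs].product(FACTORS[:optflags], FACTORS[:debugflags]).map do |jobs, opt, debug|
    configure = ["./configure"]
    configure << "optflags=#{opt}" if opt
    configure << "debugflags=#{debug}" if debug
    make = ["make", jobs].compact.join(" ")
    "#{configure.join(' ')} && #{make} && make test-all TESTS=test/fiber/test_queue.rb"
  end
end

puts build_commands
```

Running each printed line in a clean build directory and noting which ones hang would show whether a single flag or a combination is needed to trigger the problem.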

I also sent a PR to make the hanging tests fail instead, and it was merged. Tests that fail immediately are better than tests that hang.
https://github.com/ruby/ruby/pull/8791
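
The general idea of turning a hang into a failure can be sketched with a watchdog around the work. This is an illustration of the technique only, not necessarily how the PR above implements it; run_with_watchdog is a hypothetical helper.

```ruby
# Hypothetical helper: run the block in a thread and raise if it does not
# finish within `limit` seconds, instead of waiting forever.
def run_with_watchdog(limit)
  worker = Thread.new { yield }
  # Thread#join(limit) returns nil when the thread is still running
  # after `limit` seconds.
  if worker.join(limit).nil?
    worker.kill
    worker.join
    raise "still running after #{limit} seconds"
  end
  worker.value
end

# A block that finishes in time returns its value.
fast = run_with_watchdog(1) { :done }

# A block that never finishes is reported as an error.
hang_message = nil
begin
  run_with_watchdog(0.05) { sleep }
rescue RuntimeError => e
  hang_message = e.message
end

p fast
p hang_message
```

With this pattern a stuck test shows up as an ordinary failure in the CI log rather than as a job that runs until the CI timeout.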

I hope we can find the cause and fix this hang with GCC 11.4.0 on Ubuntu jammy.

Updated by jaruga (Jun Aruga) 6 months ago

It seems that this issue went away after removing optflags=-O1 in Travis ppc64le in this PR's 2nd commit https://github.com/ruby/ruby/commit/ca7296767b5db9a401bc64738984f35880061a73 .

Updated by jeremyevans0 (Jeremy Evans) 6 months ago

  • Status changed from Open to Closed