Misc #20013

open

Travis CI status

Added by jaruga (Jun Aruga) 6 months ago. Updated 2 months ago.

Status:
Open
Assignee:
-
[ruby-core:115438]

Description

I would like to use this ticket to track our activities and report the Travis CI status.

Travis CI provides a status page, but even when the page shows everything as OK, I still see infrastructure issues.
https://www.traviscistatus.com/

I will share my activities and report the Travis CI status on this ticket.
This ticket will stay open until we stop using Travis CI.

The easiest way to get a Travis infrastructure issue fixed is to email Travis CI support at support _AT_ travis-ci.com.

You can check the ruby/ruby Travis CI wiki page for details.


Related issues: 1 (1 open, 0 closed)

Related to Ruby master - Misc #20320: Using OSU Open Source Lab native ppc64le/s390x CI services trigged on pull-requests (Open)

Updated by jaruga (Jun Aruga) 6 months ago

Travis s390x builds are not starting right now. I have emailed Travis CI customer support asking them to fix it.

https://app.travis-ci.com/github/ruby/ruby/builds/267381855
https://app.travis-ci.com/github/ruby/ruby/builds/267383404

Updated by jaruga (Jun Aruga) 6 months ago

It seems that s390x builds take a long time to start, but they are still running.
https://app.travis-ci.com/github/ruby/ruby/builds

Updated by jaruga (Jun Aruga) 6 months ago

I asked Travis CI support about the s390x build issue yesterday. Support replied that they are now investigating it.

Updated by jaruga (Jun Aruga) 6 months ago

I will enable allow_failures for s390x. Sorry for the inconvenience.
https://github.com/ruby/ruby/pull/8997
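For readers unfamiliar with allow_failures: in .travis.yml, each entry under allow_failures lists job attributes, and, roughly speaking, a job whose attributes match every key of one entry may fail without turning the whole build red. The Python below is only a simplified model of that rule, not Travis CI's implementation; the Job type and the attribute values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Job:
    """Hypothetical, simplified job: its config attributes and whether it passed."""
    config: dict
    passed: bool


def is_allowed_failure(job: Job, allow_failures: list) -> bool:
    """True if every key/value pair of at least one allow_failures entry matches the job."""
    return any(
        all(job.config.get(key) == value for key, value in entry.items())
        for entry in allow_failures
    )


def build_passes(jobs: list, allow_failures: list) -> bool:
    """The build is green when every job that is not an allowed failure passed."""
    return all(
        job.passed for job in jobs if not is_allowed_failure(job, allow_failures)
    )


# Example: a stable arm64 job and a flaky s390x job (illustrative values).
jobs = [Job({"arch": "arm64"}, passed=True), Job({"arch": "s390x"}, passed=False)]
print(build_passes(jobs, allow_failures=[]))                   # s390x failure fails the build
print(build_passes(jobs, allow_failures=[{"arch": "s390x"}]))  # s390x failure is tolerated
```

The trade-off in this model is that an allowed failure never turns the build red, so it hides genuine regressions on that architecture as well as infrastructure flakiness.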

Updated by jaruga (Jun Aruga) 6 months ago

The following infrastructure component is shown in yellow (not green) on the status page:

https://www.traviscistatus.com/

Pusher Webhooks - Degraded Performance

Updated by jaruga (Jun Aruga) 6 months ago

I will drop s390x temporarily. I suspect Travis CI has a maximum queue size, and because s390x builds are occupying the queue, builds for the other CPU architectures (arm64, arm32, ppc64le) don't even start.
https://github.com/ruby/ruby/pull/9004

Updated by jaruga (Jun Aruga) 6 months ago

I am manually canceling the s390x jobs in the running Travis builds.

Updated by jaruga (Jun Aruga) 6 months ago

Travis CI builds are now stable except for s390x.
https://app.travis-ci.com/github/ruby/ruby/builds

I am in contact with Travis CI support. It seems they added "IBM Z Builds" under "Builds Processing" on the Travis CI status page, and its status is shown as Degraded Performance (yellow). That's really helpful! I am asking them to add "Arm builds" and "IBM ppc64le builds" to the page too.
https://www.traviscistatus.com/

Updated by jaruga (Jun Aruga) 6 months ago

I am testing the s390x builds on my fork so that I can add them back.
https://github.com/ruby/ruby/pull/9024

Updated by jaruga (Jun Aruga) 6 months ago

I tested the PR that adds s390x back on my fork and merged it. Travis CI now has the s390x pipeline again.

Updated by jaruga (Jun Aruga) 6 months ago

Travis customer support told me that their infrastructure team has resolved the s390x build issue, and the builds should work now.

Updated by jaruga (Jun Aruga) 5 months ago

I have emailed Travis CI support about the following error message, which is printed only in the arm64 pipelines. It does not seem to affect the CI test results.

https://app.travis-ci.com/github/ruby/ruby/jobs/615194806#L6

sudo: unable to resolve host travis-job-ruby-ruby-615194806: Name or service not known

I opened a thread about the issue at the end of October 2023, but I haven't received a response there.
https://travis-ci.community/t/arm64-sudo-unable-to-resolve-host-name-or-service-not-known/14028

So I emailed them today and was told that support has reached out to the Travis infrastructure team. I will post updates here.

Updated by jaruga (Jun Aruga) 4 months ago

Travis s390x builds seem to be either very slow, running out the maximum 50 minutes (a ruby_3_3-specific issue?):
https://app.travis-ci.com/github/ruby/ruby/builds/268615249

or not starting for a long time:
https://app.travis-ci.com/github/ruby/ruby/builds/268616415

I am contacting Travis CI support.

Updated by jaruga (Jun Aruga) 4 months ago

I will temporarily drop the s390x case from Travis CI. I am not sure whether the issue comes from the infrastructure or from Ruby, but a test run that fails by hitting the 50-minute limit is not useful as CI.
https://github.com/ruby/ruby/pull/9758

Updated by jaruga (Jun Aruga) 4 months ago

jaruga (Jun Aruga) wrote in #note-14:

> I will temporarily drop the s390x case from Travis CI. I am not sure whether the issue comes from the infrastructure or from Ruby, but a test run that fails by hitting the 50-minute limit is not useful as CI.
> https://github.com/ruby/ruby/pull/9758

I got the following message from Travis CI support: "Our Infra team has deployed a fix for the issue you encountered with the s390x Build environment."
I am now testing Travis s390x on my fork.

Updated by jaruga (Jun Aruga) 4 months ago

> I am now testing Travis s390x on my fork.

I tested it and sent a PR to add s390x again.
https://github.com/ruby/ruby/pull/9773

Updated by jaruga (Jun Aruga) 4 months ago

jaruga (Jun Aruga) wrote in #note-16:

>> I am now testing Travis s390x on my fork.
>
> I tested it and sent a PR to add s390x again.
> https://github.com/ruby/ruby/pull/9773

Merged. s390x is back on Travis.

Updated by jaruga (Jun Aruga) 3 months ago

Some s390x builds have not started even after 3 hours. I am contacting Travis customer support.

https://app.travis-ci.com/github/ruby/ruby/builds
https://app.travis-ci.com/github/ruby/ruby/builds/269093276
https://app.travis-ci.com/github/ruby/ruby/builds/269093679

Meanwhile, https://www.traviscistatus.com/ shows "Builds Processing - IBM Z Builds" as operational (green).

Updated by jaruga (Jun Aruga) 3 months ago

One of the s390x builds[1] is very slow: it exceeded the 50-minute maximum timeout, with make test-all alone taking 1515 seconds (about 25 minutes). So far this is the only such build. This behavior is not normal: the next build[2] took 28 minutes 40 seconds in total, with make test-all taking 683 seconds (about 11 minutes). I suspect the cause is a specific slow machine, and I am asking Travis support about it.

[1] https://app.travis-ci.com/github/ruby/ruby/jobs/618214295#L2094
[2] https://app.travis-ci.com/github/ruby/ruby/jobs/618215618#L2262
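The comparison above is simple arithmetic on the durations quoted in this ticket. As a sketch, assuming a hypothetical rule that a step taking more than twice a known-good run is suspicious (the threshold is an illustration, not a Travis or Ruby setting), it could be checked like this:

```python
# Job time limit as stated in this ticket (50 minutes).
JOB_TIME_LIMIT_S = 50 * 60


def minutes(seconds: float) -> float:
    """Convert a duration in seconds to minutes."""
    return seconds / 60


def is_suspiciously_slow(duration_s: float, baseline_s: float, factor: float = 2.0) -> bool:
    """Hypothetical heuristic: flag a step slower than `factor` times a known-good run."""
    return duration_s > factor * baseline_s


# make test-all durations quoted in this ticket, in seconds.
slow_test_all = 1515   # build [1], which hit the job time limit
normal_test_all = 683  # build [2]
normal_total = 28 * 60 + 40  # total duration of build [2]

print(f"test-all: {minutes(slow_test_all):.1f} min vs {minutes(normal_test_all):.1f} min")
print(f"slowdown: {slow_test_all / normal_test_all:.2f}x")
print("build [2] within limit:", normal_total < JOB_TIME_LIMIT_S)
print("build [1] test-all suspicious:", is_suspiciously_slow(slow_test_all, normal_test_all))
```

With these numbers the slow run's test-all took roughly 2.2 times as long as the normal one, whereas a 588-second test-all run would not be flagged by this heuristic.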

Updated by jaruga (Jun Aruga) 3 months ago

jaruga (Jun Aruga) wrote in #note-19:

> One of the s390x builds[1] is very slow: it exceeded the 50-minute maximum timeout, with make test-all alone taking 1515 seconds (about 25 minutes). So far this is the only such build. This behavior is not normal: the next build[2] took 28 minutes 40 seconds in total, with make test-all taking 683 seconds (about 11 minutes). I suspect the cause is a specific slow machine, and I am asking Travis support about it.
>
> [1] https://app.travis-ci.com/github/ruby/ruby/jobs/618214295#L2094
> [2] https://app.travis-ci.com/github/ruby/ruby/jobs/618215618#L2262

Today I found another s390x build that exceeded the 50-minute maximum timeout. Interestingly, make test-all took 588 seconds (about 10 minutes), which is normal.
https://app.travis-ci.com/github/ruby/ruby/jobs/618265449#L2095
Instead, the build seems to have frozen during the make test-spec step.
https://app.travis-ci.com/github/ruby/ruby/jobs/618265449#L3079

Updated by jaruga (Jun Aruga) 3 months ago

jaruga (Jun Aruga) wrote in #note-19:

> One of the s390x builds[1] is very slow: it exceeded the 50-minute maximum timeout, with make test-all alone taking 1515 seconds (about 25 minutes). So far this is the only such build. This behavior is not normal: the next build[2] took 28 minutes 40 seconds in total, with make test-all taking 683 seconds (about 11 minutes). I suspect the cause is a specific slow machine, and I am asking Travis support about it.
>
> [1] https://app.travis-ci.com/github/ruby/ruby/jobs/618214295#L2094
> [2] https://app.travis-ci.com/github/ruby/ruby/jobs/618215618#L2262

Travis support told me in the message below that their engineers have looked into this issue.

> Thanks so much for your patience here.
>
> Our engineers were able to check on this and you should be able to see your builds are now running. Very sorry for the trouble and we will continue to monitor this!

Updated by Eregon (Benoit Daloze) 3 months ago

FYI mspec has a --timeout SECONDS option, which should help identify which spec is hanging/very slow.

Updated by jaruga (Jun Aruga) 3 months ago

Eregon (Benoit Daloze) wrote in #note-22:

> FYI mspec has a --timeout SECONDS option, which should help identify which spec is hanging/very slow.

OK. Thanks for the tip!

Updated by jaruga (Jun Aruga) 3 months ago

We have observed unstable Travis ppc64le/s390x pipelines, so I added allow_failures for them in PR https://github.com/ruby/ruby/pull/10158.

ppc64le

We have seen the following errors 10 or more times in the last 1 or 2 days.

s390x

The following error happened without any output.

Updated by jaruga (Jun Aruga) 3 months ago

I found the following information on https://www.traviscistatus.com/: Travis CI is undergoing maintenance during the week of 27 Feb to 5 Mar.

> Back-end maintenance 27-Feb to 5-Mar
>
> Update - Build status on GitHub works. Builds triggered from GitLab, BitBucket and Assembla operational. Next updates on Feb-29.
> Feb 28, 2024 - 12:42 UTC
>
> Update - Be advised: Build statuses are not passed back to GitHub after build is executed. Triggering builds from GitLab, BitBucket and Assembla not available. We are in progress with maintenance activities.
> Feb 28, 2024 - 11:36 UTC
>
> In progress - Scheduled maintenance is currently in progress. We will provide updates as necessary.
> Feb 27, 2024 - 08:00 UTC
>
> Update - Reminder: Travis CI will be undergoing a maintenance in a week of 27/Feb - 5/Mar. There may be intermittent service deterioration, particularly on Feb 28th.
> Feb 27, 2024 08:00 - Mar 5, 2024 08:00 UTC
>
> Scheduled - Travis CI will be undergoing a maintenance in a week of 27/Feb - 5/Mar. We will do all that we can to not interrupt the service during this period. If you spot erratic or deteriorated service behavior please report back to our support.
> Feb 27, 2024 08:00 - Mar 5, 2024 08:00 UTC

Updated by jaruga (Jun Aruga) 3 months ago

  • Related to Misc #20320: Using OSU Open Source Lab native ppc64le/s390x CI services trigged on pull-requests added

Updated by jaruga (Jun Aruga) 3 months ago

jaruga (Jun Aruga) wrote in #note-21:

> jaruga (Jun Aruga) wrote in #note-19:
>
>> One of the s390x builds[1] is very slow: it exceeded the 50-minute maximum timeout, with make test-all alone taking 1515 seconds (about 25 minutes). So far this is the only such build. This behavior is not normal: the next build[2] took 28 minutes 40 seconds in total, with make test-all taking 683 seconds (about 11 minutes). I suspect the cause is a specific slow machine, and I am asking Travis support about it.

For the slow s390x build issue, I received the following reply from Travis support on 1st March 2024.

> Our Infra team has resolved the issue you encountered. In case it resurfaces, please reach back and we will gladly help.

Updated by jaruga (Jun Aruga) 3 months ago

I noticed the following announcement about maintenance this Wednesday, 6th March, so I plan to add allow_failures to ruby/ruby's arm64 and arm32 cases too before the maintenance. Ideally, I hope Travis can carry out the maintenance without interrupting the service.

https://app.travis-ci.com/github/ruby/ruby

> Please note: Travis CI is undergoing maintenance. On March 6, between 08:00-12:00 UTC+0 service may be temporarily unavailable.

Updated by jaruga (Jun Aruga) 3 months ago

jaruga (Jun Aruga) wrote in #note-28:

> I noticed the following announcement about maintenance this Wednesday, 6th March, so I plan to add allow_failures to ruby/ruby's arm64 and arm32 cases too before the maintenance. ...

I sent a PR for that.
https://github.com/ruby/ruby/pull/10180

Updated by jaruga (Jun Aruga) 3 months ago

jaruga (Jun Aruga) wrote in #note-29:

> jaruga (Jun Aruga) wrote in #note-28:
>
>> I noticed the following announcement about maintenance this Wednesday, 6th March, so I plan to add allow_failures to ruby/ruby's arm64 and arm32 cases too before the maintenance. ...
>
> I sent a PR for that.
> https://github.com/ruby/ruby/pull/10180

As the maintenance seems to be finished, I reverted the commit above.
https://github.com/ruby/ruby/pull/10186

Updated by jaruga (Jun Aruga) 2 months ago · Edited

For your information, 10 days ago I saw the following ppc64le job fail to start and contacted Travis support at that time. I am still waiting for a fix, though I haven't seen any other failures in the last few days.

https://app.travis-ci.com/github/ruby/ruby/jobs/619005133

No output has been received in the last 10m0s, this potentially indicates a stalled build or something wrong with the build itself.
Check the details on how to adjust your build configuration on: https://docs.travis-ci.com/user/common-build-problems/#build-times-out-because-no-output-was-received
The build has been terminated
