Misc #16360 (closed)
Enabling IBM PowerPC/Z cases in Travis CI
Description
We added the arm64-linux and arm32-linux cases to Travis CI through the ticket.
The arm32-linux case should become stable once this pull request is merged.
So, I would like to talk about this topic.
Currently, Travis CI has the following 4 CPU architecture cases.
- x86_64-linux (Intel, 64-bit, Little-endian)
- arm64-linux (ARM, 64-bit, Little-endian)
- i686-linux (Intel, 32-bit, Little-endian)
- arm32-linux (ARM, 32-bit, Little-endian)
And exciting news came from Travis CI this month.
Travis now supports arch: ppc64le and arch: s390x, in addition to arch: arm64.
Build your open source projects on IBM Power and IBM Z CPU architecture
https://blog.travis-ci.com/2019-11-12-multi-cpu-architecture-ibm-power-ibm-z
So, what do you think about adding the following 2 cases to Travis CI too?
- ppc64le-linux (IBM PowerPC, 64-bit, Little-endian)
- s390x-linux (IBM Z/Linux One, 64-bit, Big-endian)
ppc64le, s390x use cases in Ruby project
- Searching the tickets in Redmine shows that there were some architecture-specific issues in the past.
- https://rubyci.org/ has s390x, but it does not seem to have ppc64le.
- s390x is big-endian, so it is a good platform for catching big-endian-specific issues.
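To make the big-endian point concrete, here is a minimal Ruby sketch that reports the architecture string and byte order of the running interpreter. The helper names (`ruby_arch`, `host_endianness`, `u32_big_endian`, `u32_little_endian`) are made up for this illustration; only `RbConfig::CONFIG` and `Array#pack` are existing Ruby APIs.

```ruby
require "rbconfig"

# Architecture string Ruby was built for, e.g. "x86_64-linux" or "s390x-linux".
def ruby_arch
  RbConfig::CONFIG["arch"]
end

# "S" packs a 16-bit integer in native byte order and "n" always packs it
# big-endian, so the two results match only on a big-endian host.
def host_endianness
  [1].pack("S") == [1].pack("n") ? :big : :little
end

# Pack a 32-bit value in a fixed byte order, independent of the host.
def u32_big_endian(value)
  [value].pack("N")
end

def u32_little_endian(value)
  [value].pack("V")
end

puts "#{ruby_arch}: #{host_endianness}-endian"
```

Code that uses native-order directives such as "S" or "L" where a fixed order was intended behaves the same on all four existing little-endian cases, which is why only a big-endian case like s390x-linux would expose it.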
ppc64le, s390x use cases in Linux distributions
For example, Ubuntu supports ppc64le and s390x and provides container images for them.
https://hub.docker.com/_/ubuntu
Supported architectures: (more info)
amd64, arm32v7, arm64v8, i386, ppc64le, s390x
The Fedora project supports ppc64le and s390x too.
https://hub.docker.com/_/fedora
Supported architectures: (more info)
amd64, arm32v7, arm64v8, ppc64le, s390x
Are you interested in adding the ppc64le and s390x test cases to Travis CI?
Updated by shyouhei (Shyouhei Urabe) about 5 years ago
Hello. It is definitely a good idea to enhance our CI. But that alone does not improve the code quality. We need someone to fix issues.
Maybe we need a platform maintainer first.
Updated by jaruga (Jun Aruga) about 5 years ago
shyouhei (Shyouhei Urabe) wrote:
> Hello. It is definitely a good idea to enhance our CI. But that alone does not improve the code quality. We need someone to fix issues.
> Maybe we need a platform maintainer first.
Hello. I agree.
Does anyone have an idea for how to find platform maintainers to help us?
Maybe like this?
- arm64 (+ arm32): person A
- ppc64le: person B
- s390x: person C
Updated by naruse (Yui NARUSE) about 5 years ago
Rei Odaira will work on both ppc64le and s390x on a best-effort basis.
https://twitter.com/ReiOdaira/status/1202383090611556353
So it's interesting to add them to CI.
Updated by ReiOdaira (Rei Odaira) about 5 years ago
I'm happy to work on ppc64le and s390x. In the last few years, the number of platform-specific issues that showed up in ppc64le and s390x Ruby has been between 5 and 10 per year, so I assume the same pace for my obligations as a maintainer.
Updated by jaruga (Jun Aruga) about 5 years ago
Thank you, Rei Odaira. I appreciate your work.
I can work to send a pull-request to add ppc64le and s390x cases to (Travis) CI.
However it's up to you. Feel free to take the task.
Updated by jaruga (Jun Aruga) about 5 years ago
I found someone's pull request adding s390x to Travis CI, and its Travis result succeeded. Could someone check and merge it?
Adding s390x support for Travis build
https://github.com/ruby/ruby/pull/2727
Updated by jaruga (Jun Aruga) about 5 years ago
I ran into "stack level too deep (SystemStackError)" at socket.rb:897 in ip_address_list, only in the Travis ppc64le environment, so I am debugging it with the strace command on my forked repository, which enables Travis ppc64le, here.
https://github.com/junaruga/ruby/commits/feature/ppc64le
https://travis-ci.org/junaruga/ruby/builds/621767234
$ $SETARCH make -s test-spec MSPECOPT=-ff
...
#<Thread:0x000009432a9792f0@/home/travis/build/junaruga/ruby/spec/mspec/lib/mspec/matchers/block_caller.rb:3 run> terminated with exception (report_on_exception is true):
/home/travis/build/junaruga/ruby/build/.ext/common/socket.rb:897:in `ip_address_list': stack level too deep (SystemStackError)
from /home/travis/build/junaruga/ruby/build/.ext/common/socket.rb:897:in `udp_server_sockets'
from /home/travis/build/junaruga/ruby/build/.ext/common/socket.rb:1027:in `udp_server_loop'
from /home/travis/build/junaruga/ruby/spec/ruby/library/socket/socket/udp_server_loop_spec.rb:7:in `block (4 levels) in <top (required)>'
from /home/travis/build/junaruga/ruby/spec/mspec/lib/mspec/matchers/block_caller.rb:4:in `block in matches?'
#<Thread:0x000009432a971a00@/home/travis/build/junaruga/ruby/spec/ruby/library/socket/socket/udp_server_loop_spec.rb:24 run> terminated with exception (report_on_exception is true):
/home/travis/build/junaruga/ruby/build/.ext/common/socket.rb:897:in `ip_address_list': stack level too deep (SystemStackError)
from /home/travis/build/junaruga/ruby/build/.ext/common/socket.rb:897:in `udp_server_sockets'
from /home/travis/build/junaruga/ruby/spec/ruby/library/socket/fixtures/classes.rb:149:in `udp_server_sockets'
from /home/travis/build/junaruga/ruby/build/.ext/common/socket.rb:1027:in `udp_server_loop'
from /home/travis/build/junaruga/ruby/spec/ruby/library/socket/socket/udp_server_loop_spec.rb:25:in `block (4 levels) in <top (required)>'
getsockname's pid=-1334654655 looks weird at https://travis-ci.org/junaruga/ruby/jobs/621767236#L16525
$ strace -f $SETARCH make -s test-spec MSPECOPT=-ff SPECOPTS="../spec/ruby/library/socket/socket/udp_server_loop_spec.rb"
...
[pid 17414] <... getsockname resumed> {sa_family=AF_NETLINK, pid=-1334654655, groups=00000000}, [12]) = 0
[pid 17413] <... read resumed> 0x7ffffec81540, 8) = -1 EAGAIN (Resource temporarily unavailable)
[pid 17414] --- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_ACCERR, si_addr=0x7fb4e1ded7c0} ---
..
Could you help with it?
Thank you.
Updated by mame (Yusuke Endoh) about 5 years ago
Hi @jaruga (Jun Aruga) @ReiOdaira (Rei Odaira)
Unfortunately, s390x Travis CI fails too frequently.
https://travis-ci.org/ruby/ruby/jobs/623537864
https://travis-ci.org/ruby/ruby/jobs/623487659
https://travis-ci.org/ruby/ruby/jobs/623482529
https://travis-ci.org/ruby/ruby/jobs/623480228
https://travis-ci.org/ruby/ruby/jobs/623480013
$ tool/travis_retry.sh sudo -E apt-add-repository -y "ppa:ubuntu-toolchain-r/test"
+ sudo -E apt-add-repository -y ppa:ubuntu-toolchain-r/test
Error: retrieving gpg key timed out.
I think it is not your fault but maybe a network setting issue or something in Travis CI. But anyway, frequent failures make it difficult to keep trunk sound. Could you contact Travis CI and fix the issue?
Updated by jaruga (Jun Aruga) about 5 years ago
> Hi jaruga (Jun Aruga) ReiOdaira (Rei Odaira)
> Unfortunately, s390x Travis CI fails too frequently.
> ...
> I think it is not your fault but maybe a network setting issue or something in Travis CI.
> But anyway, frequent failures make it difficult to keep trunk sound. Could you contact Travis CI and fix the issue?
Hi mame (Yusuke Endoh).
Sure, I will take a look at the code, contact Travis if needed, and fix it.
I think you can move the "s390x-linux" case to allow_failures in Travis for now as a temporary workaround. Then, once we are sure that the case is stable, we can remove it from allow_failures.
Updated by mame (Yusuke Endoh) about 5 years ago
- Status changed from Open to Closed
Applied in changeset git|47a365dd580f2dfe0f0d56155587dfdf2fc7afb7.
Move s390x-linux to allow_failures matrix
ref [Misc #16360]
Updated by mame (Yusuke Endoh) about 5 years ago
- Status changed from Closed to Open
Okay, moved to allow_failures.
Updated by jaruga (Jun Aruga) about 5 years ago
As you can see, the "s390x-linux" case has now disappeared from Travis on the latest master branch.
https://travis-ci.org/ruby/ruby/builds/623632919
I think that was the wrong way to do it.
The modification to add the s390x-linux case to allow_failures could look like this, based on the latest master branch's .travis.yml.
$ git diff
diff --git a/.travis.yml b/.travis.yml
index 5d1a8822cf..71945e349e 100644
--- a/.travis.yml
+++ b/.travis.yml
@@ -418,6 +418,7 @@ matrix:
     - <<: *arm64-linux
     - <<: *i686-linux
     - <<: *arm32-linux
+    - <<: *s390x-linux
     - <<: *pedanticism
     - <<: *assertions
     - <<: *baseruby
@@ -438,10 +439,10 @@ matrix:
     - <<: *CALL_THREADED_CODE
     - <<: *NO_THREADED_CODE
   allow_failures:
+    - name: s390x-linux
     - name: -fsanitize=address
     - name: -fsanitize=memory
     - name: -fsanitize=undefined
-    - <<: *s390x-linux
   fast_finish: true
 before_script:
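As a rough illustration of why the case disappeared, here is a simplified Ruby model of how a build matrix treats the two lists. This is a sketch, not Travis code: the `resolve_matrix` helper and the job hashes are hypothetical, and the model assumes only that jobs are created from the include list while an allow_failures entry merely marks already-included jobs whose attributes match it.

```ruby
# Simplified model: jobs come only from `include`; an `allow_failures`
# entry defines no job, it just marks included jobs whose attributes match.
def resolve_matrix(include_jobs, allow_failures)
  include_jobs.map do |job|
    tolerated = allow_failures.any? do |pattern|
      pattern.all? { |key, value| job[key] == value }
    end
    job.merge(allow_failure: tolerated)
  end
end

s390x = { name: "s390x-linux", arch: "s390x" }
arm64 = { name: "arm64-linux", arch: "arm64" }

# Job listed only under allow_failures: it is never created, so it never runs.
moved = resolve_matrix([arm64], [s390x])

# Job kept in include and matched by name: it runs, but its failure is tolerated.
kept = resolve_matrix([arm64, s390x], [{ name: "s390x-linux" }])

p moved.map { |job| job[:name] }
p kept.map { |job| [job[:name], job[:allow_failure]] }
```

Under this model, the patch above corresponds to the second call: the case stays in the include list and is matched under allow_failures by its name.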
Updated by mame (Yusuke Endoh) about 5 years ago
- Status changed from Open to Closed
Applied in changeset git|cae657c32492a9b4e72b5e290c143e2c84d4c42d.
Fix .travis.yml to keep s390x-linux as allow_features
[Misc #16360]
Updated by mame (Yusuke Endoh) about 5 years ago
- Status changed from Closed to Open
Updated by mame (Yusuke Endoh) about 5 years ago
Oops sorry! Applied your patch, thanks!
Updated by jaruga (Jun Aruga) almost 5 years ago
> Sure, I will take a look at the code, contact if it's needed and fix.
Hi, I may have found the cause of this apt-add-repository error, which happens only in the Travis s390x environment, and a way to avoid it.
$ tool/travis_retry.sh sudo -E apt-add-repository -y "ppa:ubuntu-toolchain-r/test"
+ sudo -E apt-add-repository -y ppa:ubuntu-toolchain-r/test
Error: retrieving gpg key timed out.
The cause is an IPv6 issue in the Travis s390x environment.
I reported it to the Travis community here.
S390x IPv6 connect system call sometimes returning “Connection timed out” in apt-add-repository command
https://travis-ci.community/t/6719
I checked the apt-add-repository command's behavior with the strace command.
$ sudo -E strace -f apt-add-repository -y "ppa:ubuntu-toolchain-r/test"
...
socket(PF_INET6, SOCK_STREAM|SOCK_CLOEXEC, IPPROTO_TCP) = 3
connect(3, {sa_family=AF_INET6, sin6_port=htons(443), inet_pton(AF_INET6, "2001:67c:1560:8003::8003", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, 28) = -1 ETIMEDOUT (Connection timed out)
close(3) = 0
socket(PF_INET6, SOCK_STREAM|SOCK_CLOEXEC, IPPROTO_TCP) = 3
connect(3, {sa_family=AF_INET6, sin6_port=htons(443), inet_pton(AF_INET6, "2001:67c:1560:8003::8004", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, 28) = -1 ETIMEDOUT (Connection timed out)
close(3)
...
[pid 1948] futex(0x3ff88000c14, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, {1578070812, 434829000}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
[pid 1948] write(1, "Error: retrieving gpg key timed "..., 37Error: retrieving gpg key timed out.
) = 37
...
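The trace shows each IPv6 connect() call timing out in turn before the command gives up. For illustration only, here is a minimal Ruby sketch of the general pattern of trying address candidates in order with a per-attempt timeout. It is not what apt-add-repository does; the `connect_first` helper is hypothetical, and the sketch relies only on `Socket.tcp` with its `connect_timeout:` option.

```ruby
require "socket"

# Try each [host, port] candidate in order and return the first socket
# that connects. Each attempt gets its own timeout (in seconds), so one
# unreachable address cannot consume the whole time budget.
def connect_first(candidates, timeout: 2)
  errors = []
  candidates.each do |host, port|
    begin
      return Socket.tcp(host, port, connect_timeout: timeout)
    rescue SystemCallError, SocketError, IOError => e
      # Covers refused and timed-out connections as well as resolution errors.
      errors << "#{host}:#{port} (#{e.class})"
    end
  end
  raise IOError, "all candidates failed: #{errors.join(', ')}"
end
```

A caller could pass an IPv6 address first and an IPv4 address second, for example `connect_first([["2001:db8::1", 443], ["192.0.2.1", 443]])`; those two addresses come from documentation-only ranges and are placeholders.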
A temporary workaround to make the s390x-linux case pass is to disable IPv6 for the s390x-linux case only, like this.
I ran the s390x job of a simple example 10 times in a row, and every job passed.
- ip a
- sudo sysctl -w net.ipv6.conf.all.disable_ipv6=1
- sudo sysctl -w net.ipv6.conf.default.disable_ipv6=1
- sudo sysctl -w net.ipv6.conf.lo.disable_ipv6=1
- ip a
Later I will send a pull request to disable IPv6 only on the Travis s390x-linux case.
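To confirm that the sysctl settings above took effect, one option is to read the values back. The following Ruby sketch assumes a Linux host, where each key net.ipv6.conf.NAME.disable_ipv6 is exposed as the file /proc/sys/net/ipv6/conf/NAME/disable_ipv6; the helper names `ipv6_disabled?` and `ipv6_status` are made up for this example.

```ruby
# Each sysctl key net.ipv6.conf.<name>.disable_ipv6 is exposed by Linux as
# the file /proc/sys/net/ipv6/conf/<name>/disable_ipv6 containing "0" or "1".
# Returns false when the file is missing (for example on a non-Linux host).
def ipv6_disabled?(name, root: "/proc/sys/net/ipv6/conf")
  path = File.join(root, name, "disable_ipv6")
  File.file?(path) && File.read(path).strip == "1"
end

# Report the flag for the three entries changed by the sysctl commands.
def ipv6_status(names = %w[all default lo], root: "/proc/sys/net/ipv6/conf")
  names.map { |name| [name, ipv6_disabled?(name, root: root)] }.to_h
end

p ipv6_status
```

The `root:` parameter exists only so the check can be pointed at a different directory, for instance in a test.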
Updated by jaruga (Jun Aruga) almost 5 years ago
> Later I will send the pull-request to disable IPv6 on only Travis s390x-linux case.
I have now sent the pull request to fix the s390x issue.
Disable IPv6 on Travis s390x case.
https://github.com/ruby/ruby/pull/2819
Updated by jaruga (Jun Aruga) almost 5 years ago
https://bugs.ruby-lang.org/issues/16360#note-7
I am still debugging the "stack level too deep (SystemStackError)" issue, which happens only on ppc64le.
When I was debugging with the following file,
$ cat spec/ruby/library/socket/socket/udp_server_loop_debug3_spec.rb
require_relative '../../../spec_helper'
require 'socket'
describe 'Socket.udp_server_loop debug' do
  it 'blocks the caller' do
    # socket_block = -> do
    #   Socket.ip_address_list
    # end
    socket_block = proc { Socket.ip_address_list }
    # This line is okay.
    puts "[DEBUG] 1 Socket.ip_address_list"
    p Socket.ip_address_list
    # This line is error.
    # See spec/mspec/lib/mspec/matchers/block_caller.rb
    puts "[DEBUG] 2 Socket.ip_address_list with mspec block_caller"
    socket_block.should_not block_caller
  end
end
and running it like this in .travis.yml:
- $SETARCH make test-spec MSPECOPT="-ff -V" SPECOPTS="../spec/ruby/library/socket/socket/udp_server_loop_debug3_spec.rb"
The "[DEBUG] 1" case, p Socket.ip_address_list, is okay.
But the "[DEBUG] 2" case, where proc { Socket.ip_address_list } is used with mspec block_caller, is not okay. The result is like this.
https://travis-ci.org/junaruga/ruby/jobs/646473964#L2792
Here is the strace log.
It might be related to the socket on the network device and Thread.
[pid 17528] getsockname(5, {sa_family=AF_NETLINK, pid=17527, groups=00000000}, [12]) = 0
[pid 17528] --- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_ACCERR, si_addr=0x7d7fd6d8d620} ---
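Since the earlier backtrace shows the failing block running in a thread created at block_caller.rb:3, the same comparison can be sketched without mspec. The following is a minimal reproduction attempt under that assumption, not a confirmed diagnosis; `run_in_thread` and `compare_ip_address_list` are hypothetical helper names.

```ruby
require "socket"

# Run the block in a new thread and report how it ended. SystemStackError
# is not a StandardError, so it has to be rescued explicitly.
def run_in_thread(&block)
  outcome = nil
  thread = Thread.new do
    Thread.current.report_on_exception = false
    begin
      outcome = [:ok, block.call]
    rescue SystemStackError, StandardError => e
      outcome = [:error, e.class]
    end
  end
  thread.join
  outcome
end

# Make the same call first on the main thread, then on a child thread.
def compare_ip_address_list
  {
    main: Socket.ip_address_list.size,
    thread: run_in_thread { Socket.ip_address_list.size }
  }
end

p compare_ip_address_list
```

If the thread entry reports `[:error, SystemStackError]` while the main entry is a plain count, the problem would be tied to calling Socket.ip_address_list from a non-main thread rather than to mspec itself.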
Rei Odaira, is it possible to run make test-spec SPECOPTS="spec/ruby/library/socket/socket/udp_server_loop_spec.rb" in your ppc64le environment? Does it work there?
Thanks.
Updated by jaruga (Jun Aruga) almost 5 years ago
> S390x IPv6 connect system call sometimes returning “Connection timed out” in apt-add-repository command
> https://travis-ci.community/t/6719
For the Travis s390x IPv6 issue, I was told that Travis has fixed it.
https://travis-ci.community/t/6719/7
So, I sent a PR to enable IPv6 again here.
I tested it several times on my forked repository, and it was okay.
https://github.com/ruby/ruby/pull/2970
Updated by jaruga (Jun Aruga) over 3 years ago
- Status changed from Open to Closed
I am closing this ticket, as both Travis ppc64le and s390x were enabled at https://github.com/ruby/ruby/commit/9d4266fd5555c9c4388e2e8592008d0e0d8ccf21 . We no longer see the issue reported at note-7 ( https://bugs.ruby-lang.org/issues/16360#note-7 ) on Travis ppc64le.
There is another issue, #17871, on Travis ppc64le, and we are skipping the tests for now.