Project

General

Profile

Misc #16360

Enabling IBM PowerPC/Z cases in Travis CI

Added by jaruga (Jun Aruga) 8 months ago. Updated 4 months ago.

Status:
Open
Priority:
Normal
Assignee:
-
[ruby-core:95910]

Description

We added arm64-linux and arm32-linux cases to Travis CI by the ticket.
The arm32-linux case is going to be stable after this pull-request will be merged.

So, I would like to talk about this topic.
Currently Travis CI has following 4 multiple CPU architectures cases.

  • x86_64-linux (Intel, 64-bit, Little-endian)
  • arm64-linux (ARM, 64-bit, Little-endian)
  • i686-linux (Intel, 32-bit, Little-endian)
  • arm32-linux (ARM, 32-bit, Little-endian)

And a exciting news came from Travis CI this month.
Now Travis supports arch: ppc64le and arch: s390x as arch: arm64 as well.

Build your open source projects on IBM Power and IBM Z CPU architecture
https://blog.travis-ci.com/2019-11-12-multi-cpu-architecture-ibm-power-ibm-z

So, how do you think about adding following 2 cases to Travis CI too?

  • ppc64le-linux (IBM PowerPC, 64-bit, Little-endian)
  • s390x-linux (IBM Z/Linux One, 64-bit, Big-endian)

ppc64le, s390x use cases in Ruby project

  • Searching tickets in Redmine, there were some architecture specific issues in the past.
  • https://rubyci.org/ has s390x. But it seems it does not have ppc64le.
  • s390x is a big-endian. It looks good to check the big-endian specific issue.

ppc64le, s390x use cases in Linux distributions

For example Ubuntu is supporting ppc64le, s390x, providing the container image.

https://hub.docker.com/_/ubuntu
Supported architectures: (more info)
amd64, arm32v7, arm64v8, i386, ppc64le, s390x

Fedora project is supporting ppc64le, s390x too.

https://hub.docker.com/_/fedora
Supported architectures: (more info)
amd64, arm32v7, arm64v8, ppc64le, s390x

Are you interested in adding the ppc64le and s390x test cases to Travis CI?

Updated by shyouhei (Shyouhei Urabe) 8 months ago

Hello. It is definitely a good idea to enhance our CI. But that alone does not improve the code quality. We need someone to fix issues.

Maybe we need a platform maintainer first.

Updated by jaruga (Jun Aruga) 7 months ago

shyouhei (Shyouhei Urabe) wrote:

Hello. It is definitely a good idea to enhance our CI. But that alone does not improve the code quality. We need someone to fix issues.

Maybe we need a platform maintainer first.

Hello. I agree with it.

Someone, do you have any idea to find the platform maintainers to help us?

Maybe like this?

  • arm64 (+ arm32): person A
  • ppc64le: person B
  • s390x: person C

Updated by naruse (Yui NARUSE) 7 months ago

Rei Odaira will work as best effort for both ppc64le and s390x.
https://twitter.com/ReiOdaira/status/1202383090611556353

So it's interesting to add them to CI.

Updated by ReiOdaira (Rei Odaira) 7 months ago

I'm happy to work for ppc64le and s390x. In the last few years, the number of the platform-specific issues that showed up in ppc64le and s390x Ruby has been between 5 and 10 every year, so I assume the same pace for my obligation as a maintainer.

Updated by jaruga (Jun Aruga) 7 months ago

Thank you, Rei Odaira. I appreciate your work.

I can work to send a pull-request to add ppc64le and s390x cases to (Travis) CI.
However it's up to you. Feel free to take the task.

Updated by jaruga (Jun Aruga) 7 months ago

I found someone's pull-request adding s390x in Travis CI, and the result (Travis) is succeeded. Someone could you check and merge this?

Adding s390x support for Travis build
https://github.com/ruby/ruby/pull/2727

Updated by jaruga (Jun Aruga) 7 months ago

As I faced stack level too deep (SystemStackError) in socket.rb:897 in ip_address_list on only the Travis ppc64le environment, I am debugging it with strace command on my forked repository enabling Travis ppc64le here.

https://github.com/junaruga/ruby/commits/feature/ppc64le
https://travis-ci.org/junaruga/ruby/builds/621767234

$ $SETARCH make -s test-spec MSPECOPT=-ff
...
<Thread:0x000009432a9792f0@/home/travis/build/junaruga/ruby/spec/mspec/lib/mspec/matchers/block_caller.rb:3 run> terminated with exception (report_on_exception is true):
/home/travis/build/junaruga/ruby/build/.ext/common/socket.rb:897:in `ip_address_list': stack level too deep (SystemStackError)
  from /home/travis/build/junaruga/ruby/build/.ext/common/socket.rb:897:in `udp_server_sockets'
  from /home/travis/build/junaruga/ruby/build/.ext/common/socket.rb:1027:in `udp_server_loop'
  from /home/travis/build/junaruga/ruby/spec/ruby/library/socket/socket/udp_server_loop_spec.rb:7:in `block (4 levels) in <top (required)>'
  from /home/travis/build/junaruga/ruby/spec/mspec/lib/mspec/matchers/block_caller.rb:4:in `block in matches?'

#<Thread:0x000009432a971a00@/home/travis/build/junaruga/ruby/spec/ruby/library/socket/socket/udp_server_loop_spec.rb:24 run> terminated with exception (report_on_exception is true):
/home/travis/build/junaruga/ruby/build/.ext/common/socket.rb:897:in `ip_address_list': stack level too deep (SystemStackError)
  from /home/travis/build/junaruga/ruby/build/.ext/common/socket.rb:897:in `udp_server_sockets'
  from /home/travis/build/junaruga/ruby/spec/ruby/library/socket/fixtures/classes.rb:149:in `udp_server_sockets'
  from /home/travis/build/junaruga/ruby/build/.ext/common/socket.rb:1027:in `udp_server_loop'
  from /home/travis/build/junaruga/ruby/spec/ruby/library/socket/socket/udp_server_loop_spec.rb:25:in `block (4 levels) in <top (required)>'

getsockname's pid=-1334654655 looks weird at https://travis-ci.org/junaruga/ruby/jobs/621767236#L16525

$ strace -f $SETARCH make -s test-spec MSPECOPT=-ff SPECOPTS="../spec/ruby/library/socket/socket/udp_server_loop_spec.rb"
...
[pid 17414] <... getsockname resumed> {sa_family=AF_NETLINK, pid=-1334654655, groups=00000000}, [12]) = 0
[pid 17413] <... read resumed> 0x7ffffec81540, 8) = -1 EAGAIN (Resource temporarily unavailable)
[pid 17414] --- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_ACCERR, si_addr=0x7fb4e1ded7c0} ---
..

Could you help it?
Thank you.

Updated by mame (Yusuke Endoh) 7 months ago

Hi jaruga (Jun Aruga) ReiOdaira (Rei Odaira)

Unfortunately, s390x Travis CI fails too frequently.

https://travis-ci.org/ruby/ruby/jobs/623537864
https://travis-ci.org/ruby/ruby/jobs/623487659
https://travis-ci.org/ruby/ruby/jobs/623482529
https://travis-ci.org/ruby/ruby/jobs/623480228
https://travis-ci.org/ruby/ruby/jobs/623480013

$ tool/travis_retry.sh sudo -E apt-add-repository -y "ppa:ubuntu-toolchain-r/test"
+ sudo -E apt-add-repository -y ppa:ubuntu-toolchain-r/test
Error: retrieving gpg key timed out.

I think it is not your fault but maybe a network setting issue or something in Travis CI. But anyway, frequent failures make it difficult to keep trunk sound. Could you contact on Travis CI and fix the issue?

Updated by jaruga (Jun Aruga) 7 months ago

Hi jaruga (Jun Aruga) ReiOdaira (Rei Odaira)
Unfortunately, s390x Travis CI fails too frequently.
...
I think it is not your fault but maybe a network setting issue or something in Travis CI.
But anyway, frequent failures make it difficult to keep trunk sound. Could you contact on Travis CI and fix the issue?

Hi mame (Yusuke Endoh).
Sure, I will take a look at the code, contact if it's needed and fix.

I think you can "s390x-linux" case to allow_failures in Travis at the moment as a temporary workflow. Then after making sure that the case is stable, we can remove it from allow_failures.

#10

Updated by mame (Yusuke Endoh) 7 months ago

  • Status changed from Open to Closed

Applied in changeset git|47a365dd580f2dfe0f0d56155587dfdf2fc7afb7.


Move s390x-linux to allow_failures matrix

ref [Misc #16360]

Updated by mame (Yusuke Endoh) 7 months ago

  • Status changed from Closed to Open

Okay, moved to allow_failures.

Updated by jaruga (Jun Aruga) 7 months ago

You see that "s390x-linux" case was disappeared on the latest master branch Travis now.
https://travis-ci.org/ruby/ruby/builds/623632919

I think it was a wrong way.

The modification to add s390x-linux case to allow_failures could be like this from the latest master branch's .travis.yml.

$ git diff
diff --git a/.travis.yml b/.travis.yml
index 5d1a8822cf..71945e349e 100644
--- a/.travis.yml
+++ b/.travis.yml
@@ -418,6 +418,7 @@ matrix:
     - <<: *arm64-linux
     - <<: *i686-linux
     - <<: *arm32-linux
+    - <<: *s390x-linux
     - <<: *pedanticism
     - <<: *assertions
     - <<: *baseruby
@@ -438,10 +439,10 @@ matrix:
     - <<: *CALL_THREADED_CODE
     - <<: *NO_THREADED_CODE
   allow_failures:
+    - name: s390x-linux
     - name: -fsanitize=address
     - name: -fsanitize=memory
     - name: -fsanitize=undefined
-    - <<: *s390x-linux
   fast_finish: true

 before_script:
#13

Updated by mame (Yusuke Endoh) 7 months ago

  • Status changed from Open to Closed

Applied in changeset git|cae657c32492a9b4e72b5e290c143e2c84d4c42d.


Fix .travis.yml to keep s390x-linux as allow_features

[Misc #16360]

#14

Updated by mame (Yusuke Endoh) 7 months ago

  • Status changed from Closed to Open

Updated by mame (Yusuke Endoh) 7 months ago

Oops sorry! Applied your patch, thanks!

Updated by jaruga (Jun Aruga) 6 months ago

Sure, I will take a look at the code, contact if it's needed and fix.

Hi, possibly I found the cause of this error for apt-add-repository command that happens for only Travis s390x environment, and how to avoid it.

$ tool/travis_retry.sh sudo -E apt-add-repository -y "ppa:ubuntu-toolchain-r/test"
+ sudo -E apt-add-repository -y ppa:ubuntu-toolchain-r/test
Error: retrieving gpg key timed out.

The cause is by the Travis s390x environment's IPv6 issue.

I reported it to Travis community here.

S390x IPv6 connect system call sometimes returning “Connection timed out” in apt-add-repository command
https://travis-ci.community/t/6719

I checked apt-add-repository command's behavior with strace command.

$ sudo -E strace -f apt-add-repository -y "ppa:ubuntu-toolchain-r/test"
...
socket(PF_INET6, SOCK_STREAM|SOCK_CLOEXEC, IPPROTO_TCP) = 3
connect(3, {sa_family=AF_INET6, sin6_port=htons(443), inet_pton(AF_INET6, "2001:67c:1560:8003::8003", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, 28) = -1 ETIMEDOUT (Connection timed out)
close(3)                                = 0
socket(PF_INET6, SOCK_STREAM|SOCK_CLOEXEC, IPPROTO_TCP) = 3
connect(3, {sa_family=AF_INET6, sin6_port=htons(443), inet_pton(AF_INET6, "2001:67c:1560:8003::8004", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, 28) = -1 ETIMEDOUT (Connection timed out)
close(3)    
...
pid  1948] futex(0x3ff88000c14, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, {1578070812, 434829000}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
[pid  1948] write(1, "Error: retrieving gpg key timed "..., 37Error: retrieving gpg key timed out.
) = 37
...

The temporary workflow to pass s390x-linux case is to disable IPv6 for only s390x-linux case like this.
I executed the simple example's s390x case job 10 times continuously, and all job is passed.

- ip a
- sudo sysctl -w net.ipv6.conf.all.disable_ipv6=1
- sudo sysctl -w net.ipv6.conf.default.disable_ipv6=1
- sudo sysctl -w net.ipv6.conf.lo.disable_ipv6=1
- ip a

Later I will send the pull-request to disable IPv6 on only Travis s390x-linux case.

Updated by jaruga (Jun Aruga) 6 months ago

Later I will send the pull-request to disable IPv6 on only Travis s390x-linux case.

I sent the pull-request to fix the s390x issue now.

Disable IPv6 on Travis s390x case.
https://github.com/ruby/ruby/pull/2819

Updated by jaruga (Jun Aruga) 5 months ago

https://bugs.ruby-lang.org/issues/16360#note-7

I am still debugging for the "stack level too deep (SystemStackError)" issue that happens on only ppc64le.

When I was debugging with following file,

$ cat spec/ruby/library/socket/socket/udp_server_loop_debug3_spec.rb
require_relative '../../../spec_helper'
require 'socket'

describe 'Socket.udp_server_loop debug' do
  it 'blocks the caller' do
    # socket_block = -> do
    #   Socket.ip_address_list
    # end

    socket_block = proc { Socket.ip_address_list }

    # This line is okay.
    puts "[DEBUG] 1 Socket.ip_address_list"
    p Socket.ip_address_list

    # This line is error.
    # See spec/mspec/lib/mspec/matchers/block_caller.rb
    puts "[DEBUG] 2 Socket.ip_address_list with mspec block_caller"
    socket_block.should_not block_caller
  end
end

running like this.

.travis.yml

- $SETARCH make test-spec MSPECOPT="-ff -V" SPECOPTS="../spec/ruby/library/socket/socket/udp_server_loop_debug3_spec.rb"

The "[DEBUG] 1" p Socket.ip_address_list is okay.
But "[DEBUG] 2" when proc { Socket.ip_address_list } is used with mspec block_caller, it's not okay. The result is like this.
https://travis-ci.org/junaruga/ruby/jobs/646473964#L2792

Here is the strace's log.
It might be related to the socket with network device and Thread.

[pid 17528] getsockname(5, {sa_family=AF_NETLINK, pid=17527, groups=00000000}, [12]) = 0
[pid 17528] --- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_ACCERR, si_addr=0x7d7fd6d8d620} ---

Rei Odaira, is it possible to run make test-spec SPECOPTS="spec/ruby/library/socket/socket/udp_server_loop_spec.rb" on your ppc64le environment? Does it work on your environment?
Thanks.

Updated by jaruga (Jun Aruga) 4 months ago

S390x IPv6 connect system call sometimes returning “Connection timed out” in apt-add-repository command
https://travis-ci.community/t/6719

For Travis s390x IPv6 issue, I was told the issue was fixed by Travis.
https://travis-ci.community/t/6719/7

So, I sent the PR to enable it again here.
I tested it several times on my forked repository, and it was okay.
https://github.com/ruby/ruby/pull/2970

Also available in: Atom PDF