Bug #20085

closed

Fiber.new{ }.resume causes Segmentation fault for Ruby 3.3.0 on aarch64-linux

Added by oleksii (Oleksii Leonov) 4 months ago. Updated about 2 months ago.

Status: Closed
Target version: -
ruby -v: ruby 3.3.0 (2023-12-25 revision 5124f9ac75) [aarch64-linux]
[ruby-core:115892]

Description

ruby -e "Fiber.new{}.resume"

0.170 -e:1: [BUG] Segmentation fault at 0x0036ffffb4f110f0
0.170 ruby 3.3.0 (2023-12-25 revision 5124f9ac75) [aarch64-linux]
0.170 
0.170 -- Control frame information -----------------------------------------------
0.170 c:0003 p:---- s:0010 e:000009 CFUNC  :resume
0.170 c:0002 p:0007 s:0006 E:0005e0 EVAL   -e:1 [FINISH]
0.170 c:0001 p:0000 s:0003 E:000b50 DUMMY  [FINISH]
0.170 
0.170 -- Ruby level backtrace information ----------------------------------------
0.170 -e:1:in `<main>'
0.170 -e:1:in `resume'
0.170 
0.170 -- Threading information ---------------------------------------------------
0.170 Total ractor count: 1
0.170 Ruby thread count for this ractor: 1
0.170 
0.170 -- Machine register context ------------------------------------------------
0.170   x0: 0x0000aaab11b4f570  x1: 0x0000aaab11de4a40  x2: 0x0000ffffe5571990
0.170   x3: 0x0000ffff9ac5ff60  x4: 0x0000ffff9ac60018  x5: 0x0000ffff9ac80000
0.170   x6: 0x0000ffffb5458b88  x7: 0x0000000000000000 x18: 0x00000000007fffff
0.170  x19: 0x0000000000000000 x20: 0x0000000000000000 x21: 0x0000000000000000
0.170  x22: 0x0000000000000000 x23: 0x0000000000000000 x24: 0x0000000000000000
0.170  x25: 0x0000000000000000 x26: 0x0000000000000000 x27: 0x0000000000000000
0.170  x28: 0x0000000000000000 x29: 0x0000000000000000  sp: 0x0000ffff9ac60000
0.170  fau: 0x0036ffffb4f110f0
0.170 
0.170 -- C level backtrace information -------------------------------------------
0.171 Segmentation fault
  • Ruby 3.3.0-rc1 and 3.2.2 work without Segmentation fault on both amd64 and arm64.
  • Ruby 3.3.0 works on amd64, but fails with Segmentation fault on arm64 (aarch64-linux) on Ubuntu 22.04 and Debian Bookworm.
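To check whether a given Ruby binary is affected without crashing the process doing the checking, the one-liner from the report can be run in a child process and its exit status inspected. The following is a minimal sketch using only Ruby's standard library. check_fiber_resume is a hypothetical helper name, and the sketch assumes that an affected build terminates abnormally (killed by a signal or a non-zero exit) rather than hanging.

```ruby
require "rbconfig"

# Run the reproducer from this report in a child process, so that a crash
# kills only the child. stdout/stderr (including any [BUG] report) are
# discarded. Returns [ok, detail]; ok is true only when the child exited
# normally with status 0.
def check_fiber_resume(ruby = RbConfig.ruby)
  pid = Process.spawn(ruby, "-e", "Fiber.new{}.resume",
                      out: File::NULL, err: File::NULL)
  _, status = Process.wait2(pid)
  if status.signaled?
    [false, "killed by SIG#{Signal.signame(status.termsig)}"]
  elsif status.success?
    [true, "exited normally"]
  else
    [false, "exited with status #{status.exitstatus}"]
  end
end

ok, detail = check_fiber_resume
puts "#{RUBY_DESCRIPTION}: Fiber.new{}.resume #{detail}"
```

Passing a path, for example check_fiber_resume("/usr/local/rbenv/versions/3.3.0/bin/ruby"), checks a specific installation instead of the interpreter running the script.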

Dockerfile to reproduce:

FROM debian:bookworm

# Configure the rbenv install location and put rbenv and its shims on PATH.
ENV RBENV_ROOT=/usr/local/rbenv \
    PATH=/usr/local/rbenv/bin:/usr/local/rbenv/shims:$PATH

RUN ( \
  export DEBIAN_FRONTEND=noninteractive \
  && apt-get update \
  && apt-get install -y --no-install-recommends \
        ca-certificates curl git \
        # standard dependencies for rbenv
        autoconf \
        patch \
        build-essential \
        rustc \
        libssl-dev \
        libyaml-dev \
        libreadline6-dev \
        zlib1g-dev \
        libgmp-dev \
        libncurses5-dev \
        libffi-dev \
        libgdbm6 \
        libgdbm-dev \
        libdb-dev \
        uuid-dev \
  )

ENV RBENV_VERSION=3.3.0

RUN ( \
  export DEBIAN_FRONTEND=noninteractive \
  # Install rbenv & ruby-build
  && git clone https://github.com/rbenv/rbenv.git /usr/local/rbenv \
  && git clone https://github.com/rbenv/ruby-build.git /usr/local/rbenv/plugins/ruby-build \
  && /usr/local/rbenv/plugins/ruby-build/install.sh \
  && echo 'export RBENV_ROOT=/usr/local/rbenv' >> /etc/profile.d/rbenv.sh \
  && echo 'export PATH=/usr/local/rbenv/bin:$PATH' >> /etc/profile.d/rbenv.sh \
  && echo 'eval "$(rbenv init -)"' >> /etc/profile.d/rbenv.sh \
  && echo 'export RBENV_ROOT=/usr/local/rbenv' >> /root/.bashrc \
  && echo 'export PATH=/usr/local/rbenv/bin:$PATH' >> /root/.bashrc \
  && echo 'eval "$(rbenv init -)"' >> /root/.bashrc \
  && eval "$(rbenv init -)" \
  && rbenv install $RBENV_VERSION \
  && rbenv global $RBENV_VERSION \
  )

RUN ruby -e "Fiber.new{}.resume"

Build it with:

docker build --platform linux/arm64 .


Files

Dockerfile (1.53 KB): Dockerfile to reproduce (docker build --platform linux/arm64 .), oleksii (Oleksii Leonov), 12/25/2023 03:41 PM

Related issues: 2 (0 open, 2 closed)

Has duplicate Ruby master - Bug #20264: Segfault in Enumerator#next while installing RMagick on M1 Mac (Feedback)
Has duplicate Ruby master - Bug #20268: Segfault in ruby 3.3 Fiber on aarch64 musl (mac m1) (Closed)

Updated by mame (Yusuke Endoh) 4 months ago

Thanks. I tried the Dockerfile you provided, but I couldn't reproduce the segfault on aarch64.

Can you reproduce the issue with gdb and provide the backtrace?

Add the following lines to the Dockerfile:

  • RUN apt-get install -y gdb
  • CMD gdb --args ruby -e "Fiber.new{}.resume"

Then start the image with docker run -it, type run at the gdb prompt, and if gdb catches a segfault, type the backtrace command.
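docker build provides no interactive terminal, so it can be handier to let gdb run the reproducer and print the backtrace on its own. The sketch below only assembles such a command line. It relies on gdb's standard -batch, -ex and --args options, and gdb_backtrace_argv is a hypothetical helper name. The printed, shell-quoted line could be used in a RUN step instead of the interactive CMD.

```ruby
require "rbconfig"
require "shellwords"

# Build a gdb command line that starts the program, prints a backtrace
# once it stops (e.g. on SIGSEGV), and exits without an interactive prompt.
def gdb_backtrace_argv(ruby = RbConfig.ruby, script = "Fiber.new{}.resume")
  ["gdb", "-batch",
   "-ex", "run",        # start the program under gdb
   "-ex", "backtrace",  # executed after the program stops or exits
   "--args", ruby, "-e", script]
end

argv = gdb_backtrace_argv
puts Shellwords.join(argv)  # shell-quoted, e.g. for a Dockerfile RUN line
# system(*argv)             # run it directly where gdb is installed
```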

Updated by katei (Yuta Saito) 4 months ago

  • Assignee set to katei (Yuta Saito)

I'm sure https://github.com/ruby/ruby/pull/9306 is the culprit, but I'm not sure under which conditions the segfault happens.
Could you share your /proc/cpuinfo to see if your machine supports Pointer Authentication or Branch Target Identification?

Updated by tomog105 (Tomohiro Ogoke) 4 months ago

In my environment (Apple M1 MacBook on macOS 13.6.3), I could reproduce the segfault when using the Apple Virtualization framework (VZ) machine type in a container execution environment (Rancher Desktop), but not when using the QEMU emulator machine type.
Here is /proc/cpuinfo from both machines.

VZ machine type (could reproduce the segfault)

$ cat /proc/cpuinfo
processor	: 0
BogoMIPS	: 48.00
Features	: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 asimddp sha512 asimdfhm dit uscat ilrcpc flagm ssbs sb paca pacg dcpodp flagm2 frint
CPU implementer	: 0x00
CPU architecture: 8
CPU variant	: 0x0
CPU part	: 0x000
CPU revision	: 0

processor	: 1
BogoMIPS	: 48.00
Features	: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 asimddp sha512 asimdfhm dit uscat ilrcpc flagm ssbs sb paca pacg dcpodp flagm2 frint
CPU implementer	: 0x00
CPU architecture: 8
CPU variant	: 0x0
CPU part	: 0x000
CPU revision	: 0

QEMU machine type (couldn't reproduce the segfault)

The ssbs feature is missing.

$ cat /proc/cpuinfo
processor	: 0
BogoMIPS	: 48.00
Features	: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 asimddp sha512 asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp flagm2 frint
CPU implementer	: 0x00
CPU architecture: 8
CPU variant	: 0x0
CPU part	: 0x000
CPU revision	: 0

processor	: 1
BogoMIPS	: 48.00
Features	: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 asimddp sha512 asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp flagm2 frint
CPU implementer	: 0x00
CPU architecture: 8
CPU variant	: 0x0
CPU part	: 0x000
CPU revision	: 0
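The two dumps differ only in the ssbs flag, and both list paca and pacg, the Linux hwcap names for pointer authentication (address and generic authentication); neither lists bti (Branch Target Identification). The minimal sketch below makes that comparison mechanical with set differences over the Features lines copied from above. cpu_features is a hypothetical helper name, and it assumes every core reports the same flags, as in these dumps.

```ruby
# "Features" lines copied from the two /proc/cpuinfo dumps above
# (Rancher Desktop with the Apple VZ and QEMU machine types).
VZ_CPUINFO = "Features\t: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp " \
             "asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 asimddp sha512 " \
             "asimdfhm dit uscat ilrcpc flagm ssbs sb paca pacg dcpodp flagm2 frint\n"
QEMU_CPUINFO = "Features\t: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp " \
               "asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 asimddp sha512 " \
               "asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp flagm2 frint\n"

# Return the feature flags from the first "Features" line of a cpuinfo dump.
# The dumps above list identical flags for every core, so one line suffices.
def cpu_features(cpuinfo_text)
  line = cpuinfo_text.each_line.find { |l| l.start_with?("Features") }
  line ? line.split(":", 2).last.split : []
end

vz   = cpu_features(VZ_CPUINFO)
qemu = cpu_features(QEMU_CPUINFO)
p vz - qemu  # prints ["ssbs"]
p qemu - vz  # prints []

# paca/pacg: pointer authentication (address/generic); bti: Branch Target Identification.
%w[paca pacg bti].each do |flag|
  puts "#{flag}: VZ=#{vz.include?(flag)} QEMU=#{qemu.include?(flag)}"
end
```

On a live system, cpu_features(File.read("/proc/cpuinfo")) reads the same information.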

Updated by tomog105 (Tomohiro Ogoke) 4 months ago

Additionally, my other machine (Apple M2 Pro MacBook) reproduces the segfault with both the QEMU and VZ machine types in Rancher Desktop.
Therefore, the difference in CPU features may not be what triggers the segfault.

Here is the gdb log from when the segfault was raised:

GNU gdb (Debian 13.1-3) 13.1
Copyright (C) 2023 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "aarch64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/local/rbenv/versions/3.3.0/bin/ruby...
(gdb) run
Starting program: /usr/local/rbenv/versions/3.3.0/bin/ruby -e Fiber.new\{\}.resume
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/aarch64-linux-gnu/libthread_db.so.1".
[New Thread 0xffffddf5f100 (LWP 21)]

Thread 1 "ruby" received signal SIGSEGV, Segmentation fault.
0x004ffffff79e06f8 in ?? ()
(gdb) backtrace
#0  0x004ffffff79e06f8 in ?? ()
Backtrace stopped: not enough registers or memory available to unwind further

Updated by katei (Yuta Saito) 4 months ago

Okay, I could reproduce the issue on my end. Thank you for your detailed info!

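The root cause is that configure detects -mbranch-protection=pac-ret and adds it to CFLAGS but not to ASFLAGS, so the C code and the assembly code used for Fiber context switching are built with different return-address signing settings.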

So here is a workaround to address the issue without patching the ruby source.

./configure ASFLAGS=-mbranch-protection=pac-ret

I'll also fix configure.ac before 3.3.1.
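The faulting addresses in this thread appear consistent with this diagnosis. With pointer authentication, a signature is stored in some of the otherwise unused high bits of a pointer, so jumping to a code pointer whose signature bits are still set faults at a non-canonical address. The sketch below masks off bits 48 to 63 of each reported fault address. It assumes a 48-bit user virtual address space, which is a common arm64 Linux configuration but is not confirmed for the VMs in this thread.

```ruby
# Assumption: 48-bit user virtual addresses, so bits 48..63 of a valid
# user-space code pointer are zero. Pointer authentication stores its
# signature (the PAC) in some of those otherwise unused high bits.
VA_BITS = 48
VA_MASK = (1 << VA_BITS) - 1

# Fault addresses reported in this thread (crash report and gdb sessions).
FAULT_ADDRESSES = [
  0x0036ffffb4f110f0,
  0x004ffffff79e06f8,
  0x002cfffff7ac10f0,
  0x000bfffff7ac10f0,
].freeze

FAULT_ADDRESSES.each do |addr|
  high = addr >> VA_BITS  # bits that should be zero without a signature
  low  = addr & VA_MASK   # the address with those bits cleared
  puts format("0x%016x  high=0x%04x  address=0x%012x", addr, high, low)
end
```

With the high bits cleared, each value looks like an ordinary 48-bit user-space address. The last two come from separate gdb sessions (gdb disables address randomization by default) and share the same low 48 bits while their high bits differ, which is what one would expect if the high bits are a signature computed with per-process keys.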

Updated by oleksii (Oleksii Leonov) 4 months ago

@tomog105 (Tomohiro Ogoke), thank you very much!

I can confirm that the segfault also happens inside Docker Desktop's VM.

On my machine (MacBook Pro M1 Max, macOS 14.2.1, Docker Desktop 4.26.1), it happens with the "Use Virtualization framework" option both on and off.

With the "Use Virtualization framework" option enabled (Apple VZ is used instead of QEMU, so the ssbs extension is present):

$ uname -a
Linux b8c3da04fc58 6.5.11-linuxkit #1 SMP PREEMPT Wed Dec  6 17:08:31 UTC 2023 aarch64 GNU/Linux

$ cat /proc/cpuinfo 
processor       : 0
BogoMIPS        : 48.00
Features        : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 asimddp sha512 asimdfhm dit uscat ilrcpc flagm ssbs sb paca pacg dcpodp flagm2 frint
CPU implementer : 0x61
CPU architecture: 8
CPU variant     : 0x0
CPU part        : 0x000
CPU revision    : 0
# ...

$ gdb --args /usr/local/rbenv/versions/3.3.0/bin/ruby -e "Fiber.new{}.resume"
GNU gdb (Debian 13.1-3) 13.1
Reading symbols from /usr/local/rbenv/versions/3.3.0/bin/ruby...
(gdb) run
Starting program: /usr/local/rbenv/versions/3.3.0/bin/ruby -e Fiber.new\{\}.resume
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/aarch64-linux-gnu/libthread_db.so.1".
[New Thread 0xffffde03f100 (LWP 8987)]

Thread 1 "ruby" received signal SIGSEGV, Segmentation fault.
0x002cfffff7ac10f0 in ?? ()
(gdb) backtrace
#0  0x002cfffff7ac10f0 in ?? ()
Backtrace stopped: not enough registers or memory available to unwind further

With the "Use Virtualization framework" option disabled (QEMU is used instead of Apple VZ):

$ uname -a
Linux 55913114f674 6.5.11-linuxkit #1 SMP PREEMPT Wed Dec  6 17:08:31 UTC 2023 aarch64 GNU/Linux

$ cat /proc/cpuinfo 
processor       : 0
BogoMIPS        : 48.00
Features        : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 asimddp sha512 asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp flagm2 frint
CPU implementer : 0x61
CPU architecture: 8
CPU variant     : 0x0
CPU part        : 0x000
CPU revision    : 0
# ...

$ gdb --args /usr/local/rbenv/versions/3.3.0/bin/ruby -e "Fiber.new{}.resume"
GNU gdb (Debian 13.1-3) 13.1
Reading symbols from /usr/local/rbenv/versions/3.3.0/bin/ruby...
(gdb) run
Starting program: /usr/local/rbenv/versions/3.3.0/bin/ruby -e Fiber.new\{\}.resume
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/aarch64-linux-gnu/libthread_db.so.1".
[New Thread 0xffffde03f100 (LWP 3157)]

Thread 1 "ruby" received signal SIGSEGV, Segmentation fault.
0x000bfffff7ac10f0 in ?? ()
(gdb) backtrace
#0  0x000bfffff7ac10f0 in ?? ()
Backtrace stopped: not enough registers or memory available to unwind further

Updated by ioquatix (Samuel Williams) 4 months ago

I think we should backport this fix urgently.

Updated by tomog105 (Tomohiro Ogoke) 4 months ago

This issue is fixed by https://github.com/ruby/ruby/pull/9371 in my environment. Thank you @katei (Yuta Saito) san!

Updated by ioquatix (Samuel Williams) 4 months ago

  • Backport changed from 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: UNKNOWN to 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: REQUIRED

I strongly advise that this be backported urgently.

Updated by katei (Yuta Saito) 4 months ago

Opened a backport PR https://github.com/ruby/ruby/pull/9385

@naruse (Yui NARUSE) Can we backport the patch to 3.3 branch?

  • Short description: Fiber on aarch64 with PAC support on Linux was broken in 3.3.0. The patch fixes the issue by adjusting compiler options.
  • Scope: Only aarch64 with PAC support on Linux (like aarch64 Linux container on M1 Mac)
  • Testing: Unfortunately, we don't have a CI node that is affected by the issue. The community and I tested the patch manually in such environments.

Updated by naruse (Yui NARUSE) 4 months ago

  • Backport changed from 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: REQUIRED to 3.0: DONTNEED, 3.1: DONTNEED, 3.2: DONTNEED, 3.3: REQUIRED

@katei (Yuta Saito) Since the Backport field is correctly specified, I'll merge it for 3.3.1 after the fix is merged into master.

Updated by katei (Yuta Saito) 4 months ago

@naruse (Yui NARUSE) For the record, the fix has already been merged into master by https://github.com/ruby/ruby/pull/9371

Updated by jeremyevans0 (Jeremy Evans) 4 months ago

  • Status changed from Open to Closed

Updated by navels (Lee Nave) 3 months ago

Any guess as to when the backport to 3.3.0 will land? Our 3.3.0 upgrade is on hold, wondering if we should wait or employ a workaround.

Updated by londonappdev (Mark Winterbottom) 2 months ago

Noticed this is closed but the issue still persists and the backport isn't merged in yet. Do we know when it will be available?

Updated by navels (Lee Nave) 2 months ago

Getting closer...backport merged to 3.3.0. Anyone familiar with the release process know when the new binaries will land?

Updated by dorianmariefr (Dorian Marié) 2 months ago

I can't deploy apps with Kamal (which uses Docker) because of this crash.

Updated by kjtsanaktsidis (KJ Tsanaktsidis) 2 months ago

To everyone saying they’re blocked by this bug: how are you installing your Ruby? The workaround in https://bugs.ruby-lang.org/issues/20085#note-5 should be pretty straightforward and get you unblocked.

Of course, we need to fix the configure script so that it works by default, but this isn't that difficult to fix in your own builds for now; it shouldn't be holding up anybody's Ruby 3.3 upgrade.

Updated by hsbt (Hiroshi SHIBATA) 2 months ago

  • Backport changed from 3.0: DONTNEED, 3.1: DONTNEED, 3.2: DONTNEED, 3.3: REQUIRED to 3.0: DONTNEED, 3.1: DONTNEED, 3.2: DONTNEED, 3.3: DONE

Updated by navels (Lee Nave) 2 months ago

Replying to kjtsanaktsidis, I tried building our docker image using ruby-install as suggested here:

https://github.com/ruby/ruby/pull/9371#issuecomment-1893851123

That caused problems with permissions, requiring sudo to install gems. So I tried a local (instead of system) install with ruby-install, and that created another set of problems (again, mostly around permissions). For us, it just wasn't worth the hassle of continuing to put band-aids on our Docker setup. I'd rather just wait for an updated 3.3.0 (yes, I see that the backport was merged; what I want are official binaries), and at this point waiting for 3.3.1 would be fine, too.

Updated by hsbt (Hiroshi SHIBATA) 2 months ago

  • Related to Bug #20264: Segfault in Enumerator#next while installing RMagick on M1 Mac added

Updated by sorah (Sorah Fukumori) 2 months ago

  • Related to deleted (Bug #20264: Segfault in Enumerator#next while installing RMagick on M1 Mac)

Updated by sorah (Sorah Fukumori) 2 months ago

  • Has duplicate Bug #20264: Segfault in Enumerator#next while installing RMagick on M1 Mac added

Updated by londonappdev (Mark Winterbottom) 2 months ago

hsbt (Hiroshi SHIBATA) wrote in #note-20:

https://github.com/ruby/ruby/pull/9371 is already backported into ruby_3_3.

https://github.com/ruby/ruby/commit/7f97e3540ce448b501bcbee15afac5f94bb22dd9

I'm still blocked on this because our setup uses the ruby:3.3.0 image, which still appears to have the issue.

Updated by alanwu (Alan Wu) about 2 months ago

  • Has duplicate Bug #20268: Segfault in ruby 3.3 Fiber on aarch64 musl (mac m1) added

Updated by osyoyu (Daisuke Aritomo) about 2 months ago

I'm heavily affected by this bug. Setup is Docker Desktop on Mac using the ruby:3.3 image.

I've submitted a pull request to the ruby image to apply the backport fix to 3.3.0: https://github.com/docker-library/ruby/pull/439. Still, I hope 3.3.1 will be released soon.

Updated by navels (Lee Nave) about 2 months ago · Edited

^ that docker fix unfortunately does not work for the Debian Bookworm image.

Is there anyone reading this that can turn whatever crank will get the 3.3.0 backport to materialize into a binary release? Based on https://github.com/ruby/dev-meeting-log/blob/master/2024/DevMeeting-2024-02-14.md#about-release-timeframe I gather it will be another month before 3.3.1 is released (tbh I don't know if I am reading the tea leaves correctly).

Updated by tianon (Tianon Gravi) about 2 months ago

We applied the patch from https://github.com/ruby/ruby/commit/7f97e3540ce448b501bcbee15afac5f94bb22dd9.patch?full_index=1 directly in our Dockerfile for the ruby:3.3-* images we maintain in https://github.com/docker-library/ruby/pull/439, and it appears to have worked on our Debian Bullseye based images, but does not appear to be fixing the bug on Debian Bookworm. Is it possible that the patch missed some edge case?

(The point being that it doesn't seem like the backport as-is is quite enough by itself to fix the issue, so just doing a release won't magically "fix" it in our image as we're applying the officially backported patch directly in our builds now anyhow.)

Updated by kjtsanaktsidis (KJ Tsanaktsidis) about 2 months ago

That patch just modifies the configure.ac file, which contains the autoconf macros that generate the ./configure script. If you patch this file, you need to regenerate the configure script by running ./autogen.sh. (Apologies if you're already doing this somewhere, but I couldn't find it in your PR.)

Note you will need autoconf installed to be able to run ./autogen.sh as well.
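As a guard against applying a configure.ac patch without regenerating configure, a build script can compare the two files' modification times before running ./configure. This is a minimal sketch: configure_stale? is a hypothetical helper name, and it assumes the patching step leaves configure.ac with a fresh modification time (as tools like patch normally do). A timestamp check is only a heuristic, not proof that configure matches configure.ac.

```ruby
# Return true when ./configure is missing or older than configure.ac,
# i.e. when ./autogen.sh (which needs autoconf) should be run again.
# Raises Errno::ENOENT if configure.ac itself is missing.
def configure_stale?(src_dir)
  configure_ac = File.join(src_dir, "configure.ac")
  configure    = File.join(src_dir, "configure")
  return true unless File.exist?(configure) # never generated
  File.mtime(configure_ac) > File.mtime(configure)
end

# Hypothetical usage in a build script, after patching and before ./configure:
#   src = "ruby-3.3.0"
#   system("./autogen.sh", chdir: src, exception: true) if configure_stale?(src)
```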

Updated by tianon (Tianon Gravi) about 2 months ago

kjtsanaktsidis (KJ Tsanaktsidis) wrote in #note-30:

That patch just modifies the configure.ac file, which contains the autoconf macros that generate the ./configure script. If you patch this file, you need to regenerate the configure script by running ./autogen.sh. (Apologies if you're already doing this somewhere, but I couldn't find it in your PR.)

Note you will need autoconf installed to be able to run ./autogen.sh as well.

Ahhhh, thank you!! We do run autoconf, but before the block where we apply the patch. 🤦
