Bug #20085
Closed
Fiber.new{ }.resume causes Segmentation fault for Ruby 3.3.0 on aarch64-linux
Description
ruby -e "Fiber.new{}.resume"
0.170 -e:1: [BUG] Segmentation fault at 0x0036ffffb4f110f0
0.170 ruby 3.3.0 (2023-12-25 revision 5124f9ac75) [aarch64-linux]
0.170
0.170 -- Control frame information -----------------------------------------------
0.170 c:0003 p:---- s:0010 e:000009 CFUNC :resume
0.170 c:0002 p:0007 s:0006 E:0005e0 EVAL -e:1 [FINISH]
0.170 c:0001 p:0000 s:0003 E:000b50 DUMMY [FINISH]
0.170
0.170 -- Ruby level backtrace information ----------------------------------------
0.170 -e:1:in `<main>'
0.170 -e:1:in `resume'
0.170
0.170 -- Threading information ---------------------------------------------------
0.170 Total ractor count: 1
0.170 Ruby thread count for this ractor: 1
0.170
0.170 -- Machine register context ------------------------------------------------
0.170 x0: 0x0000aaab11b4f570 x1: 0x0000aaab11de4a40 x2: 0x0000ffffe5571990
0.170 x3: 0x0000ffff9ac5ff60 x4: 0x0000ffff9ac60018 x5: 0x0000ffff9ac80000
0.170 x6: 0x0000ffffb5458b88 x7: 0x0000000000000000 x18: 0x00000000007fffff
0.170 x19: 0x0000000000000000 x20: 0x0000000000000000 x21: 0x0000000000000000
0.170 x22: 0x0000000000000000 x23: 0x0000000000000000 x24: 0x0000000000000000
0.170 x25: 0x0000000000000000 x26: 0x0000000000000000 x27: 0x0000000000000000
0.170 x28: 0x0000000000000000 x29: 0x0000000000000000 sp: 0x0000ffff9ac60000
0.170 fau: 0x0036ffffb4f110f0
0.170
0.170 -- C level backtrace information -------------------------------------------
0.171 Segmentation fault
- Ruby 3.3.0-rc1 and 3.2.2 work without Segmentation fault on both amd64 and arm64.
- Ruby 3.3.0 works on amd64, but fails with Segmentation fault on arm64 (aarch64-linux) on Ubuntu 22.04 and Debian Bookworm.
Dockerfile to reproduce:
FROM debian:bookworm
# Set UTF-8 locale by default.
ENV RBENV_ROOT=/usr/local/rbenv \
PATH=/usr/local/rbenv/bin:/usr/local/rbenv/shims:$PATH
RUN ( \
export DEBIAN_FRONTEND=noninteractive \
&& apt-get update \
&& apt-get install -y --no-install-recommends \
ca-certificates curl git \
# standard dependencies for rbenv
autoconf \
patch \
build-essential \
rustc \
libssl-dev \
libyaml-dev \
libreadline6-dev \
zlib1g-dev \
libgmp-dev \
libncurses5-dev \
libffi-dev \
libgdbm6 \
libgdbm-dev \
libdb-dev \
uuid-dev \
)
ENV RBENV_VERSION=3.3.0
RUN ( \
export DEBIAN_FRONTEND=noninteractive \
# Install rbenv & ruby-build
&& git clone https://github.com/rbenv/rbenv.git /usr/local/rbenv \
&& git clone https://github.com/rbenv/ruby-build.git /usr/local/rbenv/plugins/ruby-build \
&& /usr/local/rbenv/plugins/ruby-build/install.sh \
&& echo 'export RBENV_ROOT=/usr/local/rbenv' >> /etc/profile.d/rbenv.sh \
&& echo 'export PATH=/usr/local/rbenv/bin:$PATH' >> /etc/profile.d/rbenv.sh \
&& echo 'eval "$(rbenv init -)"' >> /etc/profile.d/rbenv.sh \
&& echo 'export RBENV_ROOT=/usr/local/rbenv' >> /root/.bashrc \
&& echo 'export PATH=/usr/local/rbenv/bin:$PATH' >> /root/.bashrc \
&& echo 'eval "$(rbenv init -)"' >> /root/.bashrc \
&& eval "$(rbenv init -)"; rbenv install $RBENV_VERSION \
&& eval "$(rbenv init -)"; rbenv global $RBENV_VERSION \
)
RUN ruby -e "Fiber.new{}.resume"
docker build --platform linux/arm64 .
Updated by mame (Yusuke Endoh) 9 months ago
Thanks. I tried the Dockerfile you provided, but I couldn't reproduce the segfault on aarch64.
Can you reproduce the issue with gdb and provide the backtrace?
Add the following lines to the Dockerfile:
RUN apt-get install -y gdb
CMD gdb --args ruby -e "Fiber.new{}.resume"
Then docker run the image and, if gdb catches a segfault, type the backtrace command at the gdb prompt.
Updated by katei (Yuta Saito) 9 months ago
- Assignee set to katei (Yuta Saito)
I'm sure https://github.com/ruby/ruby/pull/9306 is the culprit but not sure when the segfault happens.
Could you share your /proc/cpuinfo to see if your machine supports Pointer Authentication or Branch Target Identification?
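For readers who want to answer this question on their own machine, here is a minimal sketch that pulls the relevant flags out of the Features line. The helper name branch_protection_features is made up for illustration. The flag names paca, pacg and bti are the ones the Linux kernel uses for Pointer Authentication and Branch Target Identification. Only the first Features line is inspected, which assumes all cores report the same features.

```ruby
# Hypothetical helper (not part of Ruby or of this thread): return the
# pointer-authentication / BTI flags found in /proc/cpuinfo-style text.
def branch_protection_features(cpuinfo_text)
  flags_of_interest = %w[paca pacg bti]
  # Assumption: every core reports the same features, so the first
  # "Features" line is representative.
  line = cpuinfo_text.each_line.find { |l| l.start_with?("Features") }
  return [] unless line

  features = line.split(":", 2).last.split
  flags_of_interest & features
end

# Only touch the real file where it exists (Linux).
if File.readable?("/proc/cpuinfo")
  puts branch_protection_features(File.read("/proc/cpuinfo")).inspect
end
```

An empty result means none of the three flags were advertised by the kernel. A non-empty result only says the CPU supports the feature, not that a given Ruby build is broken.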
Updated by tomog105 (Tomohiro Ogoke) 9 months ago
In my environment (Apple M1 MacBook on macOS 13.6.3), I could reproduce the segfault when I used the Apple Virtualization framework (VZ) machine type in a container execution environment (Rancher Desktop), but I couldn't reproduce it when I used the QEMU emulator machine type.
Here is /proc/cpuinfo from both machines.
VZ machine type (could reproduce the segfault)
$ cat /proc/cpuinfo
processor : 0
BogoMIPS : 48.00
Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 asimddp sha512 asimdfhm dit uscat ilrcpc flagm ssbs sb paca pacg dcpodp flagm2 frint
CPU implementer : 0x00
CPU architecture: 8
CPU variant : 0x0
CPU part : 0x000
CPU revision : 0
processor : 1
BogoMIPS : 48.00
Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 asimddp sha512 asimdfhm dit uscat ilrcpc flagm ssbs sb paca pacg dcpodp flagm2 frint
CPU implementer : 0x00
CPU architecture: 8
CPU variant : 0x0
CPU part : 0x000
CPU revision : 0
QEMU machine type (couldn't reproduce the segfault)
The ssbs feature is missing.
$ cat /proc/cpuinfo
processor : 0
BogoMIPS : 48.00
Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 asimddp sha512 asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp flagm2 frint
CPU implementer : 0x00
CPU architecture: 8
CPU variant : 0x0
CPU part : 0x000
CPU revision : 0
processor : 1
BogoMIPS : 48.00
Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 asimddp sha512 asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp flagm2 frint
CPU implementer : 0x00
CPU architecture: 8
CPU variant : 0x0
CPU part : 0x000
CPU revision : 0
Updated by tomog105 (Tomohiro Ogoke) 9 months ago
Additionally, on another machine of mine (Apple M2 Pro MacBook), I could reproduce the segfault with both the QEMU and VZ machine types in Rancher Desktop.
Therefore, the difference in CPU features may not be the trigger for the segfault.
Here are the gdb logs from when the segfault was raised.
GNU gdb (Debian 13.1-3) 13.1
Copyright (C) 2023 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "aarch64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/local/rbenv/versions/3.3.0/bin/ruby...
(gdb) run
Starting program: /usr/local/rbenv/versions/3.3.0/bin/ruby -e Fiber.new\{\}.resume
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/aarch64-linux-gnu/libthread_db.so.1".
[New Thread 0xffffddf5f100 (LWP 21)]
Thread 1 "ruby" received signal SIGSEGV, Segmentation fault.
0x004ffffff79e06f8 in ?? ()
(gdb) backtrace
#0 0x004ffffff79e06f8 in ?? ()
Backtrace stopped: not enough registers or memory available to unwind further
Updated by katei (Yuta Saito) 9 months ago
Okay, I could reproduce the issue on my end. Thank you for your detailed info!
The root issue is that CFLAGS uses -mbranch-protection=pac-ret, as guessed by configure, but ASFLAGS doesn't.
So here is a workaround to address the issue without patching the ruby source.
./configure ASFLAGS=-mbranch-protection=pac-ret
I'll also fix it in configure.ac by 3.3.1.
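To see whether an already-installed Ruby was built with this mismatch, one option is to compare the flags recorded in RbConfig::CONFIG. The sketch below is an illustration only. The helper name missing_asm_branch_protection is hypothetical, and which variables (CFLAGS, XCFLAGS or cflags) hold the option in a given build is an assumption, so an empty result does not prove that a build is unaffected.

```ruby
require "rbconfig"

# Hypothetical helper: given a Hash shaped like RbConfig::CONFIG, return the
# -mbranch-protection options that appear in the C compiler flags but not in
# ASFLAGS. Assumption: the option is recorded under one of the keys below.
def missing_asm_branch_protection(config)
  pattern = /-mbranch-protection=\S+/
  c_side = %w[CFLAGS XCFLAGS cflags].flat_map do |key|
    config.fetch(key, "").to_s.scan(pattern)
  end.uniq
  asm_side = config.fetch("ASFLAGS", "").to_s.scan(pattern)
  c_side - asm_side
end

missing = missing_asm_branch_protection(RbConfig::CONFIG)
if missing.empty?
  puts "no branch-protection mismatch detected"
else
  puts "ASFLAGS lacks: #{missing.join(' ')}"
end
```

Taking the Hash as a parameter keeps the check testable without needing an affected machine.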
Updated by oleksii (Oleksii Leonov) 9 months ago
@tomog105 (Tomohiro Ogoke), thank you a lot!
I want to confirm that the segfault happens inside Docker Desktop's VM.
On my machine (MacBook Pro M1 Max, macOS 14.2.1, Docker Desktop 4.26.1), it happens with the "Use Virtualization framework" option both turned on and turned off.
With the "Use Virtualization framework" option enabled (Apple VZ is used instead of QEMU, so the ssbs extension is present):
$ uname -a
Linux b8c3da04fc58 6.5.11-linuxkit #1 SMP PREEMPT Wed Dec 6 17:08:31 UTC 2023 aarch64 GNU/Linux
$ cat /proc/cpuinfo
processor : 0
BogoMIPS : 48.00
Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 asimddp sha512 asimdfhm dit uscat ilrcpc flagm ssbs sb paca pacg dcpodp flagm2 frint
CPU implementer : 0x61
CPU architecture: 8
CPU variant : 0x0
CPU part : 0x000
CPU revision : 0
# ...
$ gdb --args /usr/local/rbenv/versions/3.3.0/bin/ruby -e "Fiber.new{}.resume"
GNU gdb (Debian 13.1-3) 13.1
Copyright (C) 2023 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "aarch64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/local/rbenv/versions/3.3.0/bin/ruby...
(gdb) run
Starting program: /usr/local/rbenv/versions/3.3.0/bin/ruby -e Fiber.new\{\}.resume
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/aarch64-linux-gnu/libthread_db.so.1".
[New Thread 0xffffde03f100 (LWP 8987)]
Thread 1 "ruby" received signal SIGSEGV, Segmentation fault.
0x002cfffff7ac10f0 in ?? ()
(gdb) backtrace
#0 0x002cfffff7ac10f0 in ?? ()
Backtrace stopped: not enough registers or memory available to unwind further
With the "Use Virtualization framework" option disabled (QEMU is used instead of Apple VZ):
$ uname -a
Linux 55913114f674 6.5.11-linuxkit #1 SMP PREEMPT Wed Dec 6 17:08:31 UTC 2023 aarch64 GNU/Linux
$ cat /proc/cpuinfo
processor : 0
BogoMIPS : 48.00
Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 asimddp sha512 asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp flagm2 frint
CPU implementer : 0x61
CPU architecture: 8
CPU variant : 0x0
CPU part : 0x000
CPU revision : 0
# ...
$ gdb --args /usr/local/rbenv/versions/3.3.0/bin/ruby -e "Fiber.new{}.resume"
GNU gdb (Debian 13.1-3) 13.1
Copyright (C) 2023 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "aarch64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/local/rbenv/versions/3.3.0/bin/ruby...
(gdb) run
Starting program: /usr/local/rbenv/versions/3.3.0/bin/ruby -e Fiber.new\{\}.resume
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/aarch64-linux-gnu/libthread_db.so.1".
[New Thread 0xffffde03f100 (LWP 3157)]
Thread 1 "ruby" received signal SIGSEGV, Segmentation fault.
0x000bfffff7ac10f0 in ?? ()
(gdb) backtrace
#0 0x000bfffff7ac10f0 in ?? ()
Backtrace stopped: not enough registers or memory available to unwind further
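A side observation on the fault addresses in the logs above: they all have non-zero bits above bit 47, while the low 48 bits look like ordinary user-space addresses. That would be consistent with a return address that still carries a pointer-authentication signature, although the thread does not confirm this. The sketch below splits each address under the assumption of 48-bit virtual addresses. The helper name split_signed_pointer is hypothetical.

```ruby
# Assumption: 48-bit user-space virtual addresses, so bits 48..63 of a valid
# pointer are zero and anything found there is treated as leftover bits.
def split_signed_pointer(addr, va_bits = 48)
  mask = (1 << va_bits) - 1
  { upper_bits: addr >> va_bits, address: addr & mask }
end

# Fault addresses copied from the crash report and gdb sessions in this thread.
[
  0x0036ffffb4f110f0,
  0x004ffffff79e06f8,
  0x002cfffff7ac10f0,
  0x000bfffff7ac10f0
].each do |addr|
  parts = split_signed_pointer(addr)
  puts format("%#018x -> upper bits %#x, address %#x",
              addr, parts[:upper_bits], parts[:address])
end
```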
Updated by ioquatix (Samuel Williams) 9 months ago
I think we should backport this fix urgently.
Updated by katei (Yuta Saito) 9 months ago
I think https://github.com/ruby/ruby/pull/9371 will fix this issue
Updated by tomog105 (Tomohiro Ogoke) 9 months ago
This issue is fixed by https://github.com/ruby/ruby/pull/9371 in my environment. Thank you @katei (Yuta Saito) san!
Updated by ioquatix (Samuel Williams) 9 months ago
- Backport changed from 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: UNKNOWN to 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: REQUIRED
I strongly advise this is backported urgently.
Updated by katei (Yuta Saito) 8 months ago
Opened a backport PR https://github.com/ruby/ruby/pull/9385
@naruse (Yui NARUSE) Can we backport the patch to 3.3 branch?
- Short description: Fiber on aarch64 with PAC support on Linux was broken in 3.3.0. The patch fixes the issue by adjusting compiler options.
- Scope: Only aarch64 with PAC support on Linux (like an aarch64 Linux container on an M1 Mac)
- Testing: Unfortunately, we don't have a CI node that is affected by the issue. The community and I tested the patch manually in such environments.
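The scope described above can be turned into a rough runtime check, for example to print a warning in a setup script. This is a sketch based only on what the thread states (Ruby 3.3.0, aarch64 Linux, including the musl duplicate report). The predicate name possibly_affected? is hypothetical, and the check cannot tell whether the CPU has PAC or whether the build used the ASFLAGS workaround.

```ruby
# Hypothetical predicate based only on the scope described in this thread.
# It does not detect PAC support or a build made with the ASFLAGS workaround.
def possibly_affected?(version = RUBY_VERSION, platform = RUBY_PLATFORM)
  version == "3.3.0" && platform.match?(/\Aaarch64-linux/)
end

puts "this Ruby may be affected by Bug #20085" if possibly_affected?
```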
Updated by naruse (Yui NARUSE) 8 months ago
- Backport changed from 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: REQUIRED to 3.0: DONTNEED, 3.1: DONTNEED, 3.2: DONTNEED, 3.3: REQUIRED
@katei (Yuta Saito) Since Backport field is correctly specified, I'll merge it for 3.3.1 after the fix is merged into master.
Updated by katei (Yuta Saito) 8 months ago
@naruse (Yui NARUSE) For the record, the fix is already merged into the master by https://github.com/ruby/ruby/pull/9371
Updated by jeremyevans0 (Jeremy Evans) 8 months ago
- Status changed from Open to Closed
Updated by navels (Lee Nave) 8 months ago
Any guess as to when the backport to 3.3.0 will land? Our 3.3.0 upgrade is on hold, wondering if we should wait or employ a workaround.
Updated by londonappdev (Mark Winterbottom) 7 months ago
Noticed this is closed but the issue still persists and the backport isn't merged in yet. Do we know when it will be available?
Updated by navels (Lee Nave) 7 months ago
Getting closer...backport merged to 3.3.0. Anyone familiar with the release process know when the new binaries will land?
Updated by dorianmariefr (Dorian Marié) 7 months ago
I can't deploy apps with Kamal (e.g. Docker) because of this crash
Updated by kjtsanaktsidis (KJ Tsanaktsidis) 7 months ago
To everyone saying they’re blocked by this bug: how are you installing your Ruby? The workaround in https://bugs.ruby-lang.org/issues/20085#note-5 should be pretty straightforward and get you unblocked.
Of course we need to fix the configure script so that it works by default but this isn’t that difficult to fix in your own builds for now, it shouldn’t be holding up anybody’s Ruby 3.3 upgrade.
Updated by hsbt (Hiroshi SHIBATA) 7 months ago
- Backport changed from 3.0: DONTNEED, 3.1: DONTNEED, 3.2: DONTNEED, 3.3: REQUIRED to 3.0: DONTNEED, 3.1: DONTNEED, 3.2: DONTNEED, 3.3: DONE
https://github.com/ruby/ruby/pull/9371 is already backported into ruby_3_3.
https://github.com/ruby/ruby/commit/7f97e3540ce448b501bcbee15afac5f94bb22dd9
Updated by navels (Lee Nave) 7 months ago
Replying to kjtsanaktsidis, I tried building our docker image using ruby-install as suggested here:
https://github.com/ruby/ruby/pull/9371#issuecomment-1893851123
and that caused problems with permissions, requiring sudo for installing gems. So I tried doing a local (instead of system) install with ruby-install, and that created another set of problems (again, mostly around permissions). For us it just wasn't worth the hassle of continuing to put band-aids on our docker setup. I'd rather just wait for 3.3.0 to update (yes, I see that the backport was merged; what I want are official binaries), and at this point waiting for 3.3.1 would be fine, too.
Updated by hsbt (Hiroshi SHIBATA) 7 months ago
- Related to Bug #20264: Segfault in Enumerator#next while installing RMagick on M1 Mac added
Updated by sorah (Sorah Fukumori) 7 months ago
- Related to deleted (Bug #20264: Segfault in Enumerator#next while installing RMagick on M1 Mac)
Updated by sorah (Sorah Fukumori) 7 months ago
- Has duplicate Bug #20264: Segfault in Enumerator#next while installing RMagick on M1 Mac added
Updated by londonappdev (Mark Winterbottom) 7 months ago
hsbt (Hiroshi SHIBATA) wrote in #note-20:
https://github.com/ruby/ruby/pull/9371 is already backported into ruby_3_3.
https://github.com/ruby/ruby/commit/7f97e3540ce448b501bcbee15afac5f94bb22dd9
I'm still blocked on this because my setup uses the ruby:3.3.0 image, which still appears to have the issue.
Updated by alanwu (Alan Wu) 7 months ago
- Has duplicate Bug #20268: Segfault in ruby 3.3 Fiber on aarch64 musl (mac m1) added
Updated by osyoyu (Daisuke Aritomo) 7 months ago
I'm heavily affected by this bug. Setup is Docker Desktop on Mac using the ruby:3.3 image.
I've submitted a pull request to the ruby image to apply the backport fix to 3.3.0: https://github.com/docker-library/ruby/pull/439. Still, I'd hope 3.3.1 to be released soon.
Updated by navels (Lee Nave) 7 months ago · Edited
^ that docker fix unfortunately does not work for the Debian Bookworm image.
Is there anyone reading this that can turn whatever crank will get the 3.3.0 backport to materialize into a binary release? Based on https://github.com/ruby/dev-meeting-log/blob/master/2024/DevMeeting-2024-02-14.md#about-release-timeframe I gather it will be another month before 3.3.1 is released (tbh I don't know if I am reading the tea leaves correctly).
Updated by tianon (Tianon Gravi) 6 months ago
We applied the patch from https://github.com/ruby/ruby/commit/7f97e3540ce448b501bcbee15afac5f94bb22dd9.patch?full_index=1 directly in our Dockerfile for the ruby:3.3-* images we maintain in https://github.com/docker-library/ruby/pull/439, and it appears to have worked on our Debian Bullseye based images, but does not appear to be fixing the bug on Debian Bookworm. Is it possible that the patch missed some edge case?
(The point being that it doesn't seem like the backport as-is is quite enough by itself to fix the issue, so just doing a release won't magically "fix" it in our image as we're applying the officially backported patch directly in our builds now anyhow.)
Updated by kjtsanaktsidis (KJ Tsanaktsidis) 6 months ago
That patch just modifies the configure.ac file, which is the autoconf macros which generate the ./configure script. If you patch this file, you need to regenerate the configure script by calling ./autogen.sh. (Apologies if you're already doing this somewhere, but I couldn't find it in your PR).
Note you will need autoconf installed to be able to run ./autogen.sh as well.
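A simple way to catch this ordering mistake in a build script is to compare modification times: if configure.ac is newer than the generated configure script, the script has not been regenerated since the patch was applied. The sketch below is a heuristic under that assumption (it relies on the patch tool updating the file's mtime), and the helper name configure_stale? is hypothetical.

```ruby
# Hypothetical check: is the generated ./configure missing, or older than
# configure.ac? Heuristic only; it assumes patching updates the file's mtime.
def configure_stale?(source_dir)
  script = File.join(source_dir, "configure")
  macros = File.join(source_dir, "configure.ac")
  return true unless File.exist?(script)

  File.mtime(script) < File.mtime(macros)
end

# Usage: ruby check_configure.rb /path/to/ruby/source
dir = ARGV[0] || "."
if File.exist?(File.join(dir, "configure.ac"))
  if configure_stale?(dir)
    puts "run ./autogen.sh before ./configure"
  else
    puts "configure looks up to date"
  end
end
```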
Updated by tianon (Tianon Gravi) 6 months ago
kjtsanaktsidis (KJ Tsanaktsidis) wrote in #note-30:
That patch just modifies the configure.ac file, which is the autoconf macros which generate the ./configure script. If you patch this file, you need to regenerate the configure script by calling ./autogen.sh. (Apologies if you're already doing this somewhere, but I couldn't find it in your PR).
Note you will need autoconf installed to be able to run ./autogen.sh as well.
Ahhhh, thank you!! We do run autoconf, but before the block where we apply the patch. 🤦
Updated by hsbt (Hiroshi SHIBATA) 4 months ago
- Has duplicate Bug #20426: crash on aarch64 linux when using fibers (regression with 3.3) added