Bug #18286
Universal arm64/x86_64 binary built on an x86_64 machine segfaults/is killed on arm64
Description
A universal arm64/x86_64 ruby binary for macOS built on an x86_64 machine segfaults/is killed when executed on an arm64 machine.
To reproduce:
- On an Intel Mac:
git clone https://github.com/ruby/ruby && cd ruby && git checkout v3_0_2 && ./autogen.sh && ./configure --with-arch=arm64,x86_64 && make -j$(sysctl -n hw.ncpu)
- Copy the built ./ruby binary to an Apple Silicon machine
- Attempt to execute it
Expected:
The universal ruby binary works correctly on both devices
Actual:
The universal ruby binary crashes with either Segmentation fault: 11 or Killed: 9 (this seems to occur if arm64e is used instead of arm64).
Details:
I'm attempting to build a universal Ruby for macOS that will run on both Intel (x86_64) and Apple Silicon (arm64) machines.
It seemed initially that adding --with-arch=arm64,x86_64 to ./configure would do it, as it produced a ruby binary that reports as Mach-O universal binary with 2 architectures: [x86_64:Mach-O 64-bit executable x86_64] [arm64]
This ruby works correctly on the Intel machine I built it on, but does not work when copied to an Apple Silicon device. The reverse, however, seems to work. That is, if I build the universal ruby on an Apple Silicon machine, the resulting ruby binary seems to work correctly on both Intel and Apple Silicon machines.
Intel:
$ ./ruby -v
ruby 3.0.2p107 (2021-07-07 revision 0db68f0233) [universal.x86_64-darwin21]
Apple Silicon:
$ ./ruby -v
Segmentation fault: 11
$ lldb ./ruby
(lldb) target create "./ruby"
Current executable set to '/Users/crc/ruby' (arm64).
(lldb) run
Process 77071 launched: '/Users/crc/ruby' (arm64)
Process 77071 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x8)
frame #0: 0x00000001002176b8 ruby`ruby_vm_special_exception_copy + 16
ruby`ruby_vm_special_exception_copy:
-> 0x1002176b8 <+16>: ldr x0, [x0, #0x8]
0x1002176bc <+20>: bl 0x10011fed8 ; rb_class_real
0x1002176c0 <+24>: bl 0x10012070c ; rb_obj_alloc
0x1002176c4 <+28>: mov x20, x0
Target 0: (ruby) stopped.
(lldb) ^D
I also attempted the same thing with ruby 2.7.4 source, with the same result.
Updated by nobu (Nobuyoshi Nakada) about 1 year ago
Could you try with the master, and show more backtraces?
Updated by ccaviness (Clay Caviness) about 1 year ago
nobu (Nobuyoshi Nakada) wrote in #note-1:
Could you try with the master, and show more backtraces?
Sure. Similar error, though this time running the universal ruby on Apple Silicon just results in a Killed: 9 message. I'm unable to run this binary under lldb; however, I'm not familiar with debuggers, so if there's a different method you'd like me to try I'd be happy to. I did get a backtrace for the segfault on the v3_0_2 build.
ruby built on an Intel machine, from master, running on my Apple Silicon device:
$ file ruby
ruby: Mach-O universal binary with 2 architectures: [x86_64:Mach-O 64-bit executable x86_64] [arm64]
ruby (for architecture x86_64): Mach-O 64-bit executable x86_64
ruby (for architecture arm64): Mach-O 64-bit executable arm64
$ ./ruby -v
Killed: 9
$ lldb ./ruby
(lldb) target create "./ruby"
Killed: 9
ruby built on an Intel machine, from v3_0_2, running on my Apple Silicon device:
$ lldb ruby
(lldb) target create "ruby"
Current executable set to '/Users/crc/ruby' (arm64).
(lldb) run
Process 38054 launched: '/Users/crc/ruby' (arm64)
Process 38054 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x8)
frame #0: 0x00000001002176b8 ruby`ruby_vm_special_exception_copy + 16
ruby`ruby_vm_special_exception_copy:
-> 0x1002176b8 <+16>: ldr x0, [x0, #0x8]
0x1002176bc <+20>: bl 0x10011fed8 ; rb_class_real
0x1002176c0 <+24>: bl 0x10012070c ; rb_obj_alloc
0x1002176c4 <+28>: mov x20, x0
Target 0: (ruby) stopped.
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x8)
* frame #0: 0x00000001002176b8 ruby`ruby_vm_special_exception_copy + 16
frame #1: 0x0000000100217788 ruby`ec_stack_overflow + 56
frame #2: 0x0000000100217708 ruby`rb_ec_stack_overflow + 40
frame #3: 0x000000010023da90 ruby`rb_call0 + 1828
frame #4: 0x00000001001213bc ruby`rb_class_new_instance + 88
frame #5: 0x000000010008a6d8 ruby`rb_exc_new_str + 64
frame #6: 0x000000010022fee4 ruby`rb_vm_register_special_exception_str + 52
frame #7: 0x00000001000966cc ruby`Init_eval + 768
frame #8: 0x00000001000c4d34 ruby`rb_call_inits + 72
frame #9: 0x0000000100093e58 ruby`ruby_setup + 316
frame #10: 0x0000000100093ee0 ruby`ruby_init + 12
frame #11: 0x0000000100001be4 ruby`main + 76
frame #12: 0x00000001003fd0f4 dyld`start + 520
(lldb)
Updated by timsutton (Tim Sutton) about 1 year ago
I have been hoping to do the same operation here for my org, as a way to distribute a universal Ruby binary that would be usable on both Intel and Apple Silicon machines, and to be able to build it on Intel. I seem to run into the same problem when building on Intel.
Updated by ecnelises (Chaofan Qiu) about 1 year ago
Could you please try codesign -s - ruby? Apple's ARM chips require executables to be signed.
I encountered the same Killed: 9 error elsewhere, FYI: https://lists.gnu.org/archive/html/bug-gnu-emacs/2020-11/msg01480.html
Updated by timsutton (Tim Sutton) about 1 year ago
Sure. I had suspected that at some point, so I checked the signature using codesign -dvvvvv. I also just repeated that test, and then re-signed the built binary with a new ad-hoc signature on the M1. Unfortunately, that did not seem to help:
# intel-built universal binary copied over
tsutton@tim-m1 ~ % cp /Volumes/ssd/ruby_274 .
tsutton@tim-m1 ~ % codesign -d -vvvvv ruby_274
Executable=/Users/tsutton/ruby_274
Identifier=-5ac6e2.out
Format=Mach-O universal (x86_64 arm64)
CodeDirectory v=20400 size=30020 flags=0x20002(adhoc,linker-signed) hashes=935+0 location=embedded
VersionPlatform=1
VersionMin=720896
VersionSDK=721664
Hash type=sha256 size=32
CandidateCDHash sha256=63eda95634ac1d1ea6c97467085ec887b45f1dde
CandidateCDHashFull sha256=63eda95634ac1d1ea6c97467085ec887b45f1dde4659262d661eccca13ba17ca
Hash choices=sha256
CMSDigest=63eda95634ac1d1ea6c97467085ec887b45f1dde4659262d661eccca13ba17ca
CMSDigestType=2
Executable Segment base=0
Executable Segment limit=2834432
Executable Segment flags=0x1
Page size=4096
CDHash=63eda95634ac1d1ea6c97467085ec887b45f1dde
Signature=adhoc
Info.plist=not bound
TeamIdentifier=not set
Sealed Resources=none
Internal requirements=none
tsutton@tim-m1 ~ % ./ruby_274
zsh: segmentation fault ./ruby_274
tsutton@tim-m1 ~ % cp ruby_274 ruby_274_copy
tsutton@tim-m1 ~ % codesign -s - ruby_274_copy
tsutton@tim-m1 ~ % ./ruby_274_copy
zsh: segmentation fault ./ruby_274_copy
# using -f to force signature replacement
tsutton@tim-m1 ~ % codesign -fs - ruby_274_copy
ruby_274_copy: replacing existing signature
tsutton@tim-m1 ~ % ./ruby_274_copy
zsh: segmentation fault ./ruby_274_copy
Updated by ccaviness (Clay Caviness) about 1 year ago
Lack of codesigning on Apple Silicon is an excellent guess, but unfortunately does not seem to be the cause here, as Tim has demonstrated above (and I've verified as well). I first noticed this issue when testing a ruby that was fully signed with a public developer cert.
Updated by ccaviness (Clay Caviness) 8 months ago
I don't believe any of those bugs are related.
My suspicion is that, when building on x86 and targeting universal, the small test binaries that configure builds for arm64 during cross-compilation cannot be executed on x86, which leaves the various hints about the host machine wildly incorrect.
When building on arm64 and targeting universal, these test binaries that are built for x86 can actually run on the arm64 machine successfully, due to the Rosetta x86 compatibility layer.
There is no mechanism to run arm64 binaries on x86 Macs, though, so I think to get cross-compilation working on x86 many of the various autoconf hints will need to be manually set.
I'm not that familiar with autoconf or what these values should be, though.
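The asymmetry suspected above can be summarized in a small shell sketch. It is illustrative only: can_run_test_binary is a hypothetical helper, not part of Ruby's build system or of autoconf, and it assumes Rosetta 2 is installed on the Apple Silicon build machine. It simply encodes which build host can execute a configure-style test program compiled for each target architecture.

```shell
#!/bin/sh
# Hypothetical helper for illustration; not part of Ruby's configure or autoconf.
# Usage: can_run_test_binary BUILD_ARCH TARGET_ARCH
# Prints "yes" if a test program compiled for TARGET_ARCH can be executed on a
# Mac whose native architecture is BUILD_ARCH, otherwise "no".
can_run_test_binary() {
  build_arch=$1
  target_arch=$2
  if [ "$build_arch" = "$target_arch" ]; then
    echo yes   # native execution
  elif [ "$build_arch" = arm64 ] && [ "$target_arch" = x86_64 ]; then
    echo yes   # assumption: Rosetta 2 is installed and translates x86_64 code
  else
    echo no    # an Intel Mac has no mechanism to execute arm64 code
  fi
}

# Print the full matrix of build host vs. test binary architecture.
for build_arch in x86_64 arm64; do
  for target_arch in x86_64 arm64; do
    printf '%s build host, %s test binary: %s\n' \
      "$build_arch" "$target_arch" \
      "$(can_run_test_binary "$build_arch" "$target_arch")"
  done
done
```

The only "no" in the matrix is the x86_64 build host with an arm64 test binary, which is the configuration in this report. If the suspicion is right, any configure check that needs to run its arm64 test program would have to fall back on a guess or on a manually supplied value in that case.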