Bug #18286

Universal arm64/x86_64 binary built on an x86_64 machine segfaults/is killed on arm64

Added by ccaviness (Clay Caviness) 3 months ago. Updated 3 months ago.

Status: Open
Priority: Normal
Assignee: -
Target version: -
[ruby-core:105920]

Description

A universal arm64/x86_64 ruby binary for macOS built on an x86_64 machine segfaults/is killed when executed on an arm64 machine.

To reproduce:

  • On an Intel Mac: git clone https://github.com/ruby/ruby && cd ruby && git checkout v3_0_2 && ./autogen.sh && ./configure --with-arch=arm64,x86_64 && make -j$(sysctl -n hw.ncpu)
  • Copy the built ./ruby binary to an Apple Silicon machine
  • Attempt to execute it

Expected:
The universal ruby binary works correctly on both devices

Actual:
The universal ruby binary crashes with either Segmentation fault: 11 or Killed: 9 (the latter seems to occur if arm64e is used instead of arm64).
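
To tell the two failure modes apart in a script, the exit status can be inspected, since shells conventionally report 128 + N when a child is terminated by signal N. The following is only a sketch; the function name run_and_classify is hypothetical and not part of Ruby or macOS.

```shell
# Run a command and report whether it was terminated by a signal.
# Assumption: the shell reports 128 + N for a child killed by signal N.
# A program that deliberately exits with a status above 128 would be misclassified.
run_and_classify() {
  status=0
  "$@" >/dev/null 2>&1 || status=$?
  if [ "$status" -gt 128 ]; then
    sig=$((status - 128))
    case "$sig" in
      9)  echo "killed by SIGKILL (9)" ;;
      11) echo "killed by SIGSEGV (11)" ;;
      *)  echo "killed by signal $sig" ;;
    esac
  else
    echo "exited with status $status"
  fi
}
```

For example, run_and_classify ./ruby -v on the Apple Silicon machine would print which of the two signals ended the process.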

Details:
I'm attempting to build a universal Ruby for macOS that will run on both Intel (x86_64) and Apple Silicon (arm64) machines.

It initially seemed that adding --with-arch=arm64,x86_64 to ./configure would be enough, as it produced a ruby binary that is reported as a Mach-O universal binary with 2 architectures: [x86_64:Mach-O 64-bit executable x86_64] [arm64]
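
A scripted version of that check might look like the sketch below. It parses the per-architecture lines that file prints for a universal binary (the "(for architecture NAME):" format shown elsewhere in this report); the function names archs_from_file_output and has_archs are hypothetical helpers.

```shell
# Print the architectures named in `file` output read from stdin, one per line.
# Assumption: per-slice lines look like "ruby (for architecture arm64): ...".
archs_from_file_output() {
  sed -n 's/.*(for architecture \([^)]*\)):.*/\1/p'
}

# Succeed only if every architecture passed as an argument appears in the
# `file` output read from stdin.
has_archs() {
  found=$(archs_from_file_output)
  for want in "$@"; do
    printf '%s\n' "$found" | grep -qxF "$want" || return 1
  done
}
```

Usage would be along the lines of: file ruby | has_archs arm64 x86_64 && echo "both slices present". Note that this only confirms the slices exist; as this report shows, it says nothing about whether the arm64 slice actually runs.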

This ruby works correctly on the Intel machine I built it on, but does not work when copied to an Apple Silicon device. The reverse, however, seems to work. That is, if I build the universal ruby on an Apple Silicon machine, the ruby binary that's built seems to work correctly on both Intel and Apple Silicon machines.

Intel:

$ ./ruby -v
ruby 3.0.2p107 (2021-07-07 revision 0db68f0233) [universal.x86_64-darwin21]

Apple Silicon:

$ ./ruby -v
Segmentation fault: 11
$ lldb ./ruby
(lldb) target create "./ruby"
Current executable set to '/Users/crc/ruby' (arm64).
(lldb) run
Process 77071 launched: '/Users/crc/ruby' (arm64)
Process 77071 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x8)
    frame #0: 0x00000001002176b8 ruby`ruby_vm_special_exception_copy + 16
ruby`ruby_vm_special_exception_copy:
->  0x1002176b8 <+16>: ldr    x0, [x0, #0x8]
    0x1002176bc <+20>: bl     0x10011fed8               ; rb_class_real
    0x1002176c0 <+24>: bl     0x10012070c               ; rb_obj_alloc
    0x1002176c4 <+28>: mov    x20, x0
Target 0: (ruby) stopped.
(lldb) ^D

I also attempted the same thing with ruby 2.7.4 source, with the same result.

Updated by nobu (Nobuyoshi Nakada) 3 months ago

Could you try with the master, and show more backtraces?

Updated by ccaviness (Clay Caviness) 3 months ago

nobu (Nobuyoshi Nakada) wrote in #note-1:

Could you try with the master, and show more backtraces?

Sure. Similar error, though this time running the universal ruby on Apple Silicon just results in a Killed: 9 message. I'm unable to run this binary under lldb; however, I'm not familiar with debuggers so if there's a different method you'd like me to try I'd be happy to. I did get a backtrace for the segfault on the v3_0_2 build.

ruby built on an Intel machine, from master, running on my Apple Silicon device:

$ file ruby 
ruby: Mach-O universal binary with 2 architectures: [x86_64:Mach-O 64-bit executable x86_64] [arm64]
ruby (for architecture x86_64): Mach-O 64-bit executable x86_64
ruby (for architecture arm64):  Mach-O 64-bit executable arm64
$ ./ruby -v
Killed: 9
$ lldb ./ruby
(lldb) target create "./ruby"
Killed: 9

ruby built on an Intel machine, from v3_0_2, running on my Apple Silicon device:

$ lldb ruby
(lldb) target create "ruby"
Current executable set to '/Users/crc/ruby' (arm64).
(lldb) run
Process 38054 launched: '/Users/crc/ruby' (arm64)
Process 38054 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x8)
    frame #0: 0x00000001002176b8 ruby`ruby_vm_special_exception_copy + 16
ruby`ruby_vm_special_exception_copy:
->  0x1002176b8 <+16>: ldr    x0, [x0, #0x8]
    0x1002176bc <+20>: bl     0x10011fed8               ; rb_class_real
    0x1002176c0 <+24>: bl     0x10012070c               ; rb_obj_alloc
    0x1002176c4 <+28>: mov    x20, x0
Target 0: (ruby) stopped.
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x8)
  * frame #0: 0x00000001002176b8 ruby`ruby_vm_special_exception_copy + 16
    frame #1: 0x0000000100217788 ruby`ec_stack_overflow + 56
    frame #2: 0x0000000100217708 ruby`rb_ec_stack_overflow + 40
    frame #3: 0x000000010023da90 ruby`rb_call0 + 1828
    frame #4: 0x00000001001213bc ruby`rb_class_new_instance + 88
    frame #5: 0x000000010008a6d8 ruby`rb_exc_new_str + 64
    frame #6: 0x000000010022fee4 ruby`rb_vm_register_special_exception_str + 52
    frame #7: 0x00000001000966cc ruby`Init_eval + 768
    frame #8: 0x00000001000c4d34 ruby`rb_call_inits + 72
    frame #9: 0x0000000100093e58 ruby`ruby_setup + 316
    frame #10: 0x0000000100093ee0 ruby`ruby_init + 12
    frame #11: 0x0000000100001be4 ruby`main + 76
    frame #12: 0x00000001003fd0f4 dyld`start + 520
(lldb) 

Updated by timsutton (Tim Sutton) 3 months ago

I have been hoping to do the same operation here for my org, as a way to distribute a universal Ruby binary that would be usable on both Intel and Apple Silicon machines, and to be able to build it on Intel. I seem to run into the same problem when building on Intel.

Updated by ecnelises (Chaofan Qiu) 3 months ago

Can you please try codesign -s - ruby? Apple's ARM chips require executables to be signed.

I encountered the same killed 9 error elsewhere, FYI: https://lists.gnu.org/archive/html/bug-gnu-emacs/2020-11/msg01480.html

Updated by timsutton (Tim Sutton) 3 months ago

Sure. I had suspected that at some point, so I checked the signature using codesign -dvvvvv. I also just repeated that test, and then re-signed the built binary with a new ad-hoc signature on the M1. Unfortunately, that did not seem to help:

# intel-built universal binary copied over
tsutton@tim-m1 ~ % cp /Volumes/ssd/ruby_274 .

tsutton@tim-m1 ~ % codesign -d -vvvvv ruby_274 
Executable=/Users/tsutton/ruby_274
Identifier=-5ac6e2.out
Format=Mach-O universal (x86_64 arm64)
CodeDirectory v=20400 size=30020 flags=0x20002(adhoc,linker-signed) hashes=935+0 location=embedded
VersionPlatform=1
VersionMin=720896
VersionSDK=721664
Hash type=sha256 size=32
CandidateCDHash sha256=63eda95634ac1d1ea6c97467085ec887b45f1dde
CandidateCDHashFull sha256=63eda95634ac1d1ea6c97467085ec887b45f1dde4659262d661eccca13ba17ca
Hash choices=sha256
CMSDigest=63eda95634ac1d1ea6c97467085ec887b45f1dde4659262d661eccca13ba17ca
CMSDigestType=2
Executable Segment base=0
Executable Segment limit=2834432
Executable Segment flags=0x1
Page size=4096
CDHash=63eda95634ac1d1ea6c97467085ec887b45f1dde
Signature=adhoc
Info.plist=not bound
TeamIdentifier=not set
Sealed Resources=none
Internal requirements=none

tsutton@tim-m1 ~ % ./ruby_274 
zsh: segmentation fault  ./ruby_274

tsutton@tim-m1 ~ % cp ruby_274 ruby_274_copy

tsutton@tim-m1 ~ % codesign -s - ruby_274_copy 

tsutton@tim-m1 ~ % ./ruby_274_copy  
zsh: segmentation fault  ./ruby_274_copy

# using -f to force signature replacement
tsutton@tim-m1 ~ % codesign -fs - ruby_274_copy
ruby_274_copy: replacing existing signature

tsutton@tim-m1 ~ % ./ruby_274_copy             
zsh: segmentation fault  ./ruby_274_copy
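
When comparing several builds, the relevant fields of the codesign report can be pulled out with a small helper. This is a sketch that assumes the report format shown in the transcript above (a Signature= line and a flags=...(...) entry on the CodeDirectory line); signature_summary is a hypothetical name.

```shell
# Summarize a `codesign -d -vvvvv` report read from stdin.
# Assumption: the report contains "Signature=<kind>" and, for linker-signed
# binaries, "linker-signed" inside the flags=... entry, as in the transcript.
signature_summary() {
  report=$(cat)
  sig=$(printf '%s\n' "$report" | sed -n 's/^Signature=//p')
  if printf '%s\n' "$report" | grep -q 'flags=.*linker-signed'; then
    linker=yes
  else
    linker=no
  fi
  echo "signature=${sig:-none} linker-signed=$linker"
}
```

It could be used as codesign -d -vvvvv ruby_274 2>&1 | signature_summary; the 2>&1 is there on the assumption that codesign writes this report to stderr.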

Updated by ccaviness (Clay Caviness) 3 months ago

Lack of codesigning on Apple Silicon is an excellent guess, but unfortunately does not seem to be the cause here as Tim's demonstrated above (and I've verified as well). I first noticed this issue when testing a ruby that was fully signed with a public developer cert.
