socket.rb - recurring segmentation faults
With Ruby 2.5.3p105, and now with Ruby 2.6.0 following our recent upgrade, we are still seeing reasonably frequent segmentation faults from Ruby, specifically within socket.rb.
Looking in socket.rb, it seems it's related to the address lookup:
Addrinfo.getaddrinfo(nodename, service, family, socktype, protocol, flags).each(&block)
The segfault report is below in full, and diagnostic reports are attached. If there is anything I can do to help reproduce this, I will. However, I have never been able to reproduce it reliably, even though it happens once every few days.
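For anyone trying to reproduce this outside the application, a stress loop along the following lines might be a starting point. It is only a sketch and not a confirmed reproducer: the host name, the iteration count, and the use of several threads are all assumptions and are not taken from the crash report.

```ruby
require "socket"

# Hypothetical values chosen for illustration only.
HOST = "localhost"
ITERATIONS = 200
THREADS = 4

# Each thread resolves HOST repeatedly through the same call that
# socket.rb makes, and counts the Addrinfo entries it gets back.
results = Array.new(THREADS) do
  Thread.new do
    count = 0
    ITERATIONS.times do
      Addrinfo.getaddrinfo(HOST, 80, nil, :STREAM).each { count += 1 }
    end
    count
  end
end.map(&:value)

puts "resolved #{results.sum} addresses without crashing"
```

If the process survives, that says little either way, since the crash is intermittent in the original setup.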
Updated by nobu (Nobuyoshi Nakada) 4 months ago
It always happens here, though I couldn't find the source of
si_destination_compare; it may be a problem in libsystem_info.dylib.
7   ???                    0x00007fc6cddeaac0 0 + 140491834174144
8   libsystem_trace.dylib  0x00007fff6e31adb4 os_log_type_enabled + 627
9   libsystem_info.dylib   0x00007fff6e23305b si_destination_compare_statistics + 1659
10  libsystem_info.dylib   0x00007fff6e231bf3 si_destination_compare_internal + 707
11  libsystem_info.dylib   0x00007fff6e231762 si_destination_compare + 530
12  libsystem_info.dylib   0x00007fff6e20f95f _gai_addr_sort + 111
13  libsystem_c.dylib      0x00007fff6e1b9a0f _isort + 193
14  libsystem_c.dylib      0x00007fff6e1b993c _qsort + 2159
15  libsystem_info.dylib   0x00007fff6e207135 _gai_sort_list + 789
16  libsystem_info.dylib   0x00007fff6e205b88 si_addrinfo + 2040
17  libsystem_info.dylib   0x00007fff6e205262 _getaddrinfo_internal + 242
18  libsystem_info.dylib   0x00007fff6e20515d getaddrinfo + 61
Updated by jessebs (Jesse Bowes) 3 months ago
I have run into a similar issue using Ruby 2.5.1 but unfortunately don't have an easy way to reproduce.
A couple of things that help mitigate it (and may be useful for finding the actual issue):
getaddrinfo is in the backtrace and this is happening around some network code for me. I found that using an IP address instead of hostname makes the issue go away.
Another option I have found is to turn off garbage collection (GC.disable) around the code giving problems, which makes it go away as well.
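The two workarounds above could be combined roughly as follows. This is a minimal sketch: the helper name without_gc is hypothetical, and the address and port are placeholders rather than values from the affected code.

```ruby
require "socket"

# Hypothetical helper: run the block with GC disabled, then restore
# the previous state. GC.disable returns true if GC was already off.
def without_gc
  was_disabled = GC.disable
  yield
ensure
  GC.enable unless was_disabled
end

# Passing a numeric IP address instead of a hostname, with GC turned
# off only for the duration of the lookup.
addrs = without_gc do
  Addrinfo.getaddrinfo("127.0.0.1", 80, nil, :STREAM)
end

puts addrs.first.ip_address
```

Keeping the GC-disabled region this small limits the memory growth that comes from leaving GC off for long stretches.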
Updated by zormandi (Zoltan Ormandi) about 1 month ago
We're seeing this issue as well, on Ruby 2.6.1. For us, it occurs towards the end of a fairly large test suite when running one of our legacy Cucumber tests. When we run only the Cucumber section of our test suite (not the whole thing), the issue does not occur. Also, it does not happen on our CI server, which makes me suspect that this might be an OSX-exclusive problem: we're only seeing it on our MacBooks.
The test that triggers the crash starts up a fake web server using WEBrick to simulate one of our services. It binds to 'http://localhost:42638' but the suggestion of using an IP address instead of a hostname didn't solve the problem for us; it still occurs if we change the binding to 'http://127.0.0.1:42638'.
Let me know if there's any information that could help (other than a reproduce script, which I obviously cannot provide) - it would be great to get rid of this bug.
Unfortunately, I was wrong. The issue does sometimes occur even when only the Cucumber section of our test suite is being executed. Turning off the GC didn't help either.