Bug #15490
Closed
socket.rb - recurring segmentation faults
Description
With Ruby 2.5.3p105, and now with Ruby 2.6.0 following our recent upgrade, we are still seeing reasonably frequent segmentation faults from Ruby, specifically within socket.rb.
Looking in socket.rb, it seems it's related to the address lookup:
Addrinfo.getaddrinfo(nodename, service, family, socktype, protocol, flags).each(&block)
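For readers unfamiliar with this call, here is a minimal sketch of what it does from the Ruby side. The helper name `resolve` and the loopback address are assumptions chosen for illustration; they are not taken from the reporter's code, and this snippet is not expected to reproduce the crash.

```ruby
require "socket"

# Hypothetical helper: look up the socket addresses for a host and port.
# Addrinfo.getaddrinfo calls the platform's getaddrinfo(3), which is the
# C function that appears in the crash backtrace.
def resolve(host, port)
  Addrinfo.getaddrinfo(host, port, nil, :STREAM)
end

# A numeric address is used here so the example needs no DNS access.
resolve("127.0.0.1", 80).each do |ai|
  puts ai.inspect_sockaddr
end
```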
Segfault report below in full. Diagnostic reports are attached too. If there is anything I can do to help reproduce this I will; however, I have never been able to reproduce it reliably, even though it happens once every few days.
        
Updated by nobu (Nobuyoshi Nakada) almost 7 years ago
- File bug-15490.log added
- Description updated (diff)
It always happens here. I couldn't find the source of si_destination_compare, but it may be a problem in libsystem_info.dylib.
7   ???                           	0x00007fc6cddeaac0 0 + 140491834174144
8   libsystem_trace.dylib         	0x00007fff6e31adb4 os_log_type_enabled + 627
9   libsystem_info.dylib          	0x00007fff6e23305b si_destination_compare_statistics + 1659
10  libsystem_info.dylib          	0x00007fff6e231bf3 si_destination_compare_internal + 707
11  libsystem_info.dylib          	0x00007fff6e231762 si_destination_compare + 530
12  libsystem_info.dylib          	0x00007fff6e20f95f _gai_addr_sort + 111
13  libsystem_c.dylib             	0x00007fff6e1b9a0f _isort + 193
14  libsystem_c.dylib             	0x00007fff6e1b993c _qsort + 2159
15  libsystem_info.dylib          	0x00007fff6e207135 _gai_sort_list + 789
16  libsystem_info.dylib          	0x00007fff6e205b88 si_addrinfo + 2040
17  libsystem_info.dylib          	0x00007fff6e205262 _getaddrinfo_internal + 242
18  libsystem_info.dylib          	0x00007fff6e20515d getaddrinfo + 61
        
Updated by matthew.oriordan (Matthew O'Riordan) almost 7 years ago
      Is there something I can do to help with the source of si_destination_compare, and the problem you believe is related to libsystem_info.dylib?
        
Updated by jessebs (Jesse Bowes) over 6 years ago
      I have run into a similar issue using Ruby 2.5.1 but unfortunately don't have an easy way to reproduce.
A couple of things that help mitigate it (and may be useful for finding the actual issue):
getaddrinfo is in the backtrace and this is happening around some network code for me. I found that using an IP address instead of hostname makes the issue go away.
Another option that I have found is that around the code giving problems, turning off Garbage Collection will make it go away as well (GC.disable).
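The two mitigations above could be combined roughly as follows. This is only a sketch: `without_gc` is a hypothetical helper name, and the address and port are placeholders. Another reporter later in this thread found that neither workaround helped, so treat this as a diagnostic aid rather than a fix.

```ruby
require "socket"

# Hypothetical helper: run a block with garbage collection turned off,
# then restore the previous GC state even if the block raises.
def without_gc
  was_disabled = GC.disable # returns true if GC was already disabled
  yield
ensure
  GC.enable unless was_disabled
end

# Pass an IP literal instead of a hostname, and keep GC off around the call.
addrs = without_gc { Addrinfo.getaddrinfo("127.0.0.1", 80, nil, :STREAM) }
puts addrs.map(&:ip_address).inspect
```

Keeping the GC-disabled region as small as possible limits the memory growth that comes with turning collection off.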
        
Updated by nobu (Nobuyoshi Nakada) over 6 years ago
      - Has duplicate Bug #15639: [BUG] Segmentation fault at 0x000000010e82ca3a added
        
Updated by zormandi (Zoltan Ormandi) over 6 years ago
We're seeing this issue as well, on Ruby 2.6.1. For us, it occurs towards the end of a fairly large test suite when running one of our legacy Cucumber tests. When we run only the Cucumber section of our test suite (not the whole thing), the issue does not occur. Also, it does not happen on our CI server, which makes me suspect that this might be an OSX-exclusive problem: we're only seeing it on our MacBooks.
The test that triggers the crash starts up a fake web server using WEBrick to simulate one of our services. It binds to 'http://localhost:42638', but the suggestion of using an IP address instead of a hostname didn't solve the problem for us; it still occurs if we change the binding to 'http://127.0.0.1:42638'.
Let me know if there's any information that could help (other than a reproduction script, which I obviously cannot provide). It would be great to get rid of this bug.
UPDATE
Unfortunately, I was wrong. The issue does sometimes occur even when only the Cucumber section of our test suite is being executed. Also, turning off the GC didn't help either.
        
Updated by PikachuEXE (Pikachu EXE) over 6 years ago
I might have hit a similar issue with 2.6.2 (it also crashes at os_log_type_enabled + 627):
https://bugs.ruby-lang.org/issues/15623#note-2
See update #2
        
Updated by matthew.oriordan (Matthew O'Riordan) over 6 years ago
      - ruby -v changed from ruby 2.6.0p0 (2018-12-25 revision 66547) [x86_64-darwin18] to ruby 2.6.3p62 (2019-04-16 revision 67580) [x86_64-darwin18]
This issue is still happening with the latest version of Ruby 2.6.3. Happy to provide more logs / run tests if I can help.
        
Updated by matthew.oriordan (Matthew O'Riordan) over 6 years ago
      Some background to how I have worked around this for now, which may be useful.
I use the parallel gem https://github.com/grosser/parallel, which can parallelise tasks using threads or processes. After switching from processes to threads, this issue has gone away. In some code paths with a CLI we use locally, processes are preferable given the isolation from the running code; however, in this case it was not a problem to use threads, and they are arguably also better from a resource perspective.
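As a rough illustration of the thread-based approach, the sketch below uses only the standard library rather than the parallel gem, so that it runs without extra dependencies. The helper name `resolve_in_threads` and the host list are assumptions, not the reporter's actual code.

```ruby
require "socket"

# Hypothetical helper: resolve each host on its own thread inside a single
# process, instead of forking one worker process per task.
def resolve_in_threads(hosts, port)
  threads = hosts.map do |host|
    Thread.new do
      Addrinfo.getaddrinfo(host, port, nil, :STREAM).map(&:ip_address)
    end
  end
  threads.map(&:value) # waits for each thread and collects its result
end

p resolve_in_threads(["127.0.0.1", "127.0.0.1"], 80)
```

With the parallel gem, the equivalent switch is understood to be a change of option on the same call (for example `in_threads:` instead of `in_processes:`), though the reporter's exact invocation is not shown in this thread.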
        
Updated by jeremyevans0 (Jeremy Evans) over 6 years ago
      - Related to Bug #13646: Segmentation fault with postgresql_adapter in Rails added
        
Updated by mylesgearon (Myles Gearon) over 6 years ago
      - ruby -v changed from ruby 2.6.3p62 (2019-04-16 revision 67580) [x86_64-darwin18] to ruby 2.5.0p0 (2017-12-25 revision 61468) [x86_64-darwin18], ruby 2.6.3p62 (2019-04-16 revision 67580) [x86_64-darwin18], ruby 2.5.0p0 (2017-12-25 revision 61468) [x86_64-linux]
I have been experiencing this issue as well, but only on a computer running OSX 10.14.5. I can't seem to recreate it on Linux using Fedora 29 or Ubuntu 18.04.
Switching from localhost to 127.0.0.1 on OSX seems to make it crash less often, but I'm still getting the segfault there. The segfault happens on both 2.5.0 and 2.6.3 on OSX.
        
Updated by nobu (Nobuyoshi Nakada) about 6 years ago
      - Status changed from Open to Third Party's Issue
        
Updated by mame (Yusuke Endoh) about 6 years ago
      - Has duplicate Bug #16036: workers keep on crashing with [BUG] Segmentation fault at 0x000000010b647a3a added
        
Updated by nobu (Nobuyoshi Nakada) about 6 years ago
      - Has duplicate Bug #16085: when running rspec I get this issue added
        
Updated by a_bicky (Takeshi Arabiki) about 6 years ago
I ran into a similar issue.
I uploaded code that reproduces it, along with a crash report.
https://gist.github.com/abicky/1263fd5c7d39db257f663382970bc2b0
I hope they help.