Bug #15325

closed

Ruby 2.5.3 seg fault after find block returns

Added by stanhu (Stan Hu) about 6 years ago. Updated about 6 years ago.

Status:
Closed
Assignee:
-
Target version:
-
ruby -v:
ruby 2.5.3p105 (2018-10-18 revision 65156) [x86_64-darwin15]
[ruby-core:89914]

Description

In https://gitlab.com/gitlab-org/gitlab-ce/blob/233af8f1731734aaad7e5055af39f26c16608649/app/services/ci/register_job_service.rb#L48, we see a repeatable segfault on both macOS and Ubuntu with Rails 5.0.7 in a development environment. The segfault appears to occur after the `return` inside the `find` block executes:

builds.find do |build|
  next unless runner.can_pick?(build)

  begin
    # In case when 2 runners try to assign the same build, second runner will be declined
    # with StateMachines::InvalidTransition or StaleObjectError when doing run! or save method.
    if assign_runner!(build, params)
      register_success(build)

      return Result.new(build, true) # <--- SEG FAULT HAPPENS AFTER HERE
    end
  rescue StateMachines::InvalidTransition, ActiveRecord::StaleObjectError
  # ... (rest of the block omitted)

The crash report shows a bad memory access (note the unresolved frame at address 0x0 directly above `vm_exec`):

Thread 0 Crashed:: Dispatch queue: com.apple.main-thread
0   libsystem_kernel.dylib              0x00007fff5d0e8b86 __pthread_kill + 10
1   libsystem_pthread.dylib             0x00007fff5d19ec50 pthread_kill + 285
2   libsystem_c.dylib                   0x00007fff5d0521c9 abort + 127
3   ruby                                0x000000010f5ec6a9 die + 9
4   ruby                                0x000000010f5ec908 rb_bug_context + 600
5   ruby                                0x000000010f6db7a1 sigsegv + 81
6   libsystem_platform.dylib            0x00007fff5d193b3d _sigtramp + 29
7   ???                                 000000000000000000 0 + 0
8   ruby                                0x000000010f75461e vm_exec + 142
9   ruby                                0x000000010f761f25 invoke_block_from_c_bh + 405
10  ruby                                0x000000010f74f719 rb_yield + 153
11  ruby                                0x000000010f5e33b9 find_i + 41
12  ruby                                0x000000010f7620ca invoke_block_from_c_bh + 826
13  ruby                                0x000000010f74f719 rb_yield + 153
14  ruby                                0x000000010f57cce9 rb_ary_each + 41
15  ruby                                0x000000010f759f51 vm_call_cfunc + 305
16  ruby                                0x000000010f742a0d vm_exec_core + 9149
17  ruby                                0x000000010f75461e vm_exec + 142
18  ruby                                0x000000010f761d41 rb_call0 + 161
19  ruby                                0x000000010f74fe54 iterate_method + 52
20  ruby                                0x000000010f74fd9b rb_iterate0 + 347
21  ruby                                0x000000010f74fe1a rb_block_call + 74
22  ruby                                0x000000010f5e0518 enum_find + 104
23  ruby                                0x000000010f759f51 vm_call_cfunc + 305
24  ruby                                0x000000010f7436bd vm_exec_core + 12397

We do not see the problem after downgrading to Ruby 2.4.5.


Files

ruby_2018-11-20-132027_jet.crash (66.1 KB), added by stanhu (Stan Hu), 11/20/2018 10:03 PM
ruby-2.5.3-segfault.txt (774 KB), added by stanhu (Stan Hu), 11/20/2018 10:09 PM

Related issues: 1 (0 open, 1 closed)

Is duplicate of Ruby master - Bug #15105: `rb_debug_inspector_open` breaks lazy proc optimization [Closed, ko1 (Koichi Sasada)]
#1

Updated by stanhu (Stan Hu) about 6 years ago

  • ruby -v set to ruby 2.5.3p105 (2018-10-18 revision 65156) [x86_64-darwin15]

Updated by stanhu (Stan Hu) about 6 years ago

Note that replacing the `return` statement inside the `find` block with `break` appears to make the segfault go away:

diff --git a/app/services/ci/register_job_service.rb b/app/services/ci/register_job_service.rb
index e06f1c05843..2abc4a67dd6 100644
--- a/app/services/ci/register_job_service.rb
+++ b/app/services/ci/register_job_service.rb
@@ -36,7 +36,7 @@ module Ci
         builds = builds.with_any_tags
       end
 
-      builds.find do |build|
+      selection = builds.find do |build|
         next unless runner.can_pick?(build)
 
         begin
@@ -45,7 +45,7 @@ module Ci
           if assign_runner!(build, params)
             register_success(build)
 
-            return Result.new(build, true) # rubocop:disable Cop/AvoidReturnFromBlocks
+            break build
           end
         rescue StateMachines::InvalidTransition, ActiveRecord::StaleObjectError
           # We are looping to find another build that is not conflicting
@@ -61,6 +61,8 @@ module Ci
         end
       end
 
+      return Result.new(selection, true) if selection
+
       register_failure
       Result.new(nil, valid)
     end
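The workaround relies on standard Ruby block semantics: `return` inside a block exits the enclosing method directly (a non-local exit out of `find`), while `break value` terminates `find` and makes it evaluate to `value`. On an interpreter without this bug, both forms should produce the same result. A minimal sketch, assuming a hypothetical `Result` struct and a one-argument stub `assign_runner!` standing in for the GitLab methods (the real one also takes `params`):

```ruby
# Hypothetical stand-ins for the GitLab service objects (not the real classes).
Result = Struct.new(:build, :valid)

# Hypothetical stub: pretend only even-numbered builds can be assigned.
def assign_runner!(build)
  build.even?
end

# Original pattern: non-local `return` from inside the find block.
def pick_with_return(builds)
  builds.find do |build|
    next unless assign_runner!(build)

    return Result.new(build, true) # leaves pick_with_return immediately
  end

  Result.new(nil, false) # reached only when no build was assigned
end

# Workaround pattern: `break build` makes `find` itself evaluate to the build.
def pick_with_break(builds)
  selection = builds.find do |build|
    next unless assign_runner!(build)

    break build
  end

  return Result.new(selection, true) if selection

  Result.new(nil, false)
end

p pick_with_return([1, 3, 4, 6]) # => #<struct Result build=4, valid=true>
p pick_with_break([1, 3, 4, 6])  # => #<struct Result build=4, valid=true>
```

The `break` form keeps control flow local to `find`, which is why it sidesteps the non-local `return` path that crashed here.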

Updated by stanhu (Stan Hu) about 6 years ago

Something is quite odd. I tried a number of variations:

  1. `break build` appears to work with both Ruby 2.4.5 and 2.5.3.
  2. Instead of `break build`, use a bare `true`: in Ruby 2.5.3, this alone seems to cause `selection` to be `nil`. With Ruby 2.4.5, I got a segfault here in the garbage collector (`rb_gc_mark_node`).
  3. Instead of `break build`, use `break true`: `selection` is `nil` in both Ruby 2.4.5 and 2.5.3.
  4. Remove the `begin`/`rescue` clause entirely. The following did not work either:
selection = builds.find do |build|
  if assign_runner!(build, params)
    register_success(build)

    true
  else
    false
  end
end
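For comparison, on an interpreter behaving correctly each variation has a well-defined result under `Enumerable#find`, so (assuming at least one build is actually assigned) a `nil` selection in variations 2, 3 and 4 is unexpected. A minimal plain-Ruby check, with integers standing in for build records and `2` playing the build that gets assigned (an illustrative assumption):

```ruby
# Integers stand in for build records; `2` is the one that "gets assigned".
builds = [1, 2, 3]

# Variation 1: `break build` makes find evaluate to the break value.
selection1 = builds.find { |build| break build if build == 2 }

# Variations 2 and 4: a truthy block value makes find return the element itself.
selection2 = builds.find { |build| build == 2 ? true : false }

# Variation 3: `break true` makes find evaluate to `true`, not the element.
selection3 = builds.find { |build| break true if build == 2 }

# Only when nothing matches does find return nil.
selection4 = builds.find { |build| break build if build == 42 }

p [selection1, selection2, selection3, selection4] # => [2, 2, true, nil]
```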

Updated by stanhu (Stan Hu) about 6 years ago

OK, I think this bug is caused by https://bugs.ruby-lang.org/issues/15105. We were using the binding_of_caller gem, which calls `rb_debug_inspector_open`. The segfault does not happen if we omit that call.
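Since the crash only appears once `rb_debug_inspector_open` has been called, it can help to confirm whether binding_of_caller is actually loaded in the crashing process (in a Rails development environment it may be pulled in indirectly by another gem). A small diagnostic sketch using RubyGems' `Gem.loaded_specs` and `$LOADED_FEATURES`; the method name `binding_of_caller_loaded?` is hypothetical:

```ruby
# Hypothetical helper: reports whether binding_of_caller is active in this process.
def binding_of_caller_loaded?
  # Gem.loaded_specs maps the names of activated gems to their specifications.
  gem_active = defined?(Gem) && Gem.respond_to?(:loaded_specs) &&
               Gem.loaded_specs.key?("binding_of_caller")
  # $LOADED_FEATURES lists every file that has been required so far.
  feature_required = $LOADED_FEATURES.any? { |path| path.include?("binding_of_caller") }
  !!(gem_active || feature_required)
end

warn "binding_of_caller is loaded (it calls rb_debug_inspector_open)" if binding_of_caller_loaded?
```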

Updated by stanhu (Stan Hu) about 6 years ago

We can close this bug report in favor of https://bugs.ruby-lang.org/issues/15105. I've confirmed that applying the patch in https://bugs.ruby-lang.org/projects/ruby-trunk/repository/revisions/64800 makes the segfault go away.

#6

Updated by duerst (Martin Dürst) about 6 years ago

  • Is duplicate of Bug #15105: `rb_debug_inspector_open` breaks lazy proc optimization added

Updated by duerst (Martin Dürst) about 6 years ago

  • Status changed from Open to Closed

Closed at request of original submitter.
