Project

General

Profile

Bug #21127

Updated by pocke (Masataka Kuwabara) 3 months ago

Previously, Ruby displayed backtraces for each thread on deadlock. However, it has not been shown since Ruby 3.0. 
 It should display the backtrace for debugging. (related: https://bugs.ruby-lang.org/issues/8214) 

 ## Step to reproduce 

 ```ruby 
 def f(q) g(q) end 
 def g(q) h(q) end 
 def h(q) q.pop end 

 q = Thread::Queue.new 
 th1 = Thread.new { f q } 
 th2 = Thread.new { f q } 

 th1.join 
 th2.join 
 ``` 

 Ruby 2.7.8: It displays the trace. 

 ``` 
 ruby-2.7.8 ruby 2.7.8p225 (2023-03-30 revision 1f4d455848) [x86_64-linux] 
            -e:9:in `join': No live threads left. Deadlock? (fatal) 
            3 threads, 3 sleeps current:0x0000004000653e80 main thread:0x000000400035da50 
            * #<Thread:0x000000400038ad70 sleep_forever> 
               rb_thread_t:0x000000400035da50 native:0x0000004002fb7740 int:0 
               -e:9:in `join' 
               -e:9:in `<main>' 
            * #<Thread:0x00000040004fe760 -e:6 sleep_forever> 
               rb_thread_t:0x00000040004518a0 native:0x00000040067ae700 int:0 
                depended by: tb_thread_id:0x000000400035da50 
               -e:3:in `pop' 
               -e:3:in `h' 
               -e:2:in `g' 
               -e:1:in `f' 
               -e:6:in `block in <main>' 
            * #<Thread:0x00000040004fe620 -e:7 sleep_forever> 
               rb_thread_t:0x0000004000653e80 native:0x000000400e9af700 int:0 
               -e:3:in `pop' 
               -e:3:in `h' 
               -e:2:in `g' 
               -e:1:in `f' 
               -e:7:in `block in <main>' 
                 from -e:9:in `<main>' 
 ``` 


 Ruby 3.0.0: There is no trace. 

 ``` 
 ruby-3.0.0 ruby 3.0.0p0 (2020-12-25 revision 95aff21468) [x86_64-linux] 
            -e:9:in `join': No live threads left. Deadlock? (fatal) 
            3 threads, 3 sleeps current:0x00000040006e88c0 main thread:0x00000040003ffa10 
            * #<Thread:0x000000400046f448 sleep_forever> 
               rb_thread_t:0x00000040003ffa10 native:0x0000004003059740 int:0 
              
            * #<Thread:0x000000400073c2e8 -e:6 sleep_forever> 
               rb_thread_t:0x00000040006e15a0 native:0x0000004006850700 int:0 
                depended by: tb_thread_id:0x00000040003ffa10 
              
            * #<Thread:0x000000400073c1f8 -e:7 sleep_forever> 
               rb_thread_t:0x00000040006e88c0 native:0x000000400ea51700 int:0 
              
                 from -e:9:in `<main>' 
 ``` 


 Ruby 3.5.0.dev (latest): There is no trace. 


 ``` 
 ruby 3.5.0dev (2025-02-10T00:27:18Z master 135479a58d) +PRISM [arm64-darwin24] 
 -e:9:in 'Thread#join': No live threads left. Deadlock? (fatal) 
 3 threads, 3 sleeps current:0x000000012ae3b530 main thread:0x000000012af04080 
 * #<Thread:0x0000000104aaaf78 sleep_forever> 
    rb_thread_t:0x000000012af04080 native:0x00000001e9d93240 int:0 
   
 * #<Thread:0x0000000104e5e048 -e:6 sleep_forever> 
    rb_thread_t:0x000000012ae23a90 native:0x000000016b5f3000 int:0 
     depended by: tb_thread_id:0x000000012af04080 
   
 * #<Thread:0x0000000104e5deb8 -e:7 sleep_forever> 
    rb_thread_t:0x000000012ae3b530 native:0x000000016b6ff000 int:0 
   

         from -e:9:in '<main>' 
 ``` 


 Note: The results of 2.7 and 3.0 are indented because I executed them with all-ruby. 


 ## Cause 


 This commit has introduced this change. https://github.com/ruby/ruby/commit/3b24b7914c16930bfadc89d6aff6326a51c54295 
 The ticket is https://bugs.ruby-lang.org/issues/17031 

 It changes `rb_ec_backtrace_str_ary` function in `vm_backtrace.c`. Previously, `rb_ec_backtrace_str_ary(ec, 0, 0)` returns all traces, but it returns an empty array with this change. It should pass `ALL_BACKTRACE_LINES` (`-1`) to get all traces. 


 I used rubyfarm also to confirm that this commit was the cause. 


 With the commit, it does not display the traces. 
 ``` 
 docker run --rm -ti rubylang/rubyfarm:3b24b7914c16930bfadc89d6aff6326a51c54295 ruby -ve 'def f(q) g(q) end 
 def g(q) h(q) end 
 def h(q) q.pop end 

 q = Thread::Queue.new 
 th1 = Thread.new { f q } 
 th2 = Thread.new { f q } 

 th1.join 
 th2.join' 

 ruby 2.8.0dev (2020-08-27T22:17:36Z :detached: 3b24b7914c) [x86_64-linux] 
 -e:9:in `join': No live threads left. Deadlock? (fatal) 
 3 threads, 3 sleeps current:0x0000004000566a80 main thread:0x0000004000202a60 
 * #<Thread:0x000000400026faf8 sleep_forever> 
    rb_thread_t:0x0000004000202a60 native:0x000000400440d740 int:0 
   
 * #<Thread:0x00000040004f1768 -e:6 sleep_forever> 
    rb_thread_t:0x0000004000566710 native:0x00000040081fe700 int:0 
     depended by: tb_thread_id:0x0000004000202a60 
   
 * #<Thread:0x00000040004f1600 -e:7 sleep_forever> 
    rb_thread_t:0x0000004000566a80 native:0x00000040103ff700 int:0 
   
         from -e:9:in `<main>' 
 ``` 


 With the parent commit, it displays the traces. 
 ``` 
 docker run --rm -ti rubylang/rubyfarm:8095114f1715070fcdc2b29303dcf55a7fcc32a3 ruby -ve 'def f(q) g(q) end 
 def g(q) h(q) end  
 def h(q) q.pop end 

 q = Thread::Queue.new    
 th1 = Thread.new { f q } 
 th2 = Thread.new { f q } 

 th1.join  
 th2.join' 

 ruby 2.8.0dev (2020-08-27T19:54:36Z :detached: 8095114f17) [x86_64-linux] 
 -e:9:in `join': No live threads left. Deadlock? (fatal) 
 3 threads, 3 sleeps current:0x000000400055b3a0 main thread:0x0000004000202a60 
 * #<Thread:0x000000400026faf8 sleep_forever> 
    rb_thread_t:0x0000004000202a60 native:0x000000400440d740 int:0 
    -e:9:in `join' 
    -e:9:in `<main>' 
 * #<Thread:0x00000040004f1768 -e:6 sleep_forever> 
    rb_thread_t:0x000000400055afe0 native:0x00000040081fe700 int:0 
     depended by: tb_thread_id:0x0000004000202a60 
    -e:3:in `pop' 
    -e:3:in `h' 
    -e:2:in `g' 
    -e:1:in `f' 
    -e:6:in `block in <main>' 
 * #<Thread:0x00000040004f1600 -e:7 sleep_forever> 
    rb_thread_t:0x000000400055b3a0 native:0x00000040103ff700 int:0 
    -e:3:in `pop' 
    -e:3:in `h' 
    -e:2:in `g' 
    -e:1:in `f' 
    -e:7:in `block in <main>' 
         from -e:9:in `<main>' 
 ``` 


 I think it's not an intentional change. The ticket and NEWS do not mention the backtrace on a deadlock. (And the backtrace is really helpful!) 


 ## Solution 


 The following patch fixes the problem. PR: https://github.com/ruby/ruby/pull/12721 I'll make a PR on GitHub soon. 

 ```diff 
 diff --git a/eval_intern.h b/eval_intern.h 
 index 9a551120ff..f783c93aff 100644 
 --- a/eval_intern.h 
 +++ b/eval_intern.h 
 @@ -300,6 +300,8 @@ void rb_vm_set_progname(VALUE filename); 
  VALUE rb_vm_cbase(void); 
 
  /* vm_backtrace.c */ 
 +#define BACKTRACE_START 0 
 +#define ALL_BACKTRACE_LINES -1 
  VALUE rb_ec_backtrace_object(const rb_execution_context_t *ec); 
  VALUE rb_ec_backtrace_str_ary(const rb_execution_context_t *ec, long lev, long n); 
  VALUE rb_ec_backtrace_location_ary(const rb_execution_context_t *ec, long lev, long n, bool skip_internal); 
 diff --git a/thread.c b/thread.c 
 index d40d5140b7..5280c667af 100644 
 --- a/thread.c 
 +++ b/thread.c 
 @@ -5596,7 +5596,7 @@ debug_deadlock_check(rb_ractor_t *r, VALUE msg) 
              } 
          } 
          rb_str_catf(msg, "\n     "); 
 -          rb_str_concat(msg, rb_ary_join(rb_ec_backtrace_str_ary(th->ec, 0, 0), sep)); 
 +          rb_str_concat(msg, rb_ary_join(rb_ec_backtrace_str_ary(th->ec, BACKTRACE_START, ALL_BACKTRACE_LINES), sep)); 
          rb_str_catf(msg, "\n"); 
      } 
  } 
 diff --git a/vm_backtrace.c b/vm_backtrace.c 
 index 56a0d861d6..6f2cf9e169 100644 
 --- a/vm_backtrace.c 
 +++ b/vm_backtrace.c 
 @@ -31,9 +31,6 @@ id2str(ID id) 
  } 
  #define rb_id2str(id) id2str(id) 
 
 -#define BACKTRACE_START 0 
 -#define ALL_BACKTRACE_LINES -1 
 - 
  inline static int 
  calc_pos(const rb_iseq_t *iseq, const VALUE *pc, int *lineno, int *node_id) 
  { 
 ``` 

 The result is here: 

 ``` 
 $ ruby -ve 'def f(q) g(q) end 
 def g(q) h(q) end 
 def h(q) q.pop end 

 q = Thread::Queue.new 
 th1 = Thread.new { f q } 
 th2 = Thread.new { f q } 

 th1.join 
 th2.join' 

 ruby 3.5.0dev (2025-02-10T00:27:18Z origin/master 135479a58d) +PRISM [arm64-darwin24] 
 -e:9:in 'Thread#join': No live threads left. Deadlock? (fatal) 
 3 threads, 3 sleeps current:0x0000000157a053a0 main thread:0x0000000156704080 
 * #<Thread:0x000000011bcfaf78 sleep_forever> 
    rb_thread_t:0x0000000156704080 native:0x00000001e9d93240 int:0 
    -e:9:in 'Thread#join' 
    -e:9:in '<main>' 
 * #<Thread:0x000000011d2d0bb8 -e:6 sleep_forever> 
    rb_thread_t:0x0000000157a04d40 native:0x000000016dcb7000 int:0 
     depended by: tb_thread_id:0x0000000156704080 
    <internal:thread_sync>:18:in 'Thread::Queue#pop' 
    -e:3:in 'Object#h' 
    -e:2:in 'Object#g' 
    -e:1:in 'Object#f' 
    -e:6:in 'block in <main>' 
 * #<Thread:0x000000011d2d07a8 -e:7 sleep_forever> 
    rb_thread_t:0x0000000157a053a0 native:0x000000016ddc3000 int:0 
    <internal:thread_sync>:18:in 'Thread::Queue#pop' 
    -e:3:in 'Object#h' 
    -e:2:in 'Object#g' 
    -e:1:in 'Object#f' 
    -e:7:in 'block in <main>' 

	 from -e:9:in '<main>' 

 ```

Back