Project

General

Profile

Bug #12298

Indeterministic ruby behavior when another thread is killed

Added by rupert (Robert Pankowecki) about 1 year ago. Updated about 1 year ago.

Status:
Rejected
Priority:
Normal
Assignee:
-
Target version:
-
ruby -v:
ruby 2.3.0p0 (2015-12-25 revision 53290) [x86_64-linux]
[ruby-core:75008]

Description

#!ruby
require 'securerandom'
class MyThread < ::Thread; end

def delay
  15
end

def run
  loop { work }
rescue Exception => e
  puts "#{Time.now} Exception"
ensure
  puts "#{Time.now} stopping agent"
end


def work
  puts "#{Time.now} start work"
  10_000_000.times { SecureRandom.hex }
  puts "finished work"
rescue StandardError => e
  puts "#{Time.now} Error"
ensure
  puts "#{Time.now} start sleep"
  sleep(delay)
  puts "#{Time.now} finished sleep"
end

t = MyThread.new{ run }

at_exit do
  puts "#{Time.now} killing thread"
  Thread.kill(t)
  puts "#{Time.now} killed thread"
end

sleep(10)
exit

I tried running this script multiple times in ruby 2.1.10, 2.2.4, 2.3.0 and I get inconsistent behavior. Sometimes the main thread does not wait for the second thread.

2016-04-15 11:07:09 +0200 start work
2016-04-15 11:07:19 +0200 killing thread
2016-04-15 11:07:19 +0200 killed thread
2016-04-15 11:07:19 +0200 stopping agent

And sometimes it does.

2016-04-15 11:07:26 +0200 start work
2016-04-15 11:07:36 +0200 killing thread
2016-04-15 11:07:36 +0200 killed thread
2016-04-15 11:07:36 +0200 start sleep
2016-04-15 11:07:51 +0200 finished sleep
2016-04-15 11:07:51 +0200 stopping agent

I appears that under higher CPU usage the 2nd scenario is more likely. In normal conditions the 1st happens more often probably.

I described the whole story in my blogpost

I am not sure which behavior is ruby default (I assume not waiting for other threads) but sometimes apparently ruby does wait for other threads to finish.

a.txt View - THREAD_DEBUG log a (3.09 KB) drbrain (Eric Hodel), 04/19/2016 06:46 PM

b.txt View - THREAD_DEBUG log b (3.58 KB) drbrain (Eric Hodel), 04/19/2016 06:46 PM

12998.rb View - Reduced test case (508 Bytes) drbrain (Eric Hodel), 04/19/2016 06:46 PM

History

#2 [ruby-core:75029] Updated by drbrain (Eric Hodel) about 1 year ago

This is not a ruby bug. Thread scheduling is inherently non-deterministic.

Sometimes you'll switch to the work thread before reaching rb_thread_terminate_all which allows the ensure to run, sometimes you won't.

I've attached a reduced test case and logs from ruby built with THREAD_DEBUG=1

Honeybadger should probably not sleep in ensure as a way to reduce thrashing when work finishes (I'm guessing that's what they use the sleep for) and should use a different mechanism instead.

#3 [ruby-core:75041] Updated by shevegen (Robert A. Heiler) about 1 year ago

Hmm. Although the report was already rejected, and even if we all may agree that
the honeybadger code was not brilliant, I feel that the overall issue here in
regards to Threads may be useful for more people in the future too.

For instance, without the blog explanation, where else would you find that much
information about ruby code used for "real"? From the official documentation
of Threads?

http://ruby-doc.org/core-2.3.0/Thread.html

The documentation is not bad at all, mind you, but the blog semi-taught me
more than the documentation would.

There may also be small improvements. We have instance methods like:

"See also the instance methods alive? and stop?"

In the code he checked whether the thread was aborting:

Thread.current.status == "aborting" 

This could be simplified if the ruby code would allow
for this check:

Thread.current.aborting?

Or perhaps even

Thread.aborting?

(I do not really know Threads that well that I can suggest an API
that makes sense / is logical.)

Without that block, I would probably have never been able to figure
out that a thread is not just alive or dead but may be in between
the two like the schroedinger cat.

#4 [ruby-core:75043] Updated by rupert (Robert Pankowecki) about 1 year ago

Eric Hodel wrote:

This is not a ruby bug. Thread scheduling is inherently non-deterministic.

Sometimes you'll switch to the work thread before reaching rb_thread_terminate_all which allows the ensure to run, sometimes you won't.

I get that. But why does Ruby wait for the second thread to finish? Or is it that Ruby always waits for those threads to finish, but the Thread gets un-catchable Exception and only runs the ensure blocks and that's what is happening?

#5 [ruby-core:75048] Updated by drbrain (Eric Hodel) about 1 year ago

In ruby_cleanup:

First we mark the main thread as killed: https://github.com/ruby/ruby/blob/trunk/eval.c#L186

Then we kill all threads: https://github.com/ruby/ruby/blob/trunk/eval.c#L191

Then we start at_exit hooks: (via ruby_finalize_1) https://github.com/ruby/ruby/blob/trunk/eval.c#L222

Sometimes ruby switches to the killed thread right away, so it sleeps in the ensure in your test case. It can't switch back to the main thread because it is marked as killed.

Sometimes ruby doesn't switch to the killed thread so it gets two signals and never sleeps (because the second arrives in the ensure when it tries to sleep).

Also available in: Atom PDF