Project

General

Profile

Actions

Bug #13876

closed

Tempfile's finalizer can be interrupted by a Timeout exception which can cause the process to hang

Added by jrafanie (Joe Rafaniello) almost 4 years ago. Updated 5 days ago.

Status:
Closed
Priority:
Normal
Assignee:
-
Target version:
-
ruby -v:
ruby 2.4.1p111 (2017-03-22 revision 58053) [x86_64-darwin15]
[ruby-core:<unknown>]
Tags:

Description

Ruby hangs if a Timeout is raised when a Tempfile's finalizer is run.

This also happens on ruby 2.3.4 on OSX. I could only recreate this on linux once but not reliably. Maybe the finalizer runs at a different and unpredictable time on Linux.

See the following script:

require 'tempfile'
require 'timeout'
Tempfile.new("x")

# this empty method is needed for some reason, maybe scope/GC related
def stuff
  ""
end

2.times do |i|
  begin
    Timeout.timeout(1e-9) do
      loop do
        begin
          "result"
        ensure
        end
      end
    end
  rescue Timeout::Error
  end
end

When run, I get the following output and it hangs ~4 out of 5 times on OSX:

$ ruby -d test.rb
Exception `LoadError' at /Users/joerafaniello/.rubies/ruby-2.4.1/lib/ruby/2.4.0/rubygems.rb:1345 - cannot load such file -- rubygems/defaults/operating_system
Exception `LoadError' at /Users/joerafaniello/.rubies/ruby-2.4.1/lib/ruby/2.4.0/rubygems.rb:1354 - cannot load such file -- rubygems/defaults/ruby
removing /var/folders/fq/blrz820d3qz7nm7vj8mbtfs40000gq/T/x20170906-72981-1ttdgpr...
<HANGS>

When it doesn't hang (~1 in 5 tries), it looks like this (the exception seems to be raised after the file cleanup is done):

$ ruby -d test.rb
Exception `LoadError' at /Users/joerafaniello/.rubies/ruby-2.4.1/lib/ruby/2.4.0/rubygems.rb:1345 - cannot load such file -- rubygems/defaults/operating_system
Exception `LoadError' at /Users/joerafaniello/.rubies/ruby-2.4.1/lib/ruby/2.4.0/rubygems.rb:1354 - cannot load such file -- rubygems/defaults/ruby
removing /var/folders/fq/blrz820d3qz7nm7vj8mbtfs40000gq/T/x20170906-72963-164cw8j...
done
Exception `Timeout::Error' at /Users/joerafaniello/.rubies/ruby-2.4.1/lib/ruby/2.4.0/timeout.rb:114 - execution expired
Exception `Timeout::Error' at /Users/joerafaniello/.rubies/ruby-2.4.1/lib/ruby/2.4.0/timeout.rb:114 - execution expired

When it hangs, it seems to be this code:
https://github.com/ruby/ruby/blob/0f25c6d7d59ff9744d1ca5dc908cc12c8ce79e25/lib/tempfile.rb#L253-L261


Related issues

Related to Ruby master - Feature #17849: Fix Timeout.timeout so that it can be used in threaded Web serversOpenmatz (Yukihiro Matsumoto)Actions
Has duplicate Ruby master - Bug #15507: Thread#raise is delivered to GC contextClosedActions

Updated by kernigh (George Koehler) almost 4 years ago

The bug happens when an async error (from Thread#raise) interrupts a finalizer. Ruby seems to rescue errors from finalizers, so the error never reaches the main code. In the script by Joe Rafaniello, the Timeout::Error disappears, so the infinite loop never times out, and the program hangs.

Because of this bug, we can't rely on Timeout in any program that uses ObjectSpace::define_finalizer. We might get unlucky, as the garbage collector might decide to run a finalizer when the Timeout::Error happens. Then the error disappears and our code never times out.

This script reproduces the bug more reliably:

GC.disable
fzer = proc do |id|
  puts "Before sleep #{id}"
  sleep 0.2
  puts "After sleep #{id}"
end
2.times do
  o = Object.new
  ObjectSpace.define_finalizer(o, fzer)
end

class MyError < RuntimeError; end
begin
  main_th = Thread.current
  Thread.new do
    sleep 0.1
    main_th.raise(MyError)
  end
  GC.start
  puts "After GC"
  sleep
rescue MyError
  puts "Rescued #$!"
end

The GC can fail to destroy the object. The C compiler that compiled Ruby might leave an unused VALUE on the stack, so the GC marks the object and doesn't destroy it. To work around this, I make 2 objects in a loop, and hope that the GC can destroy at least one of them.

To reproduce the bug, I use GC.disable and GC.start to simulate some bad luck and run the finalizer at a bad time. My main thread gets interrupted with MyError after 0.1 seconds, but the finalizer takes 0.2 seconds to run. So MyError is almost certain to interrupt the finalizer. The bug happens as MyError never reaches my main code. I expect my main thread to rescue MyError after 0.1 seconds, but it never happens, so my main thread sleeps forever.

$ ruby -v
ruby 2.5.0dev (2017-09-11 trunk 59840) [x86_64-openbsd6.1]
$ ruby -d my-error-13876.rb                                                    
Exception `LoadError' at /home/kernigh/prefix/lib/ruby/2.5.0/rubygems.rb:1346 - cannot load such file -- rubygems/defaults/operating_system
Exception `LoadError' at /home/kernigh/prefix/lib/ruby/2.5.0/rubygems.rb:1355 - cannot load such file -- rubygems/defaults/ruby
Exception `Gem::MissingSpecError' at /home/kernigh/prefix/lib/ruby/2.5.0/rubygems/dependency.rb:308 - Gem::MissingSpecError
Before sleep 13867240187460
Exception `MyError' at my-error-13876.rb:4 - MyError
After GC
<HANGS>

MyError must have interrupted my finalizer, because it never printed the message After sleep 13867240187460.

Updated by jeremyevans0 (Jeremy Evans) 4 months ago

This is still a problem in the master branch. Possible solutions:

  1. Run finalizers in a separate thread instead of the main thread.
  2. Mark when thread is running finalizers. Mark exceptions added by Thread#raise. When a thread is checking pending interrupts, if it is running finalizers, ignore interrupts added by Thread#raise, so that the delivery of the exceptions is delayed until the thread is no longer running finalizers.
  3. Mark when thread is running finalizers. Do not check for pending interrupts when running finalizers.
  4. Remove Thread#raise.

Option 4 is the simplest to implement, but it is not backwards compatible and I don't think it would be acceptable.

I've a pull request for option 3, since it was fairly simple to implement: https://github.com/ruby/ruby/pull/4366

I think that the option 1 would be the most robust approach, but it requires an extra thread.

Updated by byroot (Jean Boussier) 4 months ago

Run finalizers in a separate thread instead of the main thread.

This could cause backward compatibility issues as well. Finalizers might reach to thread local state & such.

I agree that Option 3 is the best.

Actions #4

Updated by jeremyevans0 (Jeremy Evans) 3 months ago

  • Has duplicate Bug #15507: Thread#raise is delivered to GC context added

Updated by mame (Yusuke Endoh) 2 months ago

I'm neutral for this proposal, but just like to add two comments.

  1. The proposed change will break a program that uses Timeout.timeout in a finalizer. Such a program may get stuck. I don't think that it is a good program, though.
  2. The patch adds an ad-hoc field to disable interrupt, but it may be good to reuse the mechanism of handle_interrupt.

Updated by Eregon (Benoit Daloze) 2 months ago

Option 1 would be useful, might avoid delaying finalizers if the main thread is for a while in native code, and it is already what JRuby (IIRC) and TruffleRuby currently do. There is potentially some incompatibility, but it seems pretty low given AFAIK we've seen no issue with that approach on JRuby & TruffleRuby (headius (Charles Nutter) might know some?).

Option 3 seems a good way too. Related: https://bugs.ruby-lang.org/issues/17849#note-16

Actions #7

Updated by Eregon (Benoit Daloze) 2 months ago

  • Related to Feature #17849: Fix Timeout.timeout so that it can be used in threaded Web servers added
Actions #8

Updated by jeremyevans (Jeremy Evans) 5 days ago

  • Status changed from Open to Closed

Applied in changeset git|87b327efe6c1f456c43b3f105c7a92a5a7effe93.


Do not check pending interrupts when running finalizers

This fixes cases where exceptions raised using Thread#raise are
swallowed by finalizers and not delivered to the running thread.

This could cause issues with finalizers that rely on pending interrupts,
but that case is expected to be rarer.

Fixes [Bug #13876]
Fixes [Bug #15507]

Co-authored-by: Koichi Sasada ko1@atdot.net

Actions

Also available in: Atom PDF