Project

General

Profile

Feature #14859

[PATCH] implement Timeout in VM

Added by normalperson (Eric Wong) 27 days ago. Updated 1 day ago.

Status:
Open
Priority:
Normal
Target version:
-
[ruby-core:87541]

Description

implement Timeout in VM

Based on the ugliness of handling partial writes with
IO#write_nonblock and inability to use writev(2) effectively
with write timeouts in Net::HTTP in r63587-r63589, I've
decided Timeout to be the more programmer-friendly option
to use and to improve it.

Timeout is significantly faster with this patch, and stopping
the timeout before expiration (common case) is optimized to be
as fast as possible. This version relies on timer-thread to
provide wakeup interrupts.

This is a minimally intrusive patch. I also started working on
a more intrusive patch to touch all sleep/waiting function
calls, but this is easier-to-review for now. In the future,
I will try per-thread timeouts and eliminate timer-thread
for platforms with POSIX timers (timer_create/timer_settime)

Speedup ratio: compare with the result of `trunk' (greater is better)

timeout_mt_nested 3.887
timeout_mt_same 3.843
timeout_mt_ugly 1.335
timeout_nested 7.059
timeout_same 5.173
timeout_zero 2.587

0001-implement-Timeout-in-VM.patch (27.2 KB) 0001-implement-Timeout-in-VM.patch normalperson (Eric Wong), 06/21/2018 12:21 AM

History

#1 [ruby-core:87542] Updated by normalperson (Eric Wong) 27 days ago

https://bugs.ruby-lang.org/issues/14859

I sometimes get failures in test_condvar_wait_deadlock_2 of
test/thread/test_cv.rb with this, but I think that test is too
dependent on thread scheduling timing...

In other words, I'm not sure it's a good test...

#2 [ruby-core:87544] Updated by normalperson (Eric Wong) 27 days ago

https://bugs.ruby-lang.org/issues/14859

I sometimes get failures in test_condvar_wait_deadlock_2 of
test/thread/test_cv.rb with this, but I think that test is too
dependent on thread scheduling timing...

Disregard, will fix...

#3 [ruby-core:87570] Updated by Eregon (Benoit Daloze) 27 days ago

Should lib/timeout.rb be removed then?
Why is it moved to core, could it stay an extension?

Note that there are pure-Ruby implementations of Timeout using a single Ruby Thread, like
https://github.com/oracle/truffleruby/blob/71df1ecc4fd9e318b5bd3998cfaeb85a96de7a8b/lib/truffle/timeout.rb (originally from Rubinius)

and that WEBrick has basically its own version of Timeout, using a single Thread:
https://github.com/ruby/ruby/blob/48efa44719d03eb067d27b30c68cf821074aedce/lib/webrick/utils.rb

#4 [ruby-core:87571] Updated by Eregon (Benoit Daloze) 27 days ago

Something else, I would consider Timeout to be fundamentally flawed as long as it relies on Thread#raise,
because it can fire in the middle of an ensure block:
http://headius.blogspot.com/2008/02/rubys-threadraise-threadkill-timeoutrb.html

Maybe IO#write and such should directly accept a timeout argument and other changes to make the API easier to use?

#5 [ruby-core:87582] Updated by normalperson (Eric Wong) 27 days ago

eregontp@gmail.com wrote:

Should lib/timeout.rb be removed then?

Yes, and the rdoc will be moved if accepted.

Why is it moved to core, could it stay an extension?

Right now it needs to hook into the core timer thread without
needing additional threads. I already run into resource
exhaustion problems with timer-thread when testing.

I'm working on making all wait functions aware of it:
rb_wait_for_single_fd, rb_thread_fd_select, rb_thread_sleep*, etc.

Eventually, I also want to get rid of timer thread (for POSIX)
but it might not be easy

Note that there are pure-Ruby implementations of Timeout using a single Ruby Thread, like
https://github.com/oracle/truffleruby/blob/71df1ecc4fd9e318b5bd3998cfaeb85a96de7a8b/lib/truffle/timeout.rb (originally from Rubinius)

Using one extra Thread is already too much for me.

and that WEBrick has basically its own version of Timeout, using a single Thread:
https://github.com/ruby/ruby/blob/48efa44719d03eb067d27b30c68cf821074aedce/lib/webrick/utils.rb

Yes, I want to get rid of that by making Timeout better.

#6 [ruby-core:87583] Updated by normalperson (Eric Wong) 27 days ago

eregontp@gmail.com wrote:

Something else, I would consider Timeout to be fundamentally
flawed as long as it relies on Thread#raise, because it can
fire in the middle of an ensure block:
http://headius.blogspot.com/2008/02/rubys-threadraise-threadkill-timeoutrb.html

We have Thread.handle_interrupt, nowadays, to control when
interrupts fire.

Maybe IO#write_nonblock and such should directly accept a
timeout argument and other changes to make the API easier to
use?

I considered that, too, but we'd need to add timeouts to every
single method which can block. There are many: File.open, Queue#pop,
SizedQueue#push, Mutex#lock/synchronize, Process.wait*,
IO#gets, IO#write, IO#read, IO#getc, IO.copy_stream, ...

Pretty much every IO method needs to be changed and callers
need to be rewritten.

Things like File.open and Process.wait* don't have timeouts
in the underlying syscall, so we'd still have to use a timer
thread or POSIX timers to interrupt them on timeout.

The goal is to make all those methods aware of Timeout
without changing existing user code at all.

#7 [ruby-core:87587] Updated by normalperson (Eric Wong) 26 days ago

eregontp@gmail.com wrote:

Should lib/timeout.rb be removed then?

Also, I might keep compatibility code in timeout.rb and stop
providing it. This avoids bloating the VM with deprecated
stuff without breaking backwards compatibility.

#8 [ruby-core:87588] Updated by normalperson (Eric Wong) 26 days ago

stop providing it.

I meant: stop using rb_provide("timeout.rb")

#9 [ruby-core:87598] Updated by Eregon (Benoit Daloze) 26 days ago

normalperson (Eric Wong) wrote:

eregontp@gmail.com wrote:

Something else, I would consider Timeout to be fundamentally
flawed as long as it relies on Thread#raise, because it can
fire in the middle of an ensure block:
http://headius.blogspot.com/2008/02/rubys-threadraise-threadkill-timeoutrb.html

We have Thread.handle_interrupt, nowadays, to control when
interrupts fire.

Right, although it's very difficult to use correctly (for instance, it's incorrect to use Thread.handle_interrupt inside the ensure block) and can easily cause hangs or deadlocks.

BTW, it looks like MonitorMixin::ConditionVariable doesn't use Thread.handle_interrupt and could continue out of #wait (with an exception thrown by Thread#raise) without reacquiring the lock.

It might be nice to have a Timeout variant that only interrupts blocking IO, without relying on Thread#raise (but just SIGVTALRM).
I think that would be easier/safer to use than the current Timeout.timeout().
Not sure how to deal if there are multiple IO calls inside that Timeout block though.
And there could still be blocking IO in an ensure block, which would not work as intended.

I considered that, too, but we'd need to add timeouts to every
single method which can block. There are many: File.open, Queue#pop,
SizedQueue#push, Mutex#lock/synchronize, Process.wait*,
IO#gets, IO#write, IO#read, IO#getc, IO.copy_stream, ...

It seems fine to me. Other implementations already have a timeout on Queue#pop IIRC.
I'm not sure we need all of them right now (what use case for Mutex#lock/synchronize ?).
I think the main need would be for standard IO like IO#read/write, especially on sockets and pipes.

#10 [ruby-core:87599] Updated by normalperson (Eric Wong) 26 days ago

eregontp@gmail.com wrote:

normalperson (Eric Wong) wrote:

eregontp@gmail.com wrote:

Something else, I would consider Timeout to be fundamentally
flawed as long as it relies on Thread#raise, because it can
fire in the middle of an ensure block:
http://headius.blogspot.com/2008/02/rubys-threadraise-threadkill-timeoutrb.html

We have Thread.handle_interrupt, nowadays, to control when
interrupts fire.

Right, although it's very difficult to use correctly (for
instance, it's incorrect to use Thread.handle_interrupt inside
the ensure block) and can easily cause hangs or deadlocks.

Agreed, it should be easier-to-use; but that's a separate issue
and I'm trying to avoid dealing with public API design as much
as possible for this.

BTW, it looks like MonitorMixin::ConditionVariable doesn't use
Thread.handle_interrupt and could continue out of #wait (with
an exception thrown by Thread#raise) without reacquiring the
lock.

Can you report separately to shugo to be sure he sees it?
I've never really understood the point of that module :x

It might be nice to have a Timeout variant that only
interrupts blocking IO, without relying on Thread#raise (but
just SIGVTALRM). I think that would be easier/safer to use
than the current Timeout.timeout(). Not sure how to deal if
there are multiple IO calls inside that Timeout block though.
And there could still be blocking IO in an ensure block, which
would not work as intended.

Agreed, I would welcome a "soft" timeout, without using
signals/raise at all. I think my work-in-progress "intrusive"
[PATCH 2/1] for this will make such a thing easier-to-implement:

https://80x24.org/spew/20180622215745.20698-1-e@80x24.org/raw

I considered that, too, but we'd need to add timeouts to
every
single method which can block. There are many: File.open,
Queue#pop, SizedQueue#push, Mutex#lock/synchronize,
Process.wait*, IO#gets, IO#write, IO#read, IO#getc,
IO.copy_stream, ...

It seems fine to me. Other implementations already have a
timeout on Queue#pop IIRC. I'm not sure we need all of them
right now (what use case for Mutex#lock/synchronize ?). I
think the main need would be for standard IO like
IO#read/write, especially on sockets and pipes.

The maintenance overhead for adding timeouts to every call would
be overwhelming on a human level, especially when 3rd-party
libraries need to be considered.

I would much rather do the following:

    Timeout.timeout(30) do
      foo.read(...)
      foo.write(...)
      IO.copy_stream(...)
      foo.write(...)
        szqueue.push(...)
        resultq.pop
 end

Than this:

 def now
      Process.clock_gettime(Process::CLOCK_MONOTONIC)
    end

 begin
        @stop = now + 30
        ...
        tout = @stop - now
        raise Timeout::Error if tout <= 0
        foo.read(..., tout)

        tout = @stop - now
        raise Timeout::Error if tout <= 0
        foo.write(..., tout)

        tout = @stop - now
        raise Timeout::Error if tout <= 0
        IO.copy_stream(..., tout)

        tout = @stop - now
        raise Timeout::Error if tout <= 0
        foo.write(..., tout)

        tout = @stop - now
        raise Timeout::Error if tout <= 0
        szqueue.push(..., tout)

        tout = @stop - now
        raise Timeout::Error if tout <= 0
        resultq.pop(tout)
    end

#11 [ruby-core:87968] Updated by normalperson (Eric Wong) 1 day ago

Feature #14859: [PATCH] implement Timeout in VM
https://bugs.ruby-lang.org/issues/14859

Note: I'm still working on this. Feature #13618
basically has an implementation of timeouts in the VM, too,
because of timeout args in rb_io_wait_*able.

I would rather implement this feature first without making
visible API changes for #13618.

Also available in: Atom PDF