Feature #5446

at_fork callback API

Added by Eric Wong over 2 years ago. Updated 4 months ago.

Status:Assigned
Priority:Low
Assignee:Motohiro KOSAKI
Category:-
Target version:next minor

Description

It would be good if Ruby provides an API for registering fork() handlers.

This allows libraries to automatically and agnostically reinitialize resources
such as open IO objects in child processes whenever fork() is called by a user
application. Use of this API by library authors will reduce false/improper
sharing of objects across processes when interacting with other
libraries/applications that may fork.

This Ruby API should function similarly to pthread_atfork() which allows
(at least) three different callbacks to be registered:

1) prepare - called before fork() in the original process
2) parent - called after fork() in the original process
3) child - called after fork() in the child process

It should be possible to register multiple callbacks for each action
(like at_exit and pthread_atfork(3)).

These callbacks should be called whenever fork() is used:

  • Kernel#fork
  • IO.popen
  • ``
  • Kernel#system

... And any other APIs I've forgotten about

I also want to consider handlers that only need to be called for plain
fork() use (without immediate exec() afterwards, like with `` and system()).

Ruby already has internal support for most of this to manage mutexes,
Thread structures, and RNG seed. Currently, no external API is exposed. I can
prepare a patch if an API is decided upon.

History

#1 Updated by Motohiro KOSAKI over 2 years ago

As you know, we can only call async-signal-safe functions between fork and exec when the process is multi-threaded.
But invoking Ruby code definitely needs malloc, which is not async-signal-safe, so it's pretty hard to implement.

Am I missing something?

#2 Updated by Yui NARUSE over 2 years ago

  • Project changed from CommonRuby to ruby-trunk
  • Target version deleted (next minor)

#3 Updated by Yusuke Endoh about 2 years ago

  • Status changed from Open to Feedback

Eric Wong,

Do you still want this feature?
If so, could you answer kosaki's comment?

OT: We noticed and were surprised by your ID (normalperson) at the recent
developers' meeting in Akihabara. Clearly, you are greatperson :-)

Yusuke Endoh mame@tsg.ne.jp

#4 Updated by Eric Wong about 2 years ago

"mame (Yusuke Endoh)" mame@tsg.ne.jp wrote:

Do you still want this feature?

Yes, but lower priority.

I think defaulting to IO#close_on_exec=true for 2.0.0 makes this
less important.

If so, could you answer kosaki's comment?

kosaki wrote:

As you know, we can only call async-signal-safe functions
between fork and exec when the process is multi-threaded. But invoking
Ruby code definitely needs malloc, which is not
async-signal-safe, so it's pretty hard to implement. Am I missing
something?

I can't edit the existing ticket, but I think only Kernel#fork should
be touched. Methods that call exec after fork will already get things
cleaned up from close_on_exec.
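
As a quick check of the close-on-exec point, here is a minimal sketch, assuming Ruby 2.0 or later on a platform that supports the flag. The helper name close_on_exec_defaults is made up for illustration.

```ruby
# Sketch: report whether both ends of a freshly created pipe have the
# close-on-exec flag set. With the flag set, fork+exec paths such as
# system, `` and IO.popen do not leak the descriptors to the new program.
def close_on_exec_defaults
  r, w = IO.pipe
  [r.close_on_exec?, w.close_on_exec?]
ensure
  r.close if r
  w.close if w
end

p close_on_exec_defaults
```

A descriptor that really should survive exec can still be opted out with IO#close_on_exec = false.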

If there is a Ruby call to exec, then we already have a chance to use
non-async-signal safe code.

It could be implemented in pure Ruby, even. This is a prototype (using
the xfork name instead):

ATFORK = {
  :prepare => [ lambda { puts ":prepare in #$$" } ],
  :parent => [ lambda { puts ":parent in #$$" } ],
  :child => [ lambda { puts ":child in #$$" } ],
}

def xfork
  ATFORK[:prepare].each { |code| code.call }
  if block_given?
    pid = fork do
      ATFORK[:child].each { |code| code.call }
      yield
    end
    ATFORK[:parent].each { |code| code.call }
  else
    case pid = fork
    when nil
      ATFORK[:child].each { |code| code.call }
    when Integer
      ATFORK[:parent].each { |code| code.call }
    end
  end

  pid
end

I haven't thought of an API to manipulate the ATFORK arrays
with. I don't want to emulate pthread_atfork() directly, it's
too cumbersome for Ruby. Perhaps:

at_fork(:prepare) { ... }
at_fork(:child) { ... }
at_fork(:parent) { ... }
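
One way the at_fork(...) form could sit on top of the ATFORK prototype is sketched below. This is only an illustration under the same assumptions as the prototype: at_fork, run_at_fork and xfork are hypothetical names, not an existing Ruby API, and the hooks only fire when xfork is used.

```ruby
# Hypothetical registry, following the xfork prototype; not a real Ruby API.
ATFORK = { :prepare => [], :parent => [], :child => [] }

# Register a block for one of the three phases and return it.
def at_fork(phase, &block)
  hooks = ATFORK.fetch(phase) do
    raise ArgumentError, "unknown phase: #{phase.inspect}"
  end
  raise ArgumentError, "block required" unless block
  hooks << block
  block
end

# Call the hooks of one phase in registration order.
def run_at_fork(phase)
  ATFORK[phase].each { |code| code.call }
end

def xfork
  run_at_fork(:prepare)
  if block_given?
    pid = fork do
      run_at_fork(:child)
      yield
    end
    run_at_fork(:parent)
  else
    pid = fork
    run_at_fork(pid ? :parent : :child)
  end
  pid
end
```

Usage would look like at_fork(:child) { logger.reopen } followed by xfork { ... }, where logger is whatever resource the library owns.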

OT: We noticed and were surprised by your ID (normalperson) at the recent
developers' meeting in Akihabara. Clearly, you are greatperson :-)

I don't think of myself as great. But if others think I'm great,
they should try to be like me, then we'll all be normal :)

#5 Updated by Yusuke Endoh over 1 year ago

  • Status changed from Feedback to Assigned
  • Assignee set to Motohiro KOSAKI
  • Target version set to next minor

#6 Updated by Sam Saffron 5 months ago

This is a critical feature for Ruby imho. At the moment there are 100 mechanisms for at_fork; we need a clean, supported, ordered one.

I think there should be strong parity with at_exit, so I am not particularly fond of the symbol param. I would like:

at_fork { } # returns a proc; a stack of procs that are called in order just before fork
after_fork { } # returns a proc; a stack of procs that are called in order just after the child process launches

The prepare thing for me seems like overkill. It would be like we are implementing 2 queues, one urgent, one less urgent. I don't really see the point.

There is a question of cancelling forks (eg: what happens when an exception is thrown during these callbacks?)
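
To make the cancellation question concrete, here is a small sketch of one possible behaviour, assuming hooks are plain procs run in the parent before fork. In this arrangement an exception raised by a hook simply propagates and the fork never happens. PREPARE_HOOKS and fork_with_prepare are hypothetical names.

```ruby
# Hypothetical hook list; not a real Ruby API.
PREPARE_HOOKS = []

# Run every prepare hook, then fork. `forker` is injectable only so the
# behaviour can be observed without creating a process.
def fork_with_prepare(forker = method(:fork))
  PREPARE_HOOKS.each { |hook| hook.call } # a raise here skips the fork below
  forker.call
end
```

Whether an error in a child-side hook should abort the child, or just be reported, is a separate decision this sketch does not address.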

#7 Updated by Jason Clark 5 months ago

I'd love to see this added. Gems using threads (like newrelic_rpm) have a lot of potential for deadlocks when forking happens. This would give a nice mechanism for dealing with those issues more generally, rather than having to hook things gem-by-gem like we do today.

New Relic + Resque has seen a lot of these types of problems, some of which are documented at https://github.com/resque/resque/issues/1101. While Resque has gem-specific hooks we lean on, having those hooks be at the Ruby level instead would be awesome.

Is there any possibility that this type of after_fork hook could also apply to daemonizing with Process.daemon? While we have fewer deadlocks, we often lose visibility after processes daemonize because we don't know to start our threads back up in the daemonized process. (See https://github.com/puma/puma/issues/335 for an example with the Puma web server).

#8 Updated by Aman Gupta 5 months ago

Here is another example of a gem which implements its own fork hooks: https://github.com/zk-ruby/zk/blob/master/lib/zk/fork_hook.rb

#9 Updated by Aman Gupta 5 months ago

Simple implementation of an after_fork{} hook: https://github.com/tmm1/ruby/commit/711a68b6599d176c5bcb4ef0c90aa195a290d1c0

Any objection?

#10 Updated by Benoit Daloze 5 months ago

tmm1 (Aman Gupta) wrote:

Simple implementation of an after_fork{} hook: https://github.com/tmm1/ruby/commit/711a68b6599d176c5bcb4ef0c90aa195a290d1c0

Any objection?

Sounds good and useful to me!

#11 Updated by Eric Wong 5 months ago

"tmm1 (Aman Gupta)" ruby@tmm1.net wrote:

Simple implementation of an after_fork{} hook: https://github.com/tmm1/ruby/commit/711a68b6599d176c5bcb4ef0c90aa195a290d1c0

Any objection?

I think there needs to be separate hooks for prepare/parent/child
(like pthread_atfork(3)). The parent may need to release resources
before forking (prepare hook), and perhaps reacquire/reinitialize
them after forking (parent hook).

The prepare hook is important for things like DB connections;
the parent hook might be less useful (especially for apps which
fork repeatedly).
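
A sketch of the release/reacquire pattern for a connection follows. FakeConnection is a made-up stand-in for a database client, and the three phases are called by hand because no hook API exists yet. It is an illustration under those assumptions, not a recommended implementation.

```ruby
# Hypothetical stand-in for a database client; not a real library.
class FakeConnection
  attr_reader :owner_pid

  def initialize
    connect
  end

  def connect
    @owner_pid = Process.pid # remember which process opened the "socket"
    @open = true
  end

  def disconnect
    @open = false
  end

  def open?
    @open
  end
end

# Returns true when the child ended up with a connection of its own.
def fork_with_reconnect(conn)
  conn.disconnect # "prepare": release before fork so nothing is shared
  pid = fork do
    conn.connect  # "child": open a connection owned by the child
    exit!(conn.open? && conn.owner_pid == Process.pid ? 0 : 1)
  end
  conn.connect    # "parent": reacquire in the original process
  Process.wait(pid)
  $?.success?
end
```

With real hooks, the three commented steps would be registered once by the library instead of being repeated at every fork call site.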

#12 Updated by Aman Gupta 5 months ago

I'd like to add a before_fork{} hook that fires in the parent before the fork. An after_fork hook in the parent seems unnecessary.

#13 Updated by Motohiro KOSAKI 4 months ago

2013/11/30 tmm1 (Aman Gupta) ruby@tmm1.net:

Issue #5446 has been updated by tmm1 (Aman Gupta).

Simple implementation of an after_fork{} hook: https://github.com/tmm1/ruby/commit/711a68b6599d176c5bcb4ef0c90aa195a290d1c0

Any objection?

??!?!?

  1. Why no before_fork?
  2. zk has :after_child and :after_parent hooks and your patch doesn't. Why?
  3. Why should the new method return a proc?
  4. When rb_daemon() is used, some platforms call after_fork once and others call it twice. It seems useless.
  5. Why does firing the hooks in rb_thread_atfork() create so many new arrays?
  6. Your patch doesn't aim to remove the hooks and I'm sure it is required.

    You said the new method should be created for killing the gem specific hooks.
    But your patch seems not to be able to.

#14 Updated by Aman Gupta 4 months ago

Thanks for your feedback.

  1. Why no before_fork?

I planned to add this in the same way as after_fork, as long as there are no issues with my patch.
I wasn't sure if rb_vm_t is the best place for the new hooks.
If it seems sane, I can expand the patch.

  2. zk has :after_child and :after_parent hooks and your patch doesn't. Why?

With two hooks, before_fork/after_fork methods are enough. If we want to implement three hooks, should we add three new methods?

  3. Why should the new method return a proc?

I guessed the return value can be used to de-register the hook later. But there is no way to do this currently.

  4. When rb_daemon() is used, some platforms call after_fork once and others call it twice. It seems useless.

Ah, good point. Maybe we need a flag to ensure only one invocation.
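
The "flag" idea could look roughly like the sketch below, written in Ruby rather than C for brevity. CoalescingHooks and its methods are hypothetical. The assumption is that a double-fork daemonize wraps its work in coalesce, so child hooks run once at the end no matter how many forks happened inside.

```ruby
# Hypothetical hook holder; not part of Ruby.
class CoalescingHooks
  def initialize
    @child_hooks = []
    @deferred = false # true while a multi-fork operation is in progress
    @pending = false  # true if a fork happened while deferred
  end

  def after_fork(&block)
    @child_hooks << block
    block
  end

  # Called wherever a fork has just happened in the child.
  def fire_child
    if @deferred
      @pending = true
    else
      @child_hooks.each { |hook| hook.call }
    end
  end

  # Wrap an operation that may fork several times; hooks run at most once.
  def coalesce
    @deferred = true
    begin
      yield
    ensure
      @deferred = false
    end
    return unless @pending
    @pending = false
    @child_hooks.each { |hook| hook.call }
  end
end
```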

  5. Why does firing the hooks in rb_thread_atfork() create so many new arrays?

No reason. Patch was a simple proof of concept for feedback.

  6. Your patch doesn't aim to remove the hooks and I'm sure it is required.

Do you have any API ideas for this? I agree we need some way to remove the hooks.

One option is to use tracepoint api:

tp = TracePoint.new(:before_fork, :after_fork_child){ ... }
tp.enable
tp.disable

#15 Updated by Eric Wong 4 months ago

Eric Wong normalperson@yhbt.net wrote:

the parent hook might be less useful (especially for apps which
fork repeatedly).

I take that back. It can be useful for setting up pipes/sockets for IPC
between parent and child.

Example without wrapper class and explicit IO#close:

 r, w = IO.pipe
 pid = fork do
   w.close # could be atfork_child
   ...
 end
 r.close # could be atfork_parent
 ...

However, I want to do this via callback, example with Worker class:

 class Worker
   attr_writer :pid

   def initialize
     @r, @w = IO.pipe
     Process.atfork_parent { @r.close unless @r.closed? }
     Process.atfork_child { @w.close unless @w.closed? }
   end
 end

 worker = Worker.new # IO.pipe
 worker.pid = fork { ... }
 ...

 # No need to remember what to close in parent/child

#16 Updated by Aman Gupta 4 months ago

Ok good point. I agree we should add all three if we're going to do this.

I like the Process.atfork_parent{} API in your example. If we add three methods to Process, the only thing missing is a way to un-register a hook.

#17 Updated by Aman Gupta 4 months ago

Another idea:

Process.at_fork{ |loc| case loc; when :before_fork; ... end } -> proc
Process.remove_at_fork(proc)
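
A minimal sketch of this register-and-remove idea, kept in a separate module so nothing on Process is touched. AtForkRegistry, run and the location symbols passed to the hooks are assumptions for illustration only.

```ruby
# Hypothetical registry; a real implementation would live in Process.
module AtForkRegistry
  HOOKS = []

  # Register a block and return it so the caller can remove it later.
  def self.at_fork(&block)
    raise ArgumentError, "block required" unless block
    HOOKS << block
    block
  end

  # Remove a previously registered hook; true if it was registered.
  def self.remove_at_fork(hook)
    !HOOKS.delete(hook).nil?
  end

  # Call every hook with the location symbol (e.g. :before_fork).
  # Iterate over a copy so a hook may unregister itself while running.
  def self.run(loc)
    HOOKS.dup.each { |hook| hook.call(loc) }
  end
end
```

Returning the proc gives callers a handle without introducing a new class, at the cost of having to keep that proc around for as long as removal might be needed.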

#18 Updated by Akira Tanaka 4 months ago

2013/12/7 Eric Wong normalperson@yhbt.net:

However, I want to do this via callback, example with Worker class:

class Worker
  attr_writer :pid

  def initialize
    @r, @w = IO.pipe
    Process.atfork_parent { @r.close unless @r.closed? }
    Process.atfork_child { @w.close unless @w.closed? }
  end
end

worker = Worker.new # IO.pipe
worker.pid = fork { ... }
...

# No need to remember what to close in parent/child

I think it doesn't work with multiple threads.

2.times {
  Thread.new {
    worker = Worker.new # IO.pipe
    worker.pid = fork { ... }
    ...
  }
}

If fork for worker 1 is called between IO.pipe and fork for worker 2,
the pipes for worker 2 are leaked to the process for worker 1 and
not inherited by the process for worker 2.

I feel it is not a good example for this issue.
--
Tanaka Akira

#19 Updated by Eric Wong 4 months ago

Tanaka Akira akr@fsij.org wrote:

2013/12/7 Eric Wong normalperson@yhbt.net:

However, I want to do this via callback, example with Worker class:

class Worker
  attr_writer :pid

  def initialize
    @r, @w = IO.pipe
    Process.atfork_parent { @r.close unless @r.closed? }
    Process.atfork_child { @w.close unless @w.closed? }
  end
end

worker = Worker.new # IO.pipe
worker.pid = fork { ... }
...

# No need to remember what to close in parent/child

I think it doesn't work with multiple threads.

True, but I wasn't intending this example to be used for an MT
application, but a single-threaded, multi-process HTTP server.

Generally, I do not use fork after I've spawned threads (unless
followed immediately with exec).
