Project

General

Profile

Actions

Feature #5446

closed

at_fork callback API

Added by normalperson (Eric Wong) almost 13 years ago. Updated over 3 years ago.

Status:
Closed
Target version:
-

Description

It would be good if Ruby provides an API for registering fork() handlers.

This allows libraries to automatically and agnostically reinitialize resources
such as open IO objects in child processes whenever fork() is called by a user
application. Use of this API by library authors will reduce false/improper
sharing of objects across processes when interacting with other
libraries/applications that may fork.

This Ruby API should function similarly to pthread_atfork() which allows
(at least) three different callbacks to be registered:

  1. prepare - called before fork() in the original process
  2. parent - called after fork() in the original process
  3. child - called after fork() in the child process

It should be possible to register multiple callbacks for each action
(like at_exit and pthread_atfork(3)).

These callbacks should be called whenever fork() is used:

  • Kernel#fork
  • IO.popen
  • ``
  • Kernel#system

... And any other APIs I've forgotten about

I also want to consider handlers that only need to be called for plain
fork() use (without immediate exec() afterwards, like with `` and system()).

Ruby already has the internal support for most of this this to manage mutexes,
Thread structures, and RNG seed. Currently, no external API is exposed. I can
prepare a patch if an API is decided upon.


Related issues 2 (0 open2 closed)

Related to Ruby master - Bug #14009: macOS High Sierra and “fork” compatibilityClosedActions
Related to Ruby master - Feature #17795: Around `Process.fork` callbacks APIClosedActions
Actions #1

Updated by kosaki (Motohiro KOSAKI) almost 13 years ago

As you know, we can only call asynchronous-signal-safe function between fork and exec when the process is multi threaded.
but ruby code invocation definitely need to use malloc which not async-signal-safe. so, it's pretty hard to implement.

Am I missing something?

Actions #2

Updated by naruse (Yui NARUSE) almost 13 years ago

  • Project changed from 14 to Ruby master
  • Target version deleted (2.6)

Updated by mame (Yusuke Endoh) over 12 years ago

  • Status changed from Open to Feedback

Eric Wong,

Do you still want this feature?
If so, could you answer kosaki's comment?

OT: We noticed and surprised at your ID (normalperson) at the recent
developers' meeting in Akihabara. Clearly, you are greatperson :-)

--
Yusuke Endoh

Updated by normalperson (Eric Wong) over 12 years ago

"mame (Yusuke Endoh)" wrote:

Do you still want this feature?

Yes, but lower priority.

I think default to IO#close_on_exec=true for 2.0.0 makes this
less important.

If so, could you answer kosaki's comment?

kosaki wrote:

As you know, we can only call asynchronous-signal-safe function
between fork and exec when the process is multi threaded. but ruby
code invocation definitely need to use malloc which not
async-signal-safe. so, it's pretty hard to implement. Am I missing
something?

I can't edit the existing ticket, but I think only Kernel#fork should
be touched. Methods that call exec after fork will already get things
cleaned up from close_on_exec.

If there is a Ruby call to exec, then we already have a chance to use
non-async-signal safe code.

It could be implemented in pure Ruby, even. This is a prototype (using
xfork name) intead:

ATFORK = {
  :prepare => [ lambda { puts ":prepare in #$$" } ],
  :parent => [ lambda { puts ":parent in #$$" } ],
  :child => [ lambda { puts ":child in #$$" } ],
}

def xfork
  ATFORK[:prepare].each { |code| code.call }
  if block_given?
    pid = fork do
      ATFORK[:child].each { |code| code.call }
      yield
    end
    ATFORK[:parent].each { |code| code.call }
  else
    case pid = fork
    when nil
      ATFORK[:child].each { |code| code.call }
    when Integer
      ATFORK[:parent].each { |code| code.call }
    end
  end

  pid
end

I haven't thought of an API to manipulate the ATFORK arrays
with. I don't want to emulate pthread_atfork() directly, it's
too cumbersome for Ruby. Perhaps:

at_fork(:prepare) { ... }
at_fork(:child) { ... }
at_fork(:parent) { ... }

OT: We noticed and surprised at your ID (normalperson) at the recent
developers' meeting in Akihabara. Clearly, you are greatperson :-)

I don't think of myself as great. But if others think I'm great,
they should try to be like me, then we'll all be normal :)

Actions #5

Updated by mame (Yusuke Endoh) almost 12 years ago

  • Status changed from Feedback to Assigned
  • Assignee set to kosaki (Motohiro KOSAKI)
  • Target version set to 2.6

Updated by sam.saffron (Sam Saffron) almost 11 years ago

This is a critical feature for Ruby imho, at the moment there are 100 mechanisms for at_fork, we need a clean, supported ordered one.

I think there should be strong parity with at_exit, so am not particularly fond of the symbol param. I would like.

at_fork{ } # returns proc, a stack of procs that are called in order just before fork
after_fork {} # returns a proc, a stack of procs that are called in order just after child process launches

The prepare thing for me seems like overkill. It would be like we are implementing 2 queues, one urgent, one less urgent. I don't really see the point.

--

There is a question of cancelling forks (eg: what happens when an exception is thrown during these callbacks?)

Updated by jasonrclark (Jason Clark) almost 11 years ago

I'd love to see this added. Gems using threads (like newrelic_rpm) have a lot of potential for deadlocks when forking happens. This would gives a nice mechanism for dealing with those issues more generally, rather than having to hook things gem-by-gem like we do today.

New Relic + Resque has seen a lot of these types of problems, some of which are documented at https://github.com/resque/resque/issues/1101. While Resque has gem-specific hooks we lean on, having those hooks be at the Ruby level instead would be awesome.

Is there any possibility of this type of after_fork hook could also apply to daemonizing with Process.daemon? While we have fewer deadlocks, we often lose visibility after processes daemonize because we don't know to start our threads back up in the daemonized process. (See https://github.com/puma/puma/issues/335 for an example with the Puma web server).

Updated by tmm1 (Aman Karmani) almost 11 years ago

Here is another example of a gem which implement its own fork hooks: https://github.com/zk-ruby/zk/blob/master/lib/zk/fork_hook.rb

Updated by Eregon (Benoit Daloze) almost 11 years ago

tmm1 (Aman Gupta) wrote:

Simple implementation of an after_fork{} hook: https://github.com/tmm1/ruby/commit/711a68b6599d176c5bcb4ef0c90aa195a290d1c0

Any objection?

Sounds good and useful to me!

Updated by normalperson (Eric Wong) almost 11 years ago

"tmm1 (Aman Gupta)" wrote:

Simple implementation of an after_fork{} hook: https://github.com/tmm1/ruby/commit/711a68b6599d176c5bcb4ef0c90aa195a290d1c0

Any objection?

I think there needs to be separate hooks for prepare/parent/child
(like pthread_atfork(3)). The parent may need to release resources
before forking (prepare hook), and perhaps reacquire/reinitialize
them after forking (parent hook).

The prepare hook is important for things like DB connections;
the parent hook might be less useful (especially for apps which
fork repeatedly).

Updated by tmm1 (Aman Karmani) almost 11 years ago

I'd like to add a before_fork{} hook that fires in the parent before the fork. An after_fork hook in the parent seems unnecessary.

Updated by kosaki (Motohiro KOSAKI) almost 11 years ago

2013/11/30 tmm1 (Aman Gupta) :

Issue #5446 has been updated by tmm1 (Aman Gupta).

Simple implementation of an after_fork{} hook: https://github.com/tmm1/ruby/commit/711a68b6599d176c5bcb4ef0c90aa195a290d1c0

Any objection?

??!?!?

  1. Why no before_fork?
  2. zk has :after_child nad :after_parent hook and your patch don't. Why?
  3. Why should the new method return proc?
  4. When rb_daemon() is used, some platform call after_fork once and
    the other call twice. It seems useless.
  5. Why do hook fire at rb_thread_atfork() make a lot of new array?
  6. Your patch doesn't aim to remove the hooks and I'm sure it is required.

You said the new method should be created for killing the gem specific hooks.
But your patch seems not to be able to.

Updated by tmm1 (Aman Karmani) almost 11 years ago

Thanks for your feedback.

  1. Why no before_fork?

I planned to add this in the same way as after_fork, as long as there are no issues with my patch.
I wasn't sure if rb_vm_t is the best place for the new hooks.
If it seems sane, I can expand the patch.

  1. zk has :after_child nad :after_parent hook and your patch don't. Why?

With two hooks, before_fork/after_fork methods are enough. If we want to implement three hooks, should we add three new methods?

  1. Why should the new method return proc?

I guessed the return value can be used to de-register the hook later. But there is no way to do this currently.

  1. When rb_daemon() is used, some platform call after_fork once and
    the other call twice. It seems useless.

Ah, good point. Maybe we need a flag to ensure only one invocation.

  1. Why do hook fire at rb_thread_atfork() make a lot of new array?

No reason. Patch was a simple proof of concept for feedback.

  1. Your patch doesn't aim to remove the hooks and I'm sure it is required.

Do you have any API ideas for this? I agree we need some way to remove the hooks.

One option is to use tracepoint api:

tp = TracePoint.new(:before_fork, :after_fork_child){ ... }
tp.enable
tp.disable

Updated by normalperson (Eric Wong) almost 11 years ago

Eric Wong wrote:

the parent hook might be less useful (especially for apps which
fork repeatedly).

I take that back. It can be useful for for setting up pipes/sockets for IPC
between parent and child.

Example without wrapper class and explicit IO#close:

 r, w = IO.pipe
 pid = fork do
   w.close # could be atfork_child
   ...
 end
 r.close # could be atfork_parent
 ...

However, I want to do this via callback, example with Worker class:

 class Worker
   attr_writer :pid

   def initialize
     @r, @w = IO.pipe
     Process.atfork_parent { @r.close unless @r.closed? }
     Process.atfork_child { @w.close unless @w.closed? }
   end
 end

 worker = Worker.new # IO.pipe
 worker.pid = fork { ... }
 ...

 # No need to remember what to close in parent/child

Updated by tmm1 (Aman Karmani) almost 11 years ago

Ok good point. I agree we should add all three if we're going to do this.

I like the Process.atfork_parent{} API in your example. If we add three methods to Process, the only thing missing is a way to un-register a hook.

Updated by tmm1 (Aman Karmani) almost 11 years ago

Another idea:

Process.at_fork{ |loc| case loc; when :before_fork; ... end } -> proc
Process.remove_at_fork(proc)

Updated by akr (Akira Tanaka) almost 11 years ago

2013/12/7 Eric Wong :

However, I want to do this via callback, example with Worker class:

class Worker
  attr_writer :pid

  def initialize
    @r, @w = IO.pipe
    Process.atfork_parent { @r.close unless @r.closed? }
    Process.atfork_child { @w.close unless @w.closed? }
  end
end

worker = Worker.new # IO.pipe
worker.pid = fork { ... }
...

# No need to remember what to close in parent/child

I think it doesn't work with multiple thread.

2.times {
  Thread.new {
    worker = Worker.new # IO.pipe
    worker.pid = fork { ... }
    ...
  }
}

If fork for worker 1 is called between IO.pipe and fork for worker 2,
pipes for worker 2 is leaked for the process for worker 1 and
not inherited to the process for worker 2.

I feel it is not a good example for this issue.

Tanaka Akira

Updated by normalperson (Eric Wong) almost 11 years ago

Tanaka Akira wrote:

2013/12/7 Eric Wong :

However, I want to do this via callback, example with Worker class:

class Worker
  attr_writer :pid

  def initialize
    @r, @w = IO.pipe
    Process.atfork_parent { @r.close unless @r.closed? }
    Process.atfork_child { @w.close unless @w.closed? }
  end
end

worker = Worker.new # IO.pipe
worker.pid = fork { ... }
...

# No need to remember what to close in parent/child

I think it doesn't work with multiple thread.

True, but I wasn't intending this example to be used for an MT
application, but a single-threaded, multi-process HTTP server.

Generally, I do not use fork after I've spawned threads (unless
followed immediately with exec).

Updated by sam.saffron (Sam Saffron) almost 8 years ago

ping, any plans to add these apis, they will help clean up a lot in the Ruby ecosystem, just hit it again today.

I think what is really needed here is a conceptual Yes or No from Koichi or Matz

Updated by pitr.ch (Petr Chalupa) almost 8 years ago

We had to get around missing at_fork in concurrent-ruby as well. Since we are avoiding monkey_patching we had to result in lazy-checking pid, when it changes we re-create a resource. The resource has to check on every usage though which is not good.

Having Process.remove_at_fork(proc) is critical, an user of at_fork has to be able to unregister callback when the resource is not used any further.

Updated by shyouhei (Shyouhei Urabe) over 7 years ago

We looked at this issue at today's developer meeting.

I was told that async-signal-safety is no longer the issue because in comment #4 the OP already restricted the request to fork method that runs ruby code. That's a good thing to know.

Then, given repeated request of this functionality, why not start this feature as a gem? Because we focus on fork that runs ruby code, the needed functionality is doable in pure ruby I think. Is there any reason this should be done by the language core (apart from monkey-paytching ugliness)?

Updated by Eregon (Benoit Daloze) over 7 years ago

Is there any reason this should be done by the language core (apart from monkey-paytching ugliness)?

Yes, Ruby code cannot change e.g. rb_f_fork().
There is already atfork-related logic and it seems brittle to override something as essential as #fork.
Since libraries which need this feature currently monkey-patch #fork, it means they do not compose well.
Monkey-patching is considered very fragile for this, I guess that's why no gem for it exists or it would not be used.

If monkey-patching is not used (because of above problems),
the cost of checking on every access becomes a problem for maintenance and performance.
Sometimes, it even means a library decides to be thread-safe but not process-safe:
https://github.com/jeremyevans/sequel/issues/691

Another example is Passenger and other web servers which have their own hook for after_fork.
This means the application is now forced to keep the same web server, or change the hook (very easy to forget).

Let's add Process.at_fork and Process.remove_at_fork,
to encourage more responsible programming.
The evidence for the need of a common API for this is obvious.

Actions #24

Updated by shyouhei (Shyouhei Urabe) almost 7 years ago

  • Related to Bug #14009: macOS High Sierra and “fork” compatibility added

Updated by kivikakk (Amelia Cuss) about 6 years ago

Are we still interested in pursuing something like this? I'd be happy to push it forward.

Updated by normalperson (Eric Wong) about 6 years ago

wrote:

Are we still interested in pursuing something like this? I'd
be happy to push it forward.

I'm not sure it's necessary, anymore. Most programs and libs
already have workarounds at this point. pthread_atfork can also
be the source of surprising behaviors (thread/async-signal
safety), so Austin group proposed _Fork function:
http://austingroupbugs.net/view.php?id=62

https://bugs.ruby-lang.org/issues/5446#change-73019

These callbacks should be called whenever fork() is used:

  • IO.popen
  • ``
  • Kernel#system

Furthermore, many platforms have vfork (it's possible to support
vfork without MMU) and we favor using that for process spawning,
so the following is unneccessary:

I also want to consider handlers that only need to be called for plain
fork() use (without immediate exec() afterwards, like with `` and system()).

Updated by Eregon (Benoit Daloze) about 6 years ago

IMHO, this is is still very much needed for cases using plain fork().
The default close-on-exec behavior is irrelevant for fork() without exec() and does not help.
Inheriting, e.g., a database connection in such a case is extremely harmful (data corruption or leaking sensitive data) and a common pitfall.
Having a nice at_fork hook would allow libraries to safely disconnect the database in each forked process to avoid accidental file descriptor sharing between forks.

Sequel:
https://github.com/jeremyevans/sequel/issues/691

Passenger has PhusionPassenger.on_event(:starting_worker_process).
They give the example of memcached and by default use it to reestablish ActiveRecord connections:
https://www.phusionpassenger.com/library/indepth/ruby/spawn_methods/#unintentional-file-descriptor-sharing

Puma has before_fork:
https://github.com/puma/puma/pull/754/files

Unicorn implements its own before_fork/after_fork:
https://www.rubydoc.info/gems/unicorn/5.0.1/Unicorn%2FConfigurator:after_fork
@normalperson (Eric Wong) Don't you think this deserves to be in core?

I would guess all of these do not work if rb_fork() is called.
What more motivation do we need?
Wait that a major incident happen due to unintentional fd sharing because database libraries cannot protect themselves from fork()?

Updated by normalperson (Eric Wong) about 6 years ago

wrote:

IMHO, this is is still very much needed for cases using plain fork().
The default close-on-exec behavior is irrelevant for fork() without exec() and does not help.

Agreed.

Inheriting, e.g., a database connection in such a case is
extremely harmful (data corruption or leaking sensitive data)
and a common pitfall.

It's been a known problem for decades, now (at least since the
days of mod_perl + DBI on Apache 1.x); and AFAIK there's no data
leaks from it. Anybody who makes that mistake would likely
raise/crash/burn long before seeing, much less leaking sensitive
data.

Having a nice at_fork hook would allow to safely disconnect the database in each forked process to avoid accidental file descriptor sharing between forks.

Sequel:
https://github.com/jeremyevans/sequel/issues/691

I agree with Jeremy on this; it will likely cause new problems
and surprises if libraries use it.

Passenger has PhusionPassenger.on_event(:starting_worker_process).
They give the example of memcached and by default use it to reestablish ActiveRecord connections:
https://www.phusionpassenger.com/library/indepth/ruby/spawn_methods/#unintentional-file-descriptor-sharing

Puma has before_fork:
https://github.com/puma/puma/pull/754/files

Exactly my point, everything relevant which forks has documented
and supported means to disconnect/restart connections.

@normalperson (Eric Wong) Don't you think this deserves to be in core?

Not anymore.

I would guess all of these do not work if rb_fork() is called.
What more motivation do we need?
Wait that a major incident happen due to unintentional fd
sharing because database libraries cannot protect themselves
from fork()?

Again, this is a known problem with known workarounds for
decades, now. I expect nobody has been able to get close to
production or dealing with important data with such a broken
setup.

Updated by Eregon (Benoit Daloze) about 6 years ago

normalperson (Eric Wong) wrote:

It's been a known problem for decades, now (at least since the
days of mod_perl + DBI on Apache 1.x); and AFAIK there's no data
leaks from it. Anybody who makes that mistake would likely
raise/crash/burn long before seeing, much less leaking sensitive data.

Yes, it's not a new problem.
I disagree about no production leaks, because it happened to me on a website running for a national programming contest.
For most of the contest it was fine as one process was able to handle the load,
but at some point the webserver decided to spawn another process by forking,
people starting seeing each's other solution, the scores were corrupted and everyone was puzzled as to what happened.
We had to stop the contest due to this.

I want to help protect future programmers from such bugs, if at all possible.
And I believe it's possible.

I agree with Jeremy on this; it will likely cause new problems
and surprises if libraries use it.

Let's design it so it doesn't.
What's the harm/surprise to reconnect in at_fork(:after_child) for instance?

The current hooks are webserver-specific and so migrating between unicorn/puma/passenger/etc means
it's quite easy to forget to adapt to the new webserver hook, which would trigger this bug.
Especially if e.g., using ActiveRecord with Passenger, where the hook is automatic and migrating to another forking webserver.
Having standard hooks solves this, also works for rb_f_fork(),
and empowers libraries to solve this issue (not just DB libraries but also concurrent-ruby (see above), etc).

Updated by normalperson (Eric Wong) about 6 years ago

wrote:

normalperson (Eric Wong) wrote:

It's been a known problem for decades, now (at least since the
days of mod_perl + DBI on Apache 1.x); and AFAIK there's no data
leaks from it. Anybody who makes that mistake would likely
raise/crash/burn long before seeing, much less leaking sensitive data.

Yes, it's not a new problem.
I disagree about no production leaks, because it happened to me on a website running for a national programming contest.
For most of the contest it was fine as one process was able to handle the load,
but at some point the webserver decided to spawn another process by forking,
people starting seeing each's other solution, the scores were corrupted and everyone was puzzled as to what happened.
We had to stop the contest due to this.

fork is full of caveats; using atfork hook to work around one
caveat out of many is not a solution. The solution is knowing
the caveats of the tools you use.

In your case, it seemed like you were not paying attention to
the server setup at all and would not have known to use atfork
hook regardless if it was in the webserver or core.

I want to help protect future programmers from such bugs, if at all possible.
And I believe it's possible.

I agree with Jeremy on this; it will likely cause new problems
and surprises if libraries use it.

Let's design it so it doesn't.
What's the harm/surprise to reconnect in at_fork(:after_child) for instance?

It's a waste of time and resources when the child has no need for
a connection at all. Simply put, library authors have no clue
how an application will use/manage processes and would
(rightfully) not bother using such hooks.

Also, consider that pthread_atfork has been around for many
years, it's not adopted by library authors (of C/C++ libraries)
because of problems surrounding it; and POSIX is even
considering deprecating pthread_atfork[1].

How about an alternate proposal?

Introduce a new object_id-like identifier which changes
across fork: Thread.current.thread_id

It doesn't penalize platforms without fork, and can work well
with existing thread-aware code.

IMHO, Thread.current.object_id being stable in forked child
isn't good; but I expect compatibility problems if we change it
at this point. At least some usages of monitor.rb would
break.

The current hooks are webserver-specific and so migrating
between unicorn/puma/passenger/etc means it's quite easy to
forget to adapt to the new webserver hook, which would trigger
this bug.

I hate the amount of vendor lock-in each webserver has.
But making hooks which library authors can fire unpredictably
on application authors is worse, especially if there's no
"opt-out".

[1] http://austingroupbugs.net/view.php?id=858

Updated by tenderlovemaking (Aaron Patterson) about 6 years ago

normalperson (Eric Wong) wrote:

wrote:

normalperson (Eric Wong) wrote:

It's been a known problem for decades, now (at least since the
days of mod_perl + DBI on Apache 1.x); and AFAIK there's no data
leaks from it. Anybody who makes that mistake would likely
raise/crash/burn long before seeing, much less leaking sensitive data.

Yes, it's not a new problem.
I disagree about no production leaks, because it happened to me on a website running for a national programming contest.
For most of the contest it was fine as one process was able to handle the load,
but at some point the webserver decided to spawn another process by forking,
people starting seeing each's other solution, the scores were corrupted and everyone was puzzled as to what happened.
We had to stop the contest due to this.

I've experienced a data leak similar to this. Saying the impact was "catastrophic" would be an understatement. :(

fork is full of caveats; using atfork hook to work around one
caveat out of many is not a solution. The solution is knowing
the caveats of the tools you use.

In your case, it seemed like you were not paying attention to
the server setup at all and would not have known to use atfork
hook regardless if it was in the webserver or core.

I want to help protect future programmers from such bugs, if at all possible.
And I believe it's possible.

I agree with Jeremy on this; it will likely cause new problems
and surprises if libraries use it.

Let's design it so it doesn't.
What's the harm/surprise to reconnect in at_fork(:after_child) for instance?

It's a waste of time and resources when the child has no need for
a connection at all. Simply put, library authors have no clue
how an application will use/manage processes and would
(rightfully) not bother using such hooks.

I think library authors can make things easier though. Web frameworks, like Rails for example, are expected to handle this situation for the user. In addition, say a library author provided no such feature like Sequel, how would a user know they need to call DB.disconnect after a fork? Are they responsible for completely understanding the implementation of the library they are using? Even if an end user called DB.disconnect in an after fork hook, what if that wasn't enough? How would an end user know what needs to be called?

Also, consider that pthread_atfork has been around for many
years, it's not adopted by library authors (of C/C++ libraries)
because of problems surrounding it; and POSIX is even
considering deprecating pthread_atfork[1].

How about an alternate proposal?

Introduce a new object_id-like identifier which changes
across fork: Thread.current.thread_id

It doesn't penalize platforms without fork, and can work well
with existing thread-aware code.

I think this is a good idea, but I'm not sure it addresses the communication issue I brought up. IMO it would be great to have some sort of hook so that library authors can dictate what "the right thing to do" is after a fork (maybe there are other resources or caches that need to be cleaned, and maybe that changes from version to version).

IMHO, Thread.current.object_id being stable in forked child
isn't good; but I expect compatibility problems if we change it
at this point. At least some usages of monitor.rb would
break.

The current hooks are webserver-specific and so migrating
between unicorn/puma/passenger/etc means it's quite easy to
forget to adapt to the new webserver hook, which would trigger
this bug.

I hate the amount of vendor lock-in each webserver has.
But making hooks which library authors can fire unpredictably
on application authors is worse, especially if there's no
"opt-out".

I think requiring users to specify a db disconnect after fork causes even more "vendor lock-in". Lets say I did add the after fork code to deal with Sequel, but now I want to switch to a threaded webserver. Now I have to do more work to figure out what's required (if anything) in a threaded environment. It puts the onus on the app dev to figure out what's right for a particular environment, and that means it's harder to change: locking you in by making more work.

Additionally, forking servers all have to provide this type of hook anyway (Unicorn, Resque, Puma, to name a few) but today they have to specify their own API. I think it would be great if we had a "Rack for fork hooks", if that makes sense. :)

Updated by jeremyevans0 (Jeremy Evans) about 6 years ago

tenderlovemaking (Aaron Patterson) wrote:

I think library authors can make things easier though. Web frameworks, like Rails for example, are expected to handle this situation for the user. In addition, say a library author provided no such feature like Sequel, how would a user know they need to call DB.disconnect after a fork? Are they responsible for completely understanding the implementation of the library they are using? Even if an end user called DB.disconnect in an after fork hook, what if that wasn't enough? How would an end user know what needs to be called?

Preloading before forking is not the default in unicorn or puma, and the documentation for both unicorn and puma mention the related issues in the documentation that describes how to enable preloading before forking. Additionally, Sequel's documentation mentions the need to disconnect when using preloading before forking. Users should not be expected to understand the implementation of the libraries they are using, but they should be expected to read the documentation for non-default configuration options they explicitly enable.

Now, passenger enables preloading before forking by default. Some people may consider that an issue with passenger or others may be OK with them just optimizing for the case where only ActiveRecord is used without any other libraries that would allocate file descriptors. To be fair, passenger's documentation does discuss the issue in detail.

FWIW, disconnecting (at least with Sequel) should be done before forking, not after forking. After forking you are already sharing the file descriptors. As long as the library can reestablish connections as needed (as Sequel can), a single line of code to disconnect before fork is globally much simpler than extensive code that tries (and fails) to handle all possible cases when fork can be used.

I will posit that any attempt to disconnect automatically during forking (either in the parent or child) can break a reasonable (if less common) setup where forking is used correctly without explicit disconnection.

How about an alternate proposal?

Introduce a new object_id-like identifier which changes
across fork: Thread.current.thread_id

It doesn't penalize platforms without fork, and can work well
with existing thread-aware code.

I think this is a good idea, but I'm not sure it addresses the communication issue I brought up. IMO it would be great to have some sort of hook so that library authors can dictate what "the right thing to do" is after a fork (maybe there are other resources or caches that need to be cleaned, and maybe that changes from version to version).

At least for Sequel, I don't think this will matter. I'm neither for nor against this.

Additionally, forking servers all have to provide this type of hook anyway (Unicorn, Resque, Puma, to name a few) but today they have to specify their own API. I think it would be great if we had a "Rack for fork hooks", if that makes sense. :)

I think the main problem here is that libraries may offer before_fork or after_fork hooks that are called with arguments that a in-core at_fork hook couldn't use. I know that is the case with unicorn.

Additionally, there is the issue with in-core at_fork hooks in terms of inheritance:

  fork do # calls at_fork hooks
    fork do # calls at_fork hooks?
    end
  end

There are arguments for inheriting the hooks, and there are arguments for not doing so. I suppose you could make it configurable, but that increases complexity. When fork hooks are implemented in a library, they apply to only the case where the library forks, and not all cases where fork is called.

Consider that you are using a preforking webserver, and in your parent process, you do:

r, w = IO.pipe
if fork
  # ...
else
  # ...
  exit!
end  

If an in-core at_fork handler is used, the webserver's forking hooks may be called for this fork, when that may not be desired. That is not the case for library-specific fork hooks that wrap the library's usage of fork.

Updated by normalperson (Eric Wong) about 6 years ago

wrote:

normalperson (Eric Wong) wrote:

wrote:

I want to help protect future programmers from such bugs, if at all possible.
And I believe it's possible.

I agree with Jeremy on this; it will likely cause new problems
and surprises if libraries use it.

Let's design it so it doesn't.
What's the harm/surprise to reconnect in at_fork(:after_child) for instance?

It's a waste of time and resources when the child has no need for
a connection at all. Simply put, library authors have no clue
how an application will use/manage processes and would
(rightfully) not bother using such hooks.

I think library authors can make things easier though. Web
frameworks, like Rails for example, are expected to handle
this situation for the user. In addition, say a library
author provided no such feature like Sequel, how would a user
know they need to call DB.disconnect after a fork? Are they
responsible for completely understanding the implementation of
the library they are using? Even if an end user called
DB.disconnect in an after fork hook, what if that wasn't
enough? How would an end user know what needs to be called?

I don't know how to deal with unknowledgeable users aside from
making them knowledgeable about the tools they use.

However, the issue also relates to allowing things to become
thread-safe and fiber-safe at the moment, and in the future:
Guild-safe.

So maybe ko1 can give his thoughts on this topic since he
is working on Guilds

Also, consider that pthread_atfork has been around for many
years, it's not adopted by library authors (of C/C++ libraries)
because of problems surrounding it; and POSIX is even
considering deprecating pthread_atfork[1].

How about an alternate proposal?

Introduce a new object_id-like identifier which changes
across fork: Thread.current.thread_id

It doesn't penalize platforms without fork, and can work well
with existing thread-aware code.

I think this is a good idea, but I'm not sure it addresses the
communication issue I brought up. IMO it would be great to
have some sort of hook so that library authors can dictate
what "the right thing to do" is after a fork (maybe there are
other resources or caches that need to be cleaned, and maybe
that changes from version to version).

Library authors also do not know if the child process will touch
their objects/code; so making such assumptions can be costly
when child processes with different goals are used.

I think the only thing low-level library authors (e.g. authors
of libpq/sqlite3/libcurl/etc...) can do is document how to deal
with such issues when interacting with threads and processes.

Maybe Sequel and AR can have add optional PID checks
($$ != expected_pid), which is totally generic and usable
in bunch of cases.

One thing I have been annoyed by is END/at_exit hooks firing
unexpectedly in child processes. I work around them with the
PID check. So I want to avoid having the same annoyance from
at_exit by avoiding atfork hooks.

IMHO, Thread.current.object_id being stable in forked child
isn't good; but I expect compatibility problems if we change it
at this point. At least some usages of monitor.rb would
break.

The current hooks are webserver-specific and so migrating
between unicorn/puma/passenger/etc means it's quite easy to
forget to adapt to the new webserver hook, which would trigger
this bug.

I hate the amount of vendor lock-in each webserver has.
But making hooks which library authors can fire unpredictably
on application authors is worse, especially if there's no
"opt-out".

I think requiring users to specify a db disconnect after fork
causes even more "vendor lock-in". Lets say I did add the
after fork code to deal with Sequel, but now I want to switch
to a threaded webserver. Now I have to do more work to figure
out what's required (if anything) in a threaded environment.
It puts the onus on the app dev to figure out what's right for
a particular environment, and that means it's harder to
change: locking you in by making more work.

AFAIK, Sequel and AR already provides out-of-the-box thread-safe
behavior. Perhaps Sequel and AR should also have out-of-the-box
thread-AND-fork-safe connection pool which checks the PID. I
don't expect PID check cost to be expensive, but they can make
that an option for users and it'll work on all forking servers
and maybe Guild servers in the future.

People who don't use forking can use old thread-safe-only or
single-threaded-only code.

The key thing I want to avoid from having atfork hooks is:

DB.disconnect
pid = fork do
  # something which never touches DB
  exit!(0) # avoid at_exit hook
end
DB.reconnect

It's needlessly expensive in the parent process to pay the
disconnect/reconnect cost when forking a child which never
touches the DB. And there's already a gotcha for avoiding
at_exit hooks.

Additionally, forking servers all have to provide this type of
hook anyway (Unicorn, Resque, Puma, to name a few) but today
they have to specify their own API. I think it would be great
if we had a "Rack for fork hooks", if that makes sense. :)

I don't think limiting this to fork is necessary; forking may
become obsolete with Guilds.

Updated by jeremyevans0 (Jeremy Evans) about 6 years ago

normalperson (Eric Wong) wrote:

Maybe Sequel and AR can have add optional PID checks
($$ != expected_pid), which is totally generic and usable
in bunch of cases.

My problem with doing this by default is that it penalizes knowledgeable users just to avoid problems for users that don't read the documentation. Some libraries may consider that an acceptable tradeoff, but I do not. I guess it depends at least partly on the expected audience for the library.

I'd be OK with this as an optional behavior, but in general anyone who would use it would be better off just disconnecting before forking, so I haven't implemented it yet.

AFAIK, Sequel and AR already provides out-of-the-box thread-safe
behavior. Perhaps Sequel and AR should also have out-of-the-box
thread-AND-fork-safe connection pool which checks the PID. I
don't expect PID check cost to be expensive, but they can make
that an option for users and it'll work on all forking servers
and maybe Guild servers in the future.

I'm guessing that Guild servers will operate differently than forking servers (not having different PIDs for different Guilds), but ko1 would know more about that.

I don't think limiting this to fork is necessary; forking may
become obsolete with Guilds.

I do not think that will be the case. For one thing, you are always going to want a forking approach if you want a parent process that forks children and reaps them when they crash/exit. This is especially true when the child process uses different permissions than the parent via setuid. It could definitely be the case that a single-process multi-Guild approach will become more popular than current forking multi-process servers.

Updated by byroot (Jean Boussier) over 3 years ago

Could this at_exit proposal be considered again? I understand and agree that in theory libraries needed to protect against forks should expose a after_fork method to be called after forking, and that it's the users's responsibility to hook the two properly.

But in practice many libraries prefer to check Process.pid, and often end up doing so in tight loops. And they do it so much that it end up being significant on performance profiles in real production environments.

Updated by byroot (Jean Boussier) over 3 years ago

Could this at_exit proposal be considered again?

Sorry, I meant at_fork

Updated by normalperson (Eric Wong) over 3 years ago

wrote:

Feature #5446: at_fork callback API
https://bugs.ruby-lang.org/issues/5446#change-91273

But in practice many libraries prefer to check Process.pid,
and often end up doing so in tight loops. And they do it so
much that it end up being significant on performance profiles
in real production environments.

Some things have changed in over the years:

  • getpid() no longer cached by glibc, so it's always a syscall, now

  • syscalls are more expensive due to CPU vulnerability mitigations

  • Ractor becoming usable/available in Ruby 3+

  • system/popen/backtick use vfork() since Ruby 2.2

  • JIT exists:

Is there a way to expose Ruby methods/procs as C function pointers with JIT?

If so, Fiddle may be usable to call pthread_atfork(3) and have
it use JIT-exposed C function pointers.

That would be generally useful for interacting with other C libraries,
not just for pthread_atfork.

Updated by Eregon (Benoit Daloze) over 3 years ago

Is there a way to expose Ruby methods/procs as C function pointers with JIT?

@normalperson (Eric Wong) That feels very hacky to me.
JIT'ing a function which is only called once seems suboptimal (and difficult to do before it's even called once).
Also some Ruby state likely needs to be restored before running any after_fork hook.

@byroot (Jean Boussier) I agree, we should at least have after_fork (or at_fork(when, &block)).

Honestly I don't understand why we don't have this yet, it's obviously needed and every forking webserver out there ends up having its own hook.

For the use-case of purposefully keeping a database connection (which seems extremely rare), maybe one can require the database library with a different path to be explicit that playing with database fd sharing fire is wanted? (or some other mechanism so the at_fork hook is not installed in that case).

For the vast majority of fork uses in practice it would be much safer if database connections and other fds which are unsafe to keep open across forks were automatically closed.

And BTW the current workaround of checking pid is slower also on platforms not supporting fork, so it is inefficient for all platforms and webservers.

Updated by Eregon (Benoit Daloze) over 3 years ago

Eregon (Benoit Daloze) wrote in #note-38:

require the database library with a different path

Or by having some logic in e.g. rack to handle that. I'm sure we can find something.
But the hook needs to be in core, otherwise we'd need to monkey-patch Kernel#fork, and that wouldn't work for rb_f_fork.

Updated by byroot (Jean Boussier) over 3 years ago

I agree, we should at least have after_fork

Yes, that's what's needed 99% of the time.

We implemented it in Active Support a few years back, but many more libraries would need it, and it doesn't feel clean, and I feel bad doing the same thing in other libraries.

Updated by byroot (Jean Boussier) over 3 years ago

To clarify, I don't think we even need native like pthread_atfork, as as far as I know it's not safe for extensions to use fork(2) unless of course they immediately exec().

So what I'm asking for is simply for Process.fork to call a after_fork callback in the child. Maybe Process.after_fork = Proc?

Updated by Dan0042 (Daniel DeLorme) over 3 years ago

For the vast majority of fork uses in practice it would be much safer if database connections and other fds which are unsafe to keep open across forks were automatically closed.

Indeed, and it's possible to do that with IO#close_on_exec, which seems to me like the appropriate solution in 99% of cases where an after_fork hook is requested. So while I agree in principle that after_fork would be useful, it would be nice to see a use-case that just cannot be solved with IO#close_on_exec

Updated by normalperson (Eric Wong) over 3 years ago

wrote:

Is there a way to expose Ruby methods/procs as C function pointers with JIT?

@normalperson (Eric Wong) That feels very hacky to me.

fork() seems hacky now that Ractor exists, and
Process.spawn/popen/system use vfork (along with cloexec being
the default)

JIT'ing a function which is only called once seems suboptimal
(and difficult to do before it's even called once).
Also some Ruby state likely needs to be restored before
running any after_fork hook.

Maybe not "JIT", but just-ahead-of-time or on-demand-compiled.
Something like "Proc#compile_to_c" returning an Integer pointer.

It would be generally useful for Fiddle to pass callbacks
implemented in Ruby around, but I agree tricky to get right
because Ruby state could be indeterminate when the C function is
called...

@byroot (Jean Boussier) I agree, we should at least have after_fork (or at_fork(when, &block)).

For projects where I'm stuck having to fork, I find "prepare"
(before_fork) just as useful. I've only used "parent"
(after fork), once, however.

Honestly I don't understand why we don't have this yet, it's
obviously needed and every forking webserver out there ends up
having its own hook.

With Ractor, perhaps fork() shouldn't be encouraged/promoted, anymore.

And BTW the current workaround of checking pid is slower also
on platforms not supporting fork, so it is inefficient for
all platforms and webservers.

Agreed.

Updated by byroot (Jean Boussier) over 3 years ago

it would be nice to see a use-case that just cannot be solved with IO#close_on_exec

The need is not limited to IOs. If you have say some kind of buffered event queue (example), you need to clear it after fork otherwise you will send some events twice.

fork() seems hacky now that Ractor exists

Seems kind of out of topic to discuss the merits of threads vs process, but even then I wouldn't say so. First Ractors still aren't quite production ready, but even if they were, standalone processes still provide extra isolation that is in some ways superior to threads/ractors. When unicorn timeouts a workers, since it's a distinct process, you know the replacement will be spawned clean. If your workers are thread/ractors you either have to be certain you won't need to kill them, or be certain that your whole software stack will handle it cleanly. And from experience, most code out there doesn't (no blame here, it's incredibly hard to do).

Of course you can always consider that a service that get stuck or slow and need to be killed is a bug in itself, but this kind of resiliency as a last line of defence is appreciable.

Updated by byroot (Jean Boussier) over 3 years ago

it would be nice to see a use-case that just cannot be solved with IO#close_on_exec

Another use case is if you have a dispatcher thread, it will need to be restarted after fork.

Updated by Dan0042 (Daniel DeLorme) over 3 years ago

For the vast majority of fork uses in practice it would be much safer if database connections and other fds which are unsafe to keep open across forks were automatically closed.

Indeed, and it's possible to do that with IO#close_on_exec,

I apologize, somehow I was thinking that fork belongs to the exec family of functions and so IO#close_on_exec applied to it, but that's obviously not the case. Since ruby doesn't have something like IO#close_on_fork then yeah having an after_fork hook is very necessary.

Updated by byroot (Jean Boussier) over 3 years ago

I'm thinking a simpler alternative would be to make Process.fork the unique forking method: https://github.com/ruby/ruby/pull/4361

This would make decorating Process.fork trivial, effectively offering an easy way to register a callback on either before or after fork.

Updated by ivoanjo (Ivo Anjo) over 3 years ago

+1 we have this issue at Datadog as well. Our profiler keeps a background thread that takes samples and reports back performance information, and we'd like to support keeping the profiler enabled even when processes fork. (Our yet-another-at_fork is https://github.com/DataDog/dd-trace-rb/blob/feature/profiling/lib/ddtrace/profiling/ext/forking.rb ).

@byroot's solution I think would be a great improvement, as at least we'd be able to just prepend our needed behavior.

Updated by mame (Yusuke Endoh) over 3 years ago

Can anyone summarize the latest proposal? I cannot understand what is needed precisely. I'm afraid if some people see only the name "at_fork", and do not share the context.

What event should be hooked? Only a call to Kernel#fork and Process.fork? Also Process.daemon? Kernel#system, #spawn and IO#popen? Or fork(2) system call? Some talk about the redefinition of Kernel#fork, but the others do about pthread_atfork. I guess they should be distinguished.

I understand the motivation. Multiple gems implement their own fork hook system, which is not good. But are the features the same? Did anyone study them? I guess we need to provide the (reasonably) greatest common feature set.

The discussion is too long (for me) to follow, so it may be helpful to create another ticket with a brief description of use-case study and detailed proposal.

Updated by mame (Yusuke Endoh) over 3 years ago

byroot (Jean Boussier) wrote in #note-50:

@mame (Yusuke Endoh) I'll do it.

Thanks!

Updated by byroot (Jean Boussier) over 3 years ago

@mame (Yusuke Endoh) I created https://bugs.ruby-lang.org/issues/17795, let me know if it makes sense to you, in which case I'll update my developer meeting request.

Actions #53

Updated by Eregon (Benoit Daloze) over 3 years ago

  • Related to Feature #17795: Around `Process.fork` callbacks API added

Updated by mame (Yusuke Endoh) over 3 years ago

  • Status changed from Assigned to Closed

byroot (Jean Boussier) wrote in #note-52:

@mame (Yusuke Endoh) I created https://bugs.ruby-lang.org/issues/17795, let me know if it makes sense to you, in which case I'll update my developer meeting request.

Thank you, it is very clear!

I'm closing this ticket since its goal was drifted over the years. If anyone is not satisfied, let me know.

Actions

Also available in: Atom PDF

Like0
Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0