Project

General

Profile

Actions

Bug #18911

closed

Process._fork hook point is not called when Process.daemon is used

Added by ivoanjo (Ivo Anjo) almost 2 years ago. Updated 7 months ago.

Status:
Closed
Target version:
-
ruby -v:
ruby 3.1.2p20 (2022-04-12 revision 4491bb740a) [x86_64-darwin20]
[ruby-core:109196]

Description

Hello there! I'm working at Datadog on the ddtrace gem, and we need to hook into fork operations to make sure that our products work correctly/automatically even in environments that fork.

As part as #17795 a new Process._fork method was added to allow libraries and frameworks to easily hook into fork operations. I was investigating its use in ddtrace and noticed the following gap: the Process.daemon API internally makes use of fork, but the new hook point is not called for that API.

Testcase:

puts RUBY_DESCRIPTION

module ForkHook
  def _fork(*args)
    puts "  #{Process.pid} Before fork!"
    res = super
    puts "  #{Process.pid} After fork!"
    res
  end
end

Process.singleton_class.prepend(ForkHook)

puts "#{Process.pid} Regular fork:"

fork { exit }
Process.wait

puts "#{Process.pid} Process.daemon:"

Process.daemon(nil, true)

puts "#{Process.pid} Finishing!"

Testcase output:

ruby 3.1.2p20 (2022-04-12 revision 4491bb740a) [x86_64-darwin20]
48136 Regular fork: # <-- original process
  48136 Before fork!
  48136 After fork! # <-- original process
  48137 After fork! # <-- child process
48136 Process.daemon: # <-- original process
48139 Finishing! # <-- forks and pid changes, but the hook isn't called

This was surprising to me since the advantage of this hook point would not not needing to hook into the many other places where fork can get called from.

Thanks a lot :)

Updated by mame (Yusuke Endoh) almost 2 years ago

  • Assignee set to akr (Akira Tanaka)

@akr (Akira Tanaka) suggested special treatment for Process.daemon, and I followed him when implementing it. So this is by design. However, I forgot the reason why Process._fork should ignore daemon. I'll ask him at the next dev meeting.

Updated by mame (Yusuke Endoh) almost 2 years ago

@ivoanjo (Ivo Anjo) Do you have difficulties due to this behavior in terms of ddtrace? Or you were just "surprised"? The motivation is very important to discuss the issue. Process.daemon stops threads, so I guess you have any difficulties, but I'd like to confirm it before the dev meeting.

Updated by mame (Yusuke Endoh) almost 2 years ago

@nobu (Nobuyoshi Nakada) said that this is because Process.daemon does not call fork(2) but daemon(3). We happen to know that daemon(3) calls fork(2) (on some environments), but other unknown C functions calling fork internally cannot be handled. So Process._fork hooks only events that Ruby calls fork(2) directly.

Updated by ivoanjo (Ivo Anjo) almost 2 years ago

Thanks @mame (Yusuke Endoh) for the awesomely quick reply :)

@ivoanjo (Ivo Anjo) (Ivo Anjo) Do you have difficulties due to this behavior in terms of ddtrace? Or you were just "surprised"? The motivation is very important to discuss the issue. Process.daemon stops threads, so I guess you have any difficulties, but I'd like to confirm it before the dev meeting.

Yeah, it surprised me because it was a situation where there's a fork (albeit indirectly) and threads die, so I needed to do cleanups/restart stuff, but was not covered by the _fork.

@nobu (Nobuyoshi Nakada) (Nobuyoshi Nakada) said that this is because Process.daemon does not call fork(2) but daemon(3). We happen to know that daemon(3) calls fork(2) (on some environments), but other unknown C functions calling fork internally cannot be handled. So Process._fork hooks only events that Ruby calls fork(2) directly.

I guess this is perhaps more of a discoverability/documentation issue. My understanding was "hooking _fork was all you needed", but actually daemon needs to be as well. But in practical terms, if I'm hooking Process._fork, it's not hard to also hook Process.daemon.

Next to the code implementing _fork we have the following comment:

 /*
...
 *  This method is not for casual code but for application monitoring
 *  libraries. You can add custom code before and after fork events
 *  by overriding this method.
 */
VALUE
rb_proc__fork(VALUE _obj)

...would it be reasonable to add a "Note: Process#daemon is similar to fork, but does not go through this method." or something similar?

Updated by mame (Yusuke Endoh) over 1 year ago

...would it be reasonable to add a "Note: Process#daemon is similar to fork, but does not go through this method." or something similar?

Let's go with this! Could you please send a PR?


At the dev meeting, I briefly talked with @akr (Akira Tanaka) about this issue.

A main motivation of Process._fork discussed in #17795 was to disconnect the database connections before fork. This was to prevent corrupted communication if the parent and child processes accessed the database connection in parallel after fork. Process.daemon uses fork(2) but does not cause this problem because the parent process exits immediately after fork. Thus, for those who override Process._fork for the original purpose, it would be more natural for Process._fork not to be invoked in Process.daemon. Unfortunately, this design does not fit with your use case to restart a monitoring thread after fork.

ivoanjo (Ivo Anjo) wrote in #note-4:

Yeah, it surprised me because it was a situation where there's a fork (albeit indirectly) and threads die, so I needed to do cleanups/restart stuff, but was not covered by the _fork.

Just FYI, according to @akr (Akira Tanaka), some OSes supports daemon(3) more directly; daemon(3) does not use "double fork" hack internally on a such OS.

Updated by ivoanjo (Ivo Anjo) over 1 year ago

I've just opened https://github.com/ruby/ruby/pull/6170 to add the comment.

Thanks @mame (Yusuke Endoh) for going above and beyond in helping me with this :)

Actions #7

Updated by ivoanjo (Ivo Anjo) over 1 year ago

  • Status changed from Open to Closed

Applied in changeset git|74817f3d37bb7153385f682f75e37713c4c8009d.


[DOC] Process._fork does not get called by Process.daemon

As discussed in Bug #18911, I'm adding some documentation to
Process._fork to clarify that it is not expected to cover
calls to Process.daemon.

Co-authored-by: Yusuke Endoh

Updated by Dan0042 (Daniel DeLorme) 7 months ago

Note for future readers coming here from the Process._fork documentation.
If you need to restart threads, you should override both _fork and daemon

module RestartWatcherThread
  def _fork
    pid = super
    restart_thread if pid == 0
    pid
  end
  def daemon(...)
    super.tap{ restart_thread }
  end
  prepend_features(Process.singleton_class)
end
Actions

Also available in: Atom PDF

Like0
Like0Like0Like0Like0Like0Like0Like0Like0