Bug #20672

closed

UNIXSocket.pair transmitting data between pids looks flaky

Added by danh337 (Dan H) 4 months ago. Updated 4 months ago.

Status:
Rejected
Assignee:
-
Target version:
-
ruby -v:
ruby 3.3.4 (2024-07-09 revision be1089c8ec) [x86_64-linux]
[ruby-core:118827]

Description

I have code that uses UNIXSocket.pair and fork to send data between parent and child.

It seems to work fine for a small number of messages passed, but then fails with Errno::EBADF on the child pid writing to its socket received from the parent.

I attached a test driver which includes successful and failed test runs. This is boiled down logic from a larger app, where I first started seeing this issue. I believe I first saw this on Ruby 3.0, but now I am on 3.3.4.

This could be me, as in my code or my workstation env, but I have not been able to prove that from many web searches. If you look at my test driver and see something wrong with it, I will take the shame and learn something. BUT just in case this is an issue in low level Ruby code, I am submitting this here.


Files

test.rb (13 KB) - Show logic, including log of success vs failed run - danh337 (Dan H), 08/10/2024 02:41 AM

Updated by byroot (Jean Boussier) 4 months ago

I'm able to reproduce with a large enough message (e.g. -c 4000 on my machine).

However if I change:

received_w = UNIXSocket.for_fd(main_c.recv_io.fileno)

with:

received_w = main_c.recv_io
received_w.sync = true

It fixes the problem. I'm not yet clear on what is causing this, but I suspect some metadata is lost when you use for_fd. But I think it should work, so I'll try to figure out what exactly.

Updated by byroot (Jean Boussier) 4 months ago

  • Status changed from Open to Rejected

Alright I've figured it out.

recv_io creates an IO instance, which you then discard. But once that IO instance is garbage collected, it is automatically closed, and the file descriptor it wraps is closed with it.

So this issue can be reproduced much earlier by calling GC.start just after UNIXSocket.for_fd(main_c.recv_io.fileno).
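A minimal sketch of that mechanism, under some assumptions: main_p, data_r and data_w are hypothetical names (only main_c and received_w come from the report), everything runs in one process instead of across fork, and the discarded IO is closed explicitly to stand in for what the GC does when it finalizes it, since GC timing itself is not deterministic.

```ruby
require "socket"

# Hypothetical setup: one pair to pass a descriptor over,
# one pair whose write end gets passed.
main_p, main_c = UNIXSocket.pair
data_r, data_w = UNIXSocket.pair
main_p.send_io(data_w)

# Two Ruby objects now wrap the same received file descriptor.
plain_io   = main_c.recv_io
received_w = UNIXSocket.for_fd(plain_io.fileno)
received_w.sync = true

# Stand-in for garbage collection: finalizing plain_io closes its descriptor.
plain_io.close

# received_w still holds the old descriptor number, which is no longer open.
# The report saw Errno::EBADF at this point.
error = begin
  received_w.write("ping")
  nil
rescue SystemCallError => e
  e
end

puts error.class
```

In the real failure nothing calls close explicitly; the unreferenced IO from recv_io is simply collected at some later point, which is why the error only showed up after enough messages.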

My suggestion fixes it because it avoids creating two IOs for the same FD, so the GC doesn't end up closing the file descriptor.

You can also fix this by setting autoclose = false on the IO instance returned by recv_io.
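For illustration, the autoclose variant could look roughly like this. It is a sketch, not the code from test.rb: main_p, data_r, data_w and plain_io are hypothetical names, and it runs in a single process instead of across fork.

```ruby
require "socket"

# Hypothetical setup: one pair to pass a descriptor over,
# one pair whose write end gets passed.
main_p, main_c = UNIXSocket.pair
data_r, data_w = UNIXSocket.pair
main_p.send_io(data_w)

plain_io = main_c.recv_io
# Tell Ruby not to close the descriptor when plain_io is closed by finalization.
plain_io.autoclose = false

received_w = UNIXSocket.for_fd(plain_io.fileno)
received_w.sync = true

# Drop the first wrapper and force a GC run; the descriptor stays open
# whether or not plain_io is actually collected here.
plain_io = nil
GC.start

received_w.write("ping")
message = data_r.recv(4)
puts message
```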

But the cleaner way to do this is:

received_w = main_c.recv_io(UNIXSocket)

So yeah, not a bug in ruby, but in your code.
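A self-contained sketch of passing the class to recv_io, assuming a single process instead of fork and using hypothetical names main_p, data_r and data_w alongside main_c and received_w from the report:

```ruby
require "socket"

# Hypothetical setup: one pair to pass a descriptor over,
# one pair whose write end gets passed.
main_p, main_c = UNIXSocket.pair
data_r, data_w = UNIXSocket.pair
main_p.send_io(data_w)

# recv_io builds the UNIXSocket itself, so only one Ruby object
# ever wraps the received descriptor.
received_w = main_c.recv_io(UNIXSocket)
received_w.sync = true

# No intermediate IO was created and discarded, so a GC run has
# nothing to close behind received_w's back.
GC.start

received_w.write("ping")
message = data_r.recv(4)

puts received_w.class
puts message
```

The same call should work unchanged in a forked child, since the only requirement is that the child holds its end of the pair that send_io was called on.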

Updated by danh337 (Dan H) 4 months ago

byroot (Jean Boussier) wrote in #note-2:

Alright I've figured it out.
[...]

received_w = main_c.recv_io(UNIXSocket)

So yeah, not a bug in ruby, but in your code.

WOW. @byroot (Jean Boussier) you are champion. I knew I needed to "cast" the received data from a plain IO to a UNIXSocket, but I didn't read the recv_io docs, which clearly state how to do this. You made a great catch. Cheers to you. Sorry for taking your time.
