Bug #9356

TCPSocket.new does not seem to handle INTR

Added by Charlie Somerville 4 months ago. Updated 2 months ago.

[ruby-core:59516]
Status:Open
Priority:Normal
Assignee:-
Category:-
Target version:-
ruby -v:- Backport:1.9.3: UNKNOWN, 2.0.0: UNKNOWN, 2.1: UNKNOWN

Description

TCPSocket.new does not seem to handle EINTR properly.

In the attached test script, I try to open a TCP connection to my server and make an HTTP request while a background thread continually sends a signal to the process.

This causes the #write call to fail with:

x.rb:13:in write': Socket is not connected (Errno::ENOTCONN)
from x.rb:13:in
'

This also appears to affect 2.0.0. 1.9.3 is unaffected.

socket-eintr.rb Magnifier (207 Bytes) Charlie Somerville, 01/03/2014 07:29 PM

wait_connectable_infinite_loop_minimal_fix.diff Magnifier (478 Bytes) Shugo Maeda, 02/21/2014 01:00 PM


Related issues

Related to ruby-trunk - Bug #9547: TCPSocket.new causes an infinite loop when interrupted by... Closed 02/21/2014

History

#1 Updated by Eric Wong 4 months ago

This might be a bug exposed due to r36944
("avoid unnecessary select() calls before doing I/O")
which I don't want to revert.

Does the following fix it?

--- a/ext/socket/init.c
+++ b/ext/socket/init.c
@@ -400,12 +400,6 @@ rsockconnect(int fd, const struct sockaddr *sockaddr, int len, int socks)
status = (int)BLOCKING
REGIONFD(func, &arg);
if (status < 0) {
switch (errno) {
- case EINTR:
-#if defined(ERESTART)
- case ERESTART:
-#endif
- continue;
-
case EAGAIN:
#ifdef EINPROGRESS
case EINPROGRESS:
@@ -426,6 +420,12 @@ rsock
connect(int fd, const struct sockaddr *sockaddr, int len, int socks)
#if WAITINPROGRESS > 0
waitinprogress = WAITINPROGRESS;
#endif
+ case EINTR:
+#if defined(ERESTART)
+ case ERESTART:
+#endif
+ continue;
+
status = wait_connectable(fd);
if (status) {
break;


Fwiw, I'm not sure if the WAITINPROGRESS + getsockopt SOERROR is
even necessary when we can just do wait
connectable(fd) followed
by an eventual write/send/read/recv...

#2 Updated by Charlie Somerville 4 months ago

Eric: Unfortunately your patch doesn't fix it, still getting the same ENOTCONN error.

#3 Updated by Eric Wong 4 months ago

Thanks for trying. This is probably specific to *BSD sockets
implementation, so I can't reproduce it at the moment.

Can you try reverting parts of r36944 which affect IO#write?
("avoid unnecessary select() calls before doing I/O")

#4 Updated by Eric Wong 3 months ago

Eric Wong normalperson@yhbt.net wrote:

Thanks for trying. This is probably specific to *BSD sockets
implementation, so I can't reproduce it at the moment.

Not happening on FreeBSD 9.x kernels, at least. I could not reproduce
the problem on FreeBSD 9.2 nor Debian GNU/kFreeBSD sid (x86_64).

#5 Updated by Shugo Maeda 2 months ago

Eric Wong wrote:

Eric Wong normalperson@yhbt.net wrote:

Thanks for trying. This is probably specific to *BSD sockets
implementation, so I can't reproduce it at the moment.

Not happening on FreeBSD 9.x kernels, at least. I could not reproduce
the problem on FreeBSD 9.2 nor Debian GNU/kFreeBSD sid (x86_64).

On FreeBSD 10, Errno::ENOTCONN isn't raised, but Ruby goes in an infinite loop because getsockopt(2) in wait_connectable() sets sockerr to 0.

Does anybody know why the following code in ext/socket/init.c is necessary?

        if (sockerr == 0)
            continue;       /* workaround for winsock */

#6 Updated by Usaku NAKAMURA 2 months ago

  • ruby -v changed from ruby 2.1.0p0 (2013-12-25 revision 44422) [x86_64-darwin13.0] to -

Hi,

In message " [ruby-trunk - Bug #9356] TCPSocket.new does not seem to handle INTR"
on Feb.18,2014 18:34:40, shugo@ruby-lang.org wrote:

Does anybody know why the following code in ext/socket/init.c is necessary?

        if (sockerr == 0)
            continue;       /* workaround for winsock */

It was introduced at r7931 by me (9 years ago!),
but I forgot the reason.
If my comment of those days is believed, we may wrap it
with #ifdef_WIN32.

Regards,
--
U.Nakamura usa@garbagecollect.jp

#7 Updated by Eric Wong 2 months ago

"U.Nakamura" usa@garbagecollect.jp wrote:

In message " [ruby-trunk - Bug #9356] TCPSocket.new does not seem to handle INTR"
on Feb.18,2014 18:34:40, shugo@ruby-lang.org wrote:

Does anybody know why the following code in ext/socket/init.c is necessary?

        if (sockerr == 0)
            continue;       /* workaround for winsock */

It was introduced at r7931 by me (9 years ago!),
but I forgot the reason.
If my comment of those days is believed, we may wrap it
with #ifdef_WIN32.

OK. I wonder if we should even use getsockopt(SO_ERROR) at all.

I know there's much literature which recommends it, but any error check
in this way is racy. Better to let any subsequent
write/read/send/recv/etc error out.

The following should work:

connect() -> EINPROGRESS
rbwaitforsinglefd -> (must retry on EINTR)
(user calls) write() -> 0, ENOTCONN, EPIPE, ...

Note: rbwaitforsinglefd is necessary in FreeBSD (at least) to avoid
ENOTCONN, Linux just returns EAGAIN on write if write immediately after
connect.

#8 Updated by Shugo Maeda 2 months ago

Eric Wong wrote:

OK. I wonder if we should even use getsockopt(SO_ERROR) at all.

I know there's much literature which recommends it, but any error check
in this way is racy. Better to let any subsequent
write/read/send/recv/etc error out.

Could you describe such a race condition in detail?

I don't see a race condition on FreeBSD 10.

The problem I've seen is that rbwaitforsinglefd() returns RBWAITFDOUT, and getsockopt() with SO_ERROR doesn't return any error (this is an expected behavior when the socket is connected), and Ruby goes in an infinite loop.

This problem was introduced by r31424, and usa is not guilty at least for the problem I've seen.

How about to fix the code as follows?

        if (sockerr == 0) {
            if (revents & RB_WAITFD_OUT) {
                break;
            }
            else {
                continue;       /* control is reached here on winsock? */
            }
        }

And the following code seems to be unnecessary.

        if ((revents & (RB_WAITFD_IN|RB_WAITFD_OUT)) == RB_WAITFD_OUT) {
            ret = 0;
            break;
        }

As the following comment says, the intentions of if (revents & (RB_WAITFD_IN|RB_WAITFD_OUT)) { might be if (revents & RB_WIATFD_IN && revents & RB_WAITFD_OUT), but I guess revents & RBWAITFDIN is true not only when a failure occurs, but when data is ready to read.

    /*
     * Stevens book says, successful finish turn on RB_WAITFD_OUT and
     * failure finish turn on both RB_WAITFD_IN and RB_WAITFD_OUT.
     */

Note that the original problem reported by Charlie must be a different issue.
Charlie, could you show the output of dtruss when the problem happens?

#9 Updated by Eric Wong 2 months ago

shugo@ruby-lang.org wrote:

Eric Wong wrote:

OK. I wonder if we should even use getsockopt(SO_ERROR) at all.

I know there's much literature which recommends it, but any error check
in this way is racy. Better to let any subsequent
write/read/send/recv/etc error out.

Could you describe such a race condition in detail?

getsockopt(SO_ERROR) => no error
<kernel sees disconnect>
read/write => EPIPE, ENOTCONN, ...

Since we must check all (future) read/write operations for errors
anyways, getsockopt(SO_ERROR) is worthless.

I don't see a race condition on FreeBSD 10.

The problem I've seen is that rbwaitforsinglefd() returns RBWAITFDOUT, and getsockopt() with SO_ERROR doesn't return any error (this is an expected behavior when the socket is connected), and Ruby goes in an infinite loop.

This problem was introduced by r31424, and usa is not guilty at least for the problem I've seen.

How about to fix the code as follows?

        if (sockerr == 0) {
            if (revents & RB_WAITFD_OUT) {
                break;
            }
            else {
                continue;       /* control is reached here on winsock? */
            }
        }

Maybe, but I wonder if we can just drop a lot of code...

http://bogomips.org/ruby.git/patch?id=a4212dc9516f4

#10 Updated by Shugo Maeda 2 months ago

Eric Wong wrote:

Could you describe such a race condition in detail?

getsockopt(SO_ERROR) => no error

read/write => EPIPE, ENOTCONN, ...

Since we must check all (future) read/write operations for errors

Ah, I see. However, if getsockopt() returns no error, we can know that at least connect() succeeded, right? I'm not sure whether it's a so-called race condition.

anyways, getsockopt(SO_ERROR) is worthless.

I don't think getsockopt(SO_ERROR) is worthless because users can know error information sooner and more exactly than subsequent read() or write().

Maybe, but I wonder if we can just drop a lot of code...

http://bogomips.org/ruby.git/patch?id=a4212dc9516f4

With the patch, connect() is called again even if getsockopt(SO_ERROR) returns an error, so Errno::EINVAL is raised. It's confusing behavior.

#11 Updated by Eric Wong 2 months ago

shugo@ruby-lang.org wrote:

Eric Wong wrote:

Could you describe such a race condition in detail?

getsockopt(SO_ERROR) => no error

read/write => EPIPE, ENOTCONN, ...

Since we must check all (future) read/write operations for errors

Ah, I see. However, if getsockopt() returns no error, we can know
that at least connect() succeeded, right? I'm not sure whether it's a
so-called race condition.

I'm not sure if we can know for sure due to implementation differences.
And even if connect() succeeded, the server could decide to disconnect
right away after accept() due to overload, so probably less important.

anyways, getsockopt(SO_ERROR) is worthless.

I don't think getsockopt(SO_ERROR) is worthless because users can know
error information sooner and more exactly than subsequent read() or
write().

The error may be seen sooner, but I think the errors are uncommon
and most users will not care. They will see any error and handle it
their own way.

Maybe, but I wonder if we can just drop a lot of code...

http://bogomips.org/ruby.git/patch?id=a4212dc9516f4

With the patch, connect() is called again even if getsockopt(SO_ERROR) returns an error, so Errno::EINVAL is raised. It's confusing behavior.

Ah, I forget the outer for(;;) loop. Maybe it's better to not loop,
the WAITINPROGRESS stuff is confusing...

#12 Updated by Eric Wong 2 months ago

Eric Wong normalperson@yhbt.net wrote:

Ah, I forget the outer for(;;) loop. Maybe it's better to not loop,
the WAITINPROGRESS stuff is confusing...

I have no idea how portable this is:
http://bogomips.org/ruby.git/patch?id=f5e2eb00e5

Btw, I suspect the WAITINPROGRESS stuff is carried over from the
1.8 days where all sockets were non-blocking by default, and overly
complicated as a result. I don't even think EINPROGRESS/EAGAIN is
possible, only EINTR/ERESTART.

#13 Updated by Shugo Maeda 2 months ago

Eric Wong wrote:

Ah, I see. However, if getsockopt() returns no error, we can know
that at least connect() succeeded, right? I'm not sure whether it's a
so-called race condition.

I'm not sure if we can know for sure due to implementation differences.

Hmm..., do you know any implementation where getsockopt(SO_ERROR) causes a problem?
If there's such an implementation, it would be better to remove the call, but I don't come up with such a situation.

By contraries, getsockopt(SO_ERROR) might be necessary for winsock, if usa's comment is right.

And even if connect() succeeded, the server could decide to disconnect
right away after accept() due to overload, so probably less important.

It applies equally to connect() without signal interruption.
TCPSocket.new should handle EINTR as transparently as possible, I think.

anyways, getsockopt(SO_ERROR) is worthless.

I don't think getsockopt(SO_ERROR) is worthless because users can know
error information sooner and more exactly than subsequent read() or
write().

The error may be seen sooner, but I think the errors are uncommon
and most users will not care. They will see any error and handle it
their own way.

Most applications might handle errors roughly, but logging the exact error information would help trouble shooting.

Ah, I forget the outer for(;;) loop. Maybe it's better to not loop,
the WAITINPROGRESS stuff is confusing...

I agree that the code is complicated, and it's better to simplify it, if possible.

I have no idea how portable this is:
http://bogomips.org/ruby.git/patch?id=f5e2eb00e5

Btw, I suspect the WAITINPROGRESS stuff is carried over from the
1.8 days where all sockets were non-blocking by default, and overly
complicated as a result. I don't even think EINPROGRESS/EAGAIN is
possible, only EINTR/ERESTART.

The patch worked both on Linux 3.2.0 and FreeBSD 10.0 for blocking sockets with signal interruption.

It might be better to re-call connect() on ERESTART as the error suggests, but I'm not sure.
It seems to be a Linux way to re-call connect(), but there's no description about ERESTART in Linux's manual of connect(2) and instead there's a reference to POSIX.1-2001 (SUSv3), to which Linux seem not to conform. Funny.

#14 Updated by Eric Wong 2 months ago

shugo@ruby-lang.org wrote:

Eric Wong wrote:

Ah, I see. However, if getsockopt() returns no error, we can know
that at least connect() succeeded, right? I'm not sure whether it's a
so-called race condition.

I'm not sure if we can know for sure due to implementation differences.

Hmm..., do you know any implementation where getsockopt(SO_ERROR) causes a problem?
If there's such an implementation, it would be better to remove the call, but I don't come up with such a situation.

I'm not sure about implementations of getsockopt(SO_ERROR),
there are too many differences from what I see.

However, our use of getsockopt(SO_ERROR) causes problems
for maintainability and seems to lead to bugs.

By contraries, getsockopt(SO_ERROR) might be necessary for winsock, if usa's comment is right.

I'm not sure...
usa: can you try my last patch?
http://bogomips.org/ruby.git/patch?id=f5e2eb00e5

And even if connect() succeeded, the server could decide to disconnect
right away after accept() due to overload, so probably less important.

It applies equally to connect() without signal interruption.
TCPSocket.new should handle EINTR as transparently as possible, I think.

Agreed, and my f5e2eb00e5 patch handles EINTR.

anyways, getsockopt(SO_ERROR) is worthless.

I don't think getsockopt(SO_ERROR) is worthless because users can know
error information sooner and more exactly than subsequent read() or
write().

The error may be seen sooner, but I think the errors are uncommon
and most users will not care. They will see any error and handle it
their own way.

Most applications might handle errors roughly, but logging the exact error information would help trouble shooting.

Yes, and connect() still shows ETIMEDOUT/ECONNRESET/EHOSTUNREACH/etc.
Interrupted connect() is rare, so I think having the extra
hard-to-maintain code causes more problems than it solves.
I don't think users will care which of connect/write/read fails,
they just need to know an error happened.

I have no idea how portable this is:
http://bogomips.org/ruby.git/patch?id=f5e2eb00e5

Btw, I suspect the WAITINPROGRESS stuff is carried over from the
1.8 days where all sockets were non-blocking by default, and overly
complicated as a result. I don't even think EINPROGRESS/EAGAIN is
possible, only EINTR/ERESTART.

The patch worked both on Linux 3.2.0 and FreeBSD 10.0 for blocking sockets with signal interruption.

Thanks for testing. I expect it to have no problem on most *BSD.
I mainly wanted some Win/Solaris/Apple users to try it.

It might be better to re-call connect() on ERESTART as the error suggests, but I'm not sure.
It seems to be a Linux way to re-call connect(), but there's no description about ERESTART in Linux's manual of connect(2) and instead there's a reference to POSIX.1-2001 (SUSv3), to which Linux seem not to conform. Funny.

I'm not sure, either. ERESTART should be handled by the system C
library and not seen by us, even. Not sure which (or any currently
supported) systems leak ERESTART...

#15 Updated by Shugo Maeda 2 months ago

Eric Wong wrote:

I'm not sure about implementations of getsockopt(SO_ERROR),
there are too many differences from what I see.

However, our use of getsockopt(SO_ERROR) causes problems
for maintainability and seems to lead to bugs.

Hmm..., the problem I've seen seem not to be a platform-dependent issue, but a simple bug, because the current code goes in an infinite loop when the connection is established without errors.
The current code wouldn't work on any platform except Linux, where connect() can be restartable without errors when it's interrupted by signals, so control never reach wait_connectable().

If I don't misunderstand Kosaki-san's intention, the code might be able to be simplified by removing for(;;) and waiting only RBWAITFDOUT (but I don't know whether it works well with winsock...).

What do you think, Kosaki-san?

And even if connect() succeeded, the server could decide to disconnect
right away after accept() due to overload, so probably less important.

It applies equally to connect() without signal interruption.
TCPSocket.new should handle EINTR as transparently as possible, I think.

Agreed, and my f5e2eb00e5 patch handles EINTR.

By "transparently" I meant that if TCPSocket.new raises an exception when it's not interrupted by a signal and fails, it should also raise an exception when it's interrupted by a signal and asynchronous connect() fails.

Most applications might handle errors roughly, but logging the exact error information would help trouble shooting.

Yes, and connect() still shows ETIMEDOUT/ECONNRESET/EHOSTUNREACH/etc.
Interrupted connect() is rare, so I think having the extra
hard-to-maintain code causes more problems than it solves.
I don't think users will care which of connect/write/read fails,
they just need to know an error happened.

It depends on how complex error handling by getsockopt() is, and we have a slight difference of opinion here.
I'd like to hear others' thoughts.

It might be better to re-call connect() on ERESTART as the error suggests, but I'm not sure.
It seems to be a Linux way to re-call connect(), but there's no description about ERESTART in Linux's manual of connect(2) and instead there's a reference to POSIX.1-2001 (SUSv3), to which Linux seem not to conform. Funny.

I'm not sure, either. ERESTART should be handled by the system C
library and not seen by us, even. Not sure which (or any currently
supported) systems leak ERESTART...

Linux's connect() does return ERESTART when it's interrupted by a signal.
On Linux, connect() is restartable. Please see the following page:

http://www.madore.org/~david/computers/connect-intr.html

The above page describes connect() returns EINTR on Linux, but it seems to return ERESTART nowadays.
(And it describes connect() returns EADDRINUSE instead of EALREADY on FreeBSD, but it returns EALREADY now.)

#16 Updated by Eric Wong 2 months ago

shugo@ruby-lang.org wrote:

I'd like to hear others' thoughts.

Likewise. We need to hear folks with more experience on other OSes.

Linux's connect() does return ERESTART when it's interrupted by a signal.
On Linux, connect() is restartable. Please see the following page:

http://www.madore.org/~david/computers/connect-intr.html

The above page describes connect() returns EINTR on Linux, but it seems to return ERESTART nowadays.
(And it describes connect() returns EADDRINUSE instead of EALREADY on FreeBSD, but it returns EALREADY now.)

Interesting. Anyways I'm not against handling ERESTART.

Note, POSIX connect manpage says this:

If connect() is interrupted by a signal that is caught while blocked
waiting to establish a connection, connect() shall fail and set errno
to [EINTR], but the connection request shall not be aborted, and the
connection shall be established asynchronously.

ref: http://pubs.opengroup.org/onlinepubs/009695399/functions/connect.html
Of course, not every system is POSIX-compliant.

Anyways, I have an alternative (v3) patch here which retries connect()
on EINTR and ERESTART:
http://bogomips.org/ruby.git/patch?id=8f48b1862
(also note my new switch/case style to avoid inline ifdef :)

However, I still prefer my v2 if possible:
http://bogomips.org/ruby.git/patch?id=f5e2eb00e5

#17 Updated by Shugo Maeda 2 months ago

Eric Wong wrote:

Anyways, I have an alternative (v3) patch here which retries connect()
on EINTR and ERESTART:
http://bogomips.org/ruby.git/patch?id=8f48b1862
(also note my new switch/case style to avoid inline ifdef :)

However, I still prefer my v2 if possible:
http://bogomips.org/ruby.git/patch?id=f5e2eb00e5

I prefer the latter too, but is break missing in the case clause?

Linux's man page of connect(2) says it's conforming to POSIX.1-2001 (in spite of the fact that it returns ERESTART), so I hope we don't need the second connect(). What do you think, Kosaki-san?

Or only retry on ERESTART is enough. We don't need the nested switch statements in the former patch, do we?

And I'd like to add the following minimal error handling code to wait_connectable():

static int
wait_connectable(int fd)
{
    int sockerr;
    socklen_t sockerrlen;

    /*
     * Stevens book says, successful finish turn on RB_WAITFD_OUT and
     * failure finish turn on both RB_WAITFD_IN and RB_WAITFD_OUT.
     * So it's enough to wait only RB_WAITFD_OUT and check the pending error
     * by getsockopt().
     *
     * Note: rb_wait_for_single_fd already retries on EINTR/ERESTART
     */
    int revents = rb_wait_for_single_fd(fd, RB_WAITFD_OUT, NULL);

    if (revents < 0)
        rb_bug_errno("rb_wait_for_single_fd used improperly", errno);

    sockerrlen = (socklen_t)sizeof(sockerr);
    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, (void *)&sockerr, &sockerrlen) < 0)
        return -1;
    if (sockerr != 0) {
        errno = sockerr;
        return -1;
    }

    return 0;
}

This doesn't include difficult-to-read platform-dependent code which causes infinite loops.

#18 Updated by Eric Wong 2 months ago

shugo@ruby-lang.org wrote:

Eric Wong wrote:

Anyways, I have an alternative (v3) patch here which retries connect()
on EINTR and ERESTART:
http://bogomips.org/ruby.git/patch?id=8f48b1862
(also note my new switch/case style to avoid inline ifdef :)

However, I still prefer my v2 if possible:
http://bogomips.org/ruby.git/patch?id=f5e2eb00e5

I prefer the latter too, but is break missing in the case clause?

I don't usually add it for the last case... Do some compilers need it?

Linux's man page of connect(2) says it's conforming to POSIX.1-2001 (in spite of the fact that it returns ERESTART), so I hope we don't need the second connect(). What do you think, Kosaki-san?

Or only retry on ERESTART is enough. We don't need the nested switch statements in the former patch, do we?

I'm not sure... Anyways, I think I may realize what happened with
Charlie's EINTR...

And I'd like to add the following minimal error handling code to wait_connectable():

I think it is OK with a minor tweak below:

static int
wait_connectable(int fd)
{
    int sockerr;
    socklen_t sockerrlen;

    /*
     * Stevens book says, successful finish turn on RB_WAITFD_OUT and
     * failure finish turn on both RB_WAITFD_IN and RB_WAITFD_OUT.
     * So it's enough to wait only RB_WAITFD_OUT and check the pending error
     * by getsockopt().
     *
     * Note: rb_wait_for_single_fd already retries on EINTR/ERESTART
     */
    int revents = rb_wait_for_single_fd(fd, RB_WAITFD_OUT, NULL);

    if (revents < 0)
     rb_bug_errno("rb_wait_for_single_fd used improperly", errno);

    sockerrlen = (socklen_t)sizeof(sockerr);
    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, (void *)&sockerr, &sockerrlen) < 0)
        return -1;
    if (sockerr != 0) {
        errno = sockerr;

We should guard against sockerr setting errno to
EINTR, ERESTART, EINPROGRESS, EALREADY, EISCONN here...

Here is what I think happened in Charlie's original case:

connect() -> EINTR, kernel socket remembers this!
retry connect() -> EALREADY
wait_connectable()
    rb_wait_for_single_fd() -> success
    getsockopt(SO_ERROR), sockerr = EINTR from first connect!
    errno = sockerr -> raise :(

#19 Updated by Shugo Maeda 2 months ago

Eric Wong wrote:

However, I still prefer my v2 if possible:
http://bogomips.org/ruby.git/patch?id=f5e2eb00e5

I prefer the latter too, but is break missing in the case clause?

I don't usually add it for the last case... Do some compilers need it?

It's OK if it's your style. I just noticed it when I was trying to add an extra case clause for ERESTART.

    sockerrlen = (socklen_t)sizeof(sockerr);
    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, (void *)&sockerr, &sockerrlen) < 0)
        return -1;
    if (sockerr != 0) {
        errno = sockerr;

We should guard against sockerr setting errno to
EINTR, ERESTART, EINPROGRESS, EALREADY, EISCONN here...

Agreed. You mean to expose only usual errors to users and to hide unusual platform-dependent errors, right?

Here is what I think happened in Charlie's original case:

connect() -> EINTR, kernel socket remembers this!
retry connect() -> EALREADY
waitconnectable()
rb
waitforsinglefd() -> success
getsockopt(SO
ERROR), sockerr = EINTR from first connect!
errno = sockerr -> raise :(

Charlie said Errno::ENOTCONN is raised by TCPSocket#write, not by TCPSocket.new.
I don't know why....

#20 Updated by Shugo Maeda 2 months ago

Shugo Maeda wrote:

On FreeBSD 10, Errno::ENOTCONN isn't raised, but Ruby goes in an infinite loop because getsockopt(2) in wait_connectable() sets sockerr to 0.

Can I commit the attached minimal fix as a workaround for this infinite loop bug?

Ruby 1.9.3 release is scheduled on Feb 24, and this is the last chance to fix normal bugs.
It's hard to estimate the impact of Eric's patch by then.

#21 Updated by Shugo Maeda 2 months ago

Shugo Maeda wrote:

Shugo Maeda wrote:

On FreeBSD 10, Errno::ENOTCONN isn't raised, but Ruby goes in an infinite loop because getsockopt(2) in wait_connectable() sets sockerr to 0.

Can I commit the attached minimal fix as a workaround for this infinite loop bug?

Ruby 1.9.3 release is scheduled on Feb 24, and this is the last chance to fix normal bugs.
It's hard to estimate the impact of Eric's patch by then.

I talked with Naruse-san and Nakamura-san, and have created #9547 and have committed the workaround.

#22 Updated by Eric Wong 2 months ago

shugo@ruby-lang.org wrote:

Eric Wong wrote:

We should guard against sockerr setting errno to
EINTR, ERESTART, EINPROGRESS, EALREADY, EISCONN here...

Agreed. You mean to expose only usual errors to users and to hide unusual
platform-dependent errors, right?

Yes, because we already handle most of those with rbwaitforsinglefd.

http://bogomips.org/ruby.git/patch?id=d8241102f54
git://80x24.org/ruby socket-connect-check-v4

Charlie said Errno::ENOTCONN is raised by TCPSocket#write, not by TCPSocket.new.
I don't know why....

Maybe one additional change helps:
http://bogomips.org/ruby.git/patch?id=66ca3633f43
git://80x24.org/ruby socket-connect-check-v5

I'm out of ideas. Charlie?

#23 Updated by Usaku NAKAMURA 2 months ago

  • Related to Bug #9547: TCPSocket.new causes an infinite loop when interrupted by a signal added

#24 Updated by Shugo Maeda 2 months ago

Eric Wong wrote:

shugo@ruby-lang.org wrote:

Eric Wong wrote:

We should guard against sockerr setting errno to
EINTR, ERESTART, EINPROGRESS, EALREADY, EISCONN here...

Agreed. You mean to expose only usual errors to users and to hide unusual
platform-dependent errors, right?

Yes, because we already handle most of those with rbwaitforsinglefd.

Definitely.

http://bogomips.org/ruby.git/patch?id=d8241102f54
git://80x24.org/ruby socket-connect-check-v4

The code looks fine, but please remove the following comment in wait_connectable().

 * So it's enough to wait only RB_WAITFD_OUT and check the pending error
 * by getsockopt().

Or there might be no need to wait RBWAITFDIN. I'm not sure.

#25 Updated by Eric Wong 2 months ago

shugo@ruby-lang.org wrote:

http://bogomips.org/ruby.git/patch?id=d8241102f54
git://80x24.org/ruby socket-connect-check-v4

The code looks fine, but please remove the following comment in wait_connectable().

 * So it's enough to wait only RB_WAITFD_OUT and check the pending error
 * by getsockopt().

OK, I'll remove if we commit these versions.

Or there might be no need to wait RBWAITFDIN. I'm not sure.

No need on Linux, but I think it is also harmless.

I'd like to hear from Charlie to see if -v4 or -v5 can fix
his error, first. -v5 might be more better, in fact..

Also available in: Atom PDF