Using connect_nonblock to open TCP connections in Net::HTTP#connect
Hey all, I've got a pull request at https://github.com/ruby/ruby/pull/1370 to start using connect_nonblock to open the TCP socket in Net::HTTP#connect, instead of doing a blocking connect that uses Timeout.timeout to look for timeouts. Using connect_nonblock is more efficient since it doesn't involve spinning up a separate thread to watch for timeouts, and also it avoids the race conditions inherent in the use of Timeout.timeout, as detailed in http://blog.headius.com/2008/02/ruby-threadraise-threadkill-timeoutrb.html
Over the last few versions of ruby there have been analogous fixes accepted to do this for opening an SSL connection in Net::HTTP#connect, so I'm guessing this shouldn't be too controversial, unless there is some issue I'm not aware of that kept the maintainers from implementing this as well...
Updated by normalperson (Eric Wong) over 4 years ago
Hey all, I've got a pull request at https://github.com/ruby/ruby/pull/1370
I wanted to remove Timeout and use connect_nonblock for a long
time, but it is not simple. DNS lookup from Socket.sockaddr_in
can take a long time if conn_address is a host name and must be
protected by timeout, too.
The C standard library APIs (getaddrinfo/gethostbyname) do not
implement timeout or non-blocking behavior themselves; so the
only way to stop them is to interrupt them via signal as
Instead, we may use lib/resolv.rb from the standard library
which is pure Ruby and does not depend on interrupts and
implements its own timeout using IO.select. However, resolv.rb
does not automatically use nscd for caching under glibc like
getaddrinfo does. There may be other incompatibilities, too.
I have not looked at C-ARES or other similar async name resolution
libraries, and it may not be acceptable to depend on non-stdlib.
glibc provides getaddrinfo_a for asynchronous resolution, but
not all the world is glibc and the API is bad:
it relies on signals which may conflict with existing code
or it spawns a new thread which is inefficient
So, the better option may be to fix Timeout.timeout somehow;
possibly exposing a C API which works safely with the VM and
knows only to interrupt at pre-defined "timeout points"
(similar to cancellation points in pthreads, perhaps using
an existing timer_thread).