Feature #16476: Socket.getaddrinfo cannot be interrupted by Timeout.timeout - Ruby - Ruby Issue Tracking System

Custom queries

Backport 3.2
Backport 3.3
Backport 3.4
bugs: unassigned
DevMeeting
matz
Open issues with attachment
Windows

Actions

Copy link

Feature #16476

closed

Socket.getaddrinfo cannot be interrupted by Timeout.timeout

Added by kirs (Kir Shatrov) over 5 years ago. Updated over 1 year ago.

Status:

Closed

Assignee:

Glass_saga (Masaki Matsushita)

Target version:

[ruby-core:96642]

Description

It seems like the blocking syscall done by Socket.getaddrinfo blocks Ruby VM in a way that Timeout.timeout has no effect.
See reproduction steps in getaddrinfo_interrupt.rb (https://gist.github.com/kirs/00c02ef92e0418578135fe0a6cbd3d7d). This affects all modern Ruby versions, including the latest 2.7.0.

Combined with default 10s resolv timeout on many Linux systems, this can have a very noticeable effect on production Ruby apps being not resilient to slow DNS resolutions, and being unable to fail fast even with Timeout.timeout.

While https://bugs.ruby-lang.org/issues/15553 improves the situation for Addrinfo.getaddrinfo, Socket.getaddrinfo is still blocking the VM and Timeout has no effect.

I'd like to discuss what could be done to make that call non-blocking for threads in Ruby VM.

UPD: looking closer, I can see that Socket.getaddrinfo("www.ruby-lang.org", "http") and Addrinfo.getaddrinfo("www.ruby-lang.org", "http") call non-interruptible getaddrinfo, while Addrinfo.getaddrinfo("www.ruby-lang.org", "http", timeout: 10) calls getaddrinfo_a, which is interruptible:

# interrupts as expected
Timeout.timeout(1) do
  Addrinfo.getaddrinfo("www.ruby-lang.org", "http", timeout: 10)
end

I'd maybe suggest that we try to always use getaddrinfo_a when it's available, including in Socket.getaddrinfo. What downsides that would have?
I'd be happy to work on a patch.

Related issues 3 (2 open — 1 closed)

Related to Ruby - Feature #16381: Accept resolv_timeout in Net::HTTP	Open		Actions
Related to Ruby - Feature #17134: Add resolv_timeout to TCPSocket	Open		Actions
Related to Ruby - Feature #19965: Make the name resolution interruptible	Closed	mame (Yusuke Endoh)	Actions

Issue # Delay: days Cancel

History
Notes
Property changes

Actions

Copy link

Updated by kirs (Kir Shatrov) over 5 years ago

Description updated (diff)

Actions

Copy link

Updated by kirs (Kir Shatrov) over 5 years ago

Description updated (diff)

Actions

Copy link

#3 [ruby-core:96704]

Updated by Dan0042 (Daniel DeLorme) over 5 years ago

This has been an issue for a very long time, and it's often been handled by installing an asynchronous DNS resolver gem, but it would be nice if it "just worked". If it's really as simple as using getaddrinfo_a, that sounds great.

Actions

Copy link

#4 [ruby-core:96751]

Updated by kirs (Kir Shatrov) over 5 years ago

Dan0042 (Daniel DeLorme) wrote:

+1

This has been an issue for a very long time, and it's often been handled by installing an asynchronous DNS resolver gem, but it would be nice if it "just worked". If it's really as simple as using getaddrinfo_a, that sounds great.

Thanks for feedback Daniel!

I've put a PR with the suggested fix: https://github.com/ruby/ruby/pull/2827

Actions

Copy link

Updated by kirs (Kir Shatrov) over 5 years ago

File deleted (~~getaddrinfo_interrupt.rb~~)

Actions

Copy link

Updated by kirs (Kir Shatrov) over 5 years ago

Description updated (diff)

Actions

Copy link

#7 [ruby-core:97280]

Updated by mame (Yusuke Endoh) over 5 years ago

Tracker changed from Bug to Feature
Status changed from Open to Assigned
Assignee set to Glass_saga (Masaki Matsushita)
ruby -v deleted (~~ruby 2.7.0p0 (2019-12-25 revision 647ee6f091) [x86_64-linux]~~)
Backport deleted (~~2.5: UNKNOWN, 2.6: UNKNOWN~~)

We discussed this issue at the dev-meeting, and it requires @Glass_saga's review.

Note:

It is uninterruptable under a platform that getaddrinfo_a is unavailable, but this problem is not only this proposal but also timeout: option of Addrinfo.getaddrinfo().
Interruptable version can be implemented without getaddrinfo_a: Creating pthread for getaddrinfo function and pthread_cancel when interrupted. Contribution is welcome.

Actions

Copy link

#8 [ruby-core:98604]

Updated by kirs (Kir Shatrov) about 5 years ago

mame (Yusuke Endoh) wrote in #note-7:

We discussed this issue at the dev-meeting, and it requires @Glass_saga's review.

Note:

It is uninterruptable under a platform that getaddrinfo_a is unavailable, but this problem is not only this proposal but also timeout: option of Addrinfo.getaddrinfo().

Interruptable version can be implemented without getaddrinfo_a: Creating pthread for getaddrinfo function and pthread_cancel when interrupted. Contribution is welcome.

Thanks for feedback!
I've opened https://github.com/ruby/ruby/pull/3171 with the approach you've described. Please let me know if I miss anything.

Actions

Copy link

#9 [ruby-core:99728]

Updated by Glass_saga (Masaki Matsushita) almost 5 years ago

Status changed from Assigned to Closed
Target version set to 36

merged: https://github.com/ruby/ruby/commit/2038cc6cab6ceeffef3ec3a765c70ae684f829ed

Actions

Copy link

#10

Updated by Glass_saga (Masaki Matsushita) almost 5 years ago

Related to Feature #16381: Accept resolv_timeout in Net::HTTP added

Actions

Copy link

#11

Updated by hsbt (Hiroshi SHIBATA) almost 5 years ago

Target version changed from 36 to 3.0

Actions

Copy link

#12 [ruby-core:101256]

Updated by ioquatix (Samuel Williams) over 4 years ago

I have some feedback:

I wish we don't use strange abbreviations like "resolv_timeout" and use correct English like "resolve_timeout".
I'm not convinced that getaddrinfo_a is a good idea, it has a user-space thread pool and the implementation doesn't seem great.
We will introduce asynchronous DNS resolution using a scheduler hook anyway, for Ruby 3: https://github.com/bruno-/ruby/pull/1

Adding timeouts as arguments is not particularly useful either. It's not particularly easy to compose timeouts or use a single timeout for multiple operations, and it makes the underlying implementation more complex.

That being said, I'm not against this PR. I'm just not sure it's worth the effort/complexity.

Actions

Copy link

#13 [ruby-core:101257]

Updated by jeremyevans0 (Jeremy Evans) over 4 years ago

ioquatix (Samuel Williams) wrote in #note-12:

Adding timeouts as arguments is not particularly useful either. It's not particularly easy to compose timeouts or use a single timeout for multiple operations, and it makes the underlying implementation more complex.

One way around this is to use a deadline instead of a timeout:

# or something based on Process.clock_gettime
deadline = Time.now + 5

Addrinfo.getaddrinfo("www.ruby-lang.org", "http", deadline: deadline)
Addrinfo.getaddrinfo("docs.ruby-lang.org", "http", deadline: deadline)
Addrinfo.getaddrinfo("bugs.ruby-lang.org", "http", deadline: deadline)

I think this approach is much better than a Timeout.timeout block. For one, Timeout.timeout requires the use of Thread#raise and results in less deterministic behavior. Additionally, a deadline option can potentially use a different approach than raising an exception, similar to IO#read_nonblock.

Actions

Copy link

#14

Updated by naruse (Yui NARUSE) over 4 years ago

Related to Feature #17134: Add resolv_timeout to TCPSocket added

Actions

Copy link

#15 [ruby-core:101733]

Updated by naruse (Yui NARUSE) over 4 years ago

Status changed from Closed to Open
Target version changed from 3.0 to 3.1

ioquatix (Samuel Williams) wrote in #note-12:

I'm not convinced that getaddrinfo_a is a good idea, it has a user-space thread pool and the implementation doesn't seem great.

https://github.com/ruby/ruby/commit/2038cc6cab6ceeffef3ec3a765c70ae684f829ed is reverted because of [Bug #17220].

Since getaddrinfo doesn't provide nonblocking version, I think we need our own implementation of getaddrinfo_a.

Adding timeouts as arguments is not particularly useful either. It's not particularly easy to compose timeouts or use a single timeout for multiple operations, and it makes the underlying implementation more complex.

In my experience timeout is important for web applications to return a response when a DNS resolution is too slow.
It's a long requested series of improvements for HTTP client like read_timeout, connect_timeout, and write_timeout.
resolv_timeout is the last piece of that.

Actions

Copy link

#16 [ruby-core:101754]

Updated by Dan0042 (Daniel DeLorme) over 4 years ago

naruse (Yui NARUSE) wrote in #note-15:

It's a long requested series of improvements for HTTP client like read_timeout, connect_timeout, and write_timeout.
resolv_timeout is the last piece of that.

How common is it to need separate timeouts for all of these? I can easily imagine the resolv_timeout as being part of the connect_timeout (i.e. resolv+connect may not exceed connect_timeout). And at least in my experience I usually want a single "general timeout" for resolv+connect+write+read. For that reason I really like Jeremy's suggestion of a single deadline argument. You can even have a single deadline for multiple http requests. This is really perfect for most uses cases that I'm familiar with.

Actions

Copy link

#17 [ruby-core:101771]

Updated by naruse (Yui NARUSE) over 4 years ago

Dan0042 (Daniel DeLorme) wrote in #note-16:

naruse (Yui NARUSE) wrote in #note-15:

It's a long requested series of improvements for HTTP client like read_timeout, connect_timeout, and write_timeout.
resolv_timeout is the last piece of that.

How common is it to need separate timeouts for all of these? I can easily imagine the resolv_timeout as being part of the connect_timeout (i.e. resolv+connect may not exceed connect_timeout). And at least in my experience I usually want a single "general timeout" for resolv+connect+write+read. For that reason I really like Jeremy's suggestion of a single deadline argument. You can even have a single deadline for multiple http requests. This is really perfect for most uses cases that I'm familiar with.

Both Timeout.timeout and deadline is not the essential problem of this topic. The topic this ticket handles is "getaddrinfo is not interruptable nor timeout ready". This is because getaddrinfo(3) doesn't have timeout, async, nor nonblocking API. This ticket intends to provide the underlying implementation of such feature.

Actions

Copy link

#18

Updated by hsbt (Hiroshi SHIBATA) about 3 years ago

Target version changed from 3.1 to 3.2

Actions

Copy link

#19

Updated by hsbt (Hiroshi SHIBATA) over 2 years ago

Target version deleted (~~3.2~~)

Actions

Copy link

#20

Updated by hsbt (Hiroshi SHIBATA) over 1 year ago

Related to Feature #19965: Make the name resolution interruptible added

Actions

Copy link

#21 [ruby-core:115225]

Updated by mame (Yusuke Endoh) over 1 year ago

Status changed from Open to Closed

#19965 solved the problem, so I'll close it.

I completely forgot this ticket, but the concept (pthread_create on every call to getaddrinfo(3)) was the same as in #19965. Just for case, #19965 is a bit more elaborate:

It works as before in environments where pthread is not available.
It includes the same hack for not only getaddrinfo(3) but also getnameinfo(3).
pthread_cancel is not used because until recently there was a bug in glibc where pthread_cancel of a pthread that is calling getaddrinfo(3) would cause subsequent name resolution to stick:
It attempts to use pthread_setaffinity_np to reduce speed degradation.

Actions

Copy link

Also available in: Atom PDF

Like0

Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0

Project

General

Profile

Ruby

Tags

Custom queries

Feature #16476

Socket.getaddrinfo cannot be interrupted by Timeout.timeout

Updated by kirs (Kir Shatrov) over 5 years ago

Updated by kirs (Kir Shatrov) over 5 years ago

Updated by Dan0042 (Daniel DeLorme) over 5 years ago

Updated by kirs (Kir Shatrov) over 5 years ago

Updated by kirs (Kir Shatrov) over 5 years ago

Updated by kirs (Kir Shatrov) over 5 years ago

Updated by mame (Yusuke Endoh) over 5 years ago

Updated by kirs (Kir Shatrov) about 5 years ago

Updated by Glass_saga (Masaki Matsushita) almost 5 years ago

Updated by Glass_saga (Masaki Matsushita) almost 5 years ago

Updated by hsbt (Hiroshi SHIBATA) almost 5 years ago

Updated by ioquatix (Samuel Williams) over 4 years ago

Updated by jeremyevans0 (Jeremy Evans) over 4 years ago

Updated by naruse (Yui NARUSE) over 4 years ago

Updated by naruse (Yui NARUSE) over 4 years ago

Updated by Dan0042 (Daniel DeLorme) over 4 years ago

Updated by naruse (Yui NARUSE) over 4 years ago

Updated by hsbt (Hiroshi SHIBATA) about 3 years ago

Updated by hsbt (Hiroshi SHIBATA) over 2 years ago

Updated by hsbt (Hiroshi SHIBATA) over 1 year ago

Updated by mame (Yusuke Endoh) over 1 year ago