Project

General

Profile

Actions

Bug #17561

closed

The timeout option for Addrinfo.getaddrinfo is not reliable on Ruby 2.7.2

Added by smcgivern (Sean McGivern) 6 months ago. Updated about 2 hours ago.

Status:
Closed
Priority:
Normal
Target version:
-
[ruby-core:102163]

Description

#15553 introduced a timeout option for Addrinfo.getaddrinfo, which uses getaddrinfo_a internally. It appears this has since been reverted in the development branch via https://github.com/ruby/ruby/commit/5d8bcc4870601ab1ee05346346f241d4a805aac9 (due to #17220 maybe; I didn't quite follow the discussion).

However, Ruby 2.7.2 is still not reliable, even without forking:

$ ruby -v
ruby 2.7.2p137 (2020-10-01 revision 5445e04352) [x86_64-linux]
$ ruby -e "require 'resolv'; 10000.times { |i| p [i, Addrinfo.getaddrinfo('2130706433', 80, nil, :STREAM, timeout: 5)] }" | tail
Traceback (most recent call last):
    3: from -e:1:in `<main>'
    2: from -e:1:in `times'
    1: from -e:1:in `block in <main>'
-e:1:in `getaddrinfo': getaddrinfo_a: All requests done (SocketError)
[1473, [#<Addrinfo: 127.0.0.1:80 TCP (2130706433)>]]
[1474, [#<Addrinfo: 127.0.0.1:80 TCP (2130706433)>]]
[1475, [#<Addrinfo: 127.0.0.1:80 TCP (2130706433)>]]
[1476, [#<Addrinfo: 127.0.0.1:80 TCP (2130706433)>]]
[1477, [#<Addrinfo: 127.0.0.1:80 TCP (2130706433)>]]
[1478, [#<Addrinfo: 127.0.0.1:80 TCP (2130706433)>]]
[1479, [#<Addrinfo: 127.0.0.1:80 TCP (2130706433)>]]
[1480, [#<Addrinfo: 127.0.0.1:80 TCP (2130706433)>]]
[1481, [#<Addrinfo: 127.0.0.1:80 TCP (2130706433)>]]
[1482, [#<Addrinfo: 127.0.0.1:80 TCP (2130706433)>]]

This is on a VirtualBox VM and fails fairly quickly. On a 'real' Linux system, I need to try a few times or bump the number of iterations, but it also fails consistently with consecutive requests. I'm choosing 2130706433 (the decimal representation of 127.0.0.1) as that's what our test suite uses, and that's what failed when I tried to use the timeout option.

On Ruby 3.0.0-dev this does not fail due to the aforementioned revert, but should this also be removed from Ruby 2.7 until it's ready?

Updated by stanhu (Stan Hu) 5 months ago

I agree that this should be removed from Ruby 2.7. In https://gitlab.com/gitlab-org/gitlab/-/issues/11951#note_510756578, we were able to trigger a seg fault by doing this:

require 'socket'

loop do
  Addrinfo.getaddrinfo('localhost', nil, timeout: 0)
rescue SocketError
end

Updated by ko1 (Koichi Sasada) about 2 months ago

confirmed on my machine:

$ ruby -ve "require 'resolv'; 10000.times { |i| p [i, Addrinfo.getaddrinfo('2130706433', 80, nil, :STREAM, timeout: 0)] }"
ruby 2.7.3p183 (2021-04-05 revision 6847ee089d) [x86_64-linux]
*** stack smashing detected ***: terminated
Aborted

Updated by mame (Yusuke Endoh) about 2 months ago

  • Assignee set to Glass_saga (Masaki Matsushita)
  • Status changed from Open to Assigned

Glass_saga (Masaki Matsushita) Could you handle this?

Updated by Glass_saga (Masaki Matsushita) about 2 months ago

  • Backport changed from 2.5: UNKNOWN, 2.6: UNKNOWN, 2.7: UNKNOWN, 3.0: UNKNOWN to 2.5: DONTNEED, 2.6: DONTNEED, 2.7: REQUIRED, 3.0: DONTNEED

Although I can't reproduce the issue on my machine (I tried several times), I agree that this should be removed from 2.7.
I will create a patch.

Updated by usa (Usaku NAKAMURA) about 1 month ago

About 2.7, to make raise ArgumentError (or other exception) is not acceptable because it's *stable branch.
I strongly hope to fix this problem at master branch and the fix can be backportable, but at this time, we should simply ignore timeout: argument and note it on documentation.

Updated by jeremyevans0 (Jeremy Evans) 30 days ago

usa (Usaku NAKAMURA) wrote in #note-5:

About 2.7, to make raise ArgumentError (or other exception) is not acceptable because it's *stable branch.
I strongly hope to fix this problem at master branch and the fix can be backportable, but at this time, we should simply ignore timeout: argument and note it on documentation.

I sent a pull request to disable the timeout option: https://github.com/ruby/ruby/pull/4618

I couldn't see where the timeout option was documented, so I didn't modify the documentation.

I don't think we'll be able to backport a fix to the issue. The use of getaddrinfo_a was completely removed in 5d8bcc4870601ab1ee05346346f241d4a805aac9.

Updated by usa (Usaku NAKAMURA) about 7 hours ago

  • Backport changed from 2.5: DONTNEED, 2.6: DONTNEED, 2.7: REQUIRED, 3.0: DONTNEED to 2.5: DONTNEED, 2.6: DONTNEED, 2.7: DONE, 3.0: DONTNEED

merged into ruby_2_7 at d9ff8b3e86a03499a5c6bc36fae1592914a25b9c

Actions #8

Updated by jeremyevans0 (Jeremy Evans) about 2 hours ago

  • Status changed from Assigned to Closed
Actions

Also available in: Atom PDF