Project

General

Profile

Actions

Bug #17781

open

Resolv::DNS RequestID table allocations are never freed, causing DNS lookups to eventually hang

Added by supermathie (Michael Brown) about 1 month ago. Updated 26 days ago.

Status:
Assigned
Priority:
Normal
Target version:
-
ruby -v:
ruby 3.0.1p64 (2021-04-05 revision 0fb782ee38) [x86_64-linux]
[ruby-core:103280]

Description

In between ruby 2.7.2 and 2.7.3/3.0.0 releases a bug was introduced into Resolv::DNS functionality that eventually will break long-running processes that do DNS lookups.

Resolv::DNS.free_request_id is no longer called once requests are complete. The implication of this is that a maximum of 65536 DNS lookups can be performed against a single DNS server; any further requests will cause ruby to hang in an infinite loop in Resolv::DNS.allocate_request_id (all allocations are taken, however the code will loop forever trying to find one that is unallocated).

Reproduction of the problem is possible with the following code:

require 'resolv'
65536.times { Resolv::DNS.new.getresource('www.example.net', Resolv::DNS::Resource::IN::A) }
puts "Ran 65536 times"
Resolv::DNS.new.getresource('www.example.net', Resolv::DNS::Resource::IN::A)
puts "Ran 65537 times" # never printed

The following script can be used to evaluate if a given ruby version has this problem:

#!/usr/bin/env ruby

require 'resolv'

puts "ruby version: #{RUBY_VERSION}"

r = Resolv::DNS.new

100.times { r.getresource('www.example.net', Resolv::DNS::Resource::IN::A).address }
puts "RequestID table size: #{Resolv::DNS::RequestID.values.first&.count or "(empty)"}"

A properly behaving version should report an empty RequestID table. For example:

○ → rvm use ruby-2.7.2; ./test_dns.rb
Using /home/michael/.rvm/gems/ruby-2.7.2
ruby version: 2.7.2
RequestID table size: (empty)

○ → rvm use ruby-2.7.3; ./test_dns.rb
Using /home/michael/.rvm/gems/ruby-2.7.3
ruby version: 2.7.3
RequestID table size: 100

○ → rvm use ruby-3.0.0; ./test_dns.rb
Using /home/michael/.rvm/gems/ruby-3.0.0
ruby version: 3.0.0
RequestID table size: 100

○ → rvm use ruby-3.0.1; ./test_dns.rb
Using /home/michael/.rvm/gems/ruby-3.0.1
ruby version: 3.0.1
RequestID table size: 100

Example output from rbtrace on 2.7.2:

$ rbtrace -p 42 -m 'allocate_request_id(RequestID)' 'free_request_id(RequestID)'
*** attached to process 42
Resolv::DNS.allocate_request_id(RequestID={}) <0.000094>
Resolv::DNS.free_request_id(RequestID={["192.168.1.51", 53]=>{6377=>true}}) <0.000092>
Resolv::DNS.allocate_request_id(RequestID={}) <0.000086>
Resolv::DNS.free_request_id(RequestID={["192.168.1.51", 53]=>{4537=>true}}) <0.000107>
Resolv::DNS.allocate_request_id(RequestID={}) <0.000086>
Resolv::DNS.free_request_id(RequestID={["192.168.1.51", 53]=>{64587=>true}}) <0.000100>
Resolv::DNS.allocate_request_id(RequestID={}) <0.000097>
Resolv::DNS.free_request_id(RequestID={["192.168.1.51", 53]=>{37627=>true}}) <0.000051>
Resolv::DNS.allocate_request_id(RequestID={}) <0.000045>
Resolv::DNS.free_request_id(RequestID={["192.168.1.51", 53]=>{37265=>true}}) <0.000044>
Resolv::DNS.allocate_request_id(RequestID={}) <0.000047>
Resolv::DNS.free_request_id(RequestID={["192.168.1.51", 53]=>{39586=>true}}) <0.000050>
Resolv::DNS.allocate_request_id(RequestID={}) <0.000040>
Resolv::DNS.free_request_id(RequestID={["192.168.1.51", 53]=>{50696=>true}}) <0.000051>
Resolv::DNS.allocate_request_id(RequestID={}) <0.000040>
Resolv::DNS.free_request_id(RequestID={["192.168.1.51", 53]=>{35266=>true}}) <0.000044>
Resolv::DNS.allocate_request_id(RequestID={}) <0.000041>
Resolv::DNS.free_request_id(RequestID={["192.168.1.51", 53]=>{42458=>true}}) <0.000043>

Example output from rbtrace on 2.7.3:

$ rbtrace -p 42 -m 'allocate_request_id(RequestID.values.first.count)' 'free_request_id(RequestID.values.first.count)'
*** attached to process 42
Resolv::DNS.allocate_request_id(RequestID.values.first.count=544) <0.000161>
Resolv::DNS.allocate_request_id(RequestID.values.first.count=545) <0.000162>
Resolv::DNS.allocate_request_id(RequestID.values.first.count=546) <0.000136>
Resolv::DNS.allocate_request_id(RequestID.values.first.count=547) <0.000111>
Resolv::DNS.allocate_request_id(RequestID.values.first.count=548) <0.000166>
Resolv::DNS.allocate_request_id(RequestID.values.first.count=549) <0.000140>
Resolv::DNS.allocate_request_id(RequestID.values.first.count=550) <0.000118>
Resolv::DNS.allocate_request_id(RequestID.values.first.count=551) <0.000149>
Resolv::DNS.allocate_request_id(RequestID.values.first.count=552) <0.000113>
Resolv::DNS.allocate_request_id(RequestID.values.first.count=553) <0.000141>
Resolv::DNS.allocate_request_id(RequestID.values.first.count=554) <0.000180>
Resolv::DNS.allocate_request_id(RequestID.values.first.count=555) <0.000174>
Resolv::DNS.allocate_request_id(RequestID.values.first.count=556) <0.000155>
…

not reproducible on:

  • ruby 2.7.2p137 (2020-10-01 revision 5445e04352) [x86_64-linux-musl] (ruby:2.7.2-alpine docker image)
  • ruby 2.7.2p137 (2020-10-01 revision 5445e04352) [x86_64-linux]

reproducible on:

  • ruby 2.7.3p183 (2021-04-05 revision 6847ee089d) [x86_64-linux-musl] (ruby:2.7.3-alpine docker image)
  • ruby 2.7.3p183 (2021-04-05 revision 6847ee089d) [x86_64-linux]
  • ruby 3.0.0p0 (2020-12-25 revision 95aff21468) [x86_64-linux]
  • ruby 3.0.1p64 (2021-04-05 revision 0fb782ee38) [x86_64-linux]

Related issues

Has duplicate Ruby master - Bug #17783: Running sidekiq with ruby 3.0 (possible thread deadlock on resolv request_id allocation and free)OpenActions

Updated by nagachika (Tomoyuki Chikanaga) about 1 month ago

  • Backport changed from 2.5: UNKNOWN, 2.6: UNKNOWN, 2.7: UNKNOWN, 3.0: UNKNOWN to 2.5: DONTNEED, 2.6: REQUIRED, 2.7: REQUIRED, 3.0: REQUIRED

If the issue was introduced by 70c3a195f39763dccdf9367d0c9b7e815431a41a, ruby-2.6.7 might be affected too. Since the changeset was backported into ruby_2_6 too. https://bugs.ruby-lang.org/issues/17658#note-5

Updated by jeremyevans0 (Jeremy Evans) about 1 month ago

  • Assignee set to akr (Akira Tanaka)
  • Status changed from Open to Assigned

This should be fixed by https://github.com/ruby/resolv/pull/9

Actions #3

Updated by jeremyevans0 (Jeremy Evans) about 1 month ago

  • Has duplicate Bug #17783: Running sidekiq with ruby 3.0 (possible thread deadlock on resolv request_id allocation and free) added

Updated by bensullivan (Ben Sullivan) 26 days ago

It looks like we also ran into this problem in production with ruby 2.6.7 as well. Following the instructions given, I was able to recreate the issue on our ruby:2.6.7-slim based docker image.

Actions

Also available in: Atom PDF