Project

General

Profile

Actions

Bug #17781

closed

Resolv::DNS RequestID table allocations are never freed, causing DNS lookups to eventually hang

Added by supermathie (Michael Brown) 8 months ago. Updated 5 months ago.

Status:
Closed
Priority:
Normal
Target version:
-
ruby -v:
ruby 3.0.1p64 (2021-04-05 revision 0fb782ee38) [x86_64-linux]
[ruby-core:103280]

Description

In between ruby 2.7.2 and 2.7.3/3.0.0 releases a bug was introduced into Resolv::DNS functionality that eventually will break long-running processes that do DNS lookups.

Resolv::DNS.free_request_id is no longer called once requests are complete. The implication of this is that a maximum of 65536 DNS lookups can be performed against a single DNS server; any further requests will cause ruby to hang in an infinite loop in Resolv::DNS.allocate_request_id (all allocations are taken, however the code will loop forever trying to find one that is unallocated).

Reproduction of the problem is possible with the following code:

require 'resolv'
65536.times { Resolv::DNS.new.getresource('www.example.net', Resolv::DNS::Resource::IN::A) }
puts "Ran 65536 times"
Resolv::DNS.new.getresource('www.example.net', Resolv::DNS::Resource::IN::A)
puts "Ran 65537 times" # never printed

The following script can be used to evaluate if a given ruby version has this problem:

#!/usr/bin/env ruby

require 'resolv'

puts "ruby version: #{RUBY_VERSION}"

r = Resolv::DNS.new

100.times { r.getresource('www.example.net', Resolv::DNS::Resource::IN::A).address }
puts "RequestID table size: #{Resolv::DNS::RequestID.values.first&.count or "(empty)"}"

A properly behaving version should report an empty RequestID table. For example:

○ → rvm use ruby-2.7.2; ./test_dns.rb
Using /home/michael/.rvm/gems/ruby-2.7.2
ruby version: 2.7.2
RequestID table size: (empty)

○ → rvm use ruby-2.7.3; ./test_dns.rb
Using /home/michael/.rvm/gems/ruby-2.7.3
ruby version: 2.7.3
RequestID table size: 100

○ → rvm use ruby-3.0.0; ./test_dns.rb
Using /home/michael/.rvm/gems/ruby-3.0.0
ruby version: 3.0.0
RequestID table size: 100

○ → rvm use ruby-3.0.1; ./test_dns.rb
Using /home/michael/.rvm/gems/ruby-3.0.1
ruby version: 3.0.1
RequestID table size: 100

Example output from rbtrace on 2.7.2:

$ rbtrace -p 42 -m 'allocate_request_id(RequestID)' 'free_request_id(RequestID)'
*** attached to process 42
Resolv::DNS.allocate_request_id(RequestID={}) <0.000094>
Resolv::DNS.free_request_id(RequestID={["192.168.1.51", 53]=>{6377=>true}}) <0.000092>
Resolv::DNS.allocate_request_id(RequestID={}) <0.000086>
Resolv::DNS.free_request_id(RequestID={["192.168.1.51", 53]=>{4537=>true}}) <0.000107>
Resolv::DNS.allocate_request_id(RequestID={}) <0.000086>
Resolv::DNS.free_request_id(RequestID={["192.168.1.51", 53]=>{64587=>true}}) <0.000100>
Resolv::DNS.allocate_request_id(RequestID={}) <0.000097>
Resolv::DNS.free_request_id(RequestID={["192.168.1.51", 53]=>{37627=>true}}) <0.000051>
Resolv::DNS.allocate_request_id(RequestID={}) <0.000045>
Resolv::DNS.free_request_id(RequestID={["192.168.1.51", 53]=>{37265=>true}}) <0.000044>
Resolv::DNS.allocate_request_id(RequestID={}) <0.000047>
Resolv::DNS.free_request_id(RequestID={["192.168.1.51", 53]=>{39586=>true}}) <0.000050>
Resolv::DNS.allocate_request_id(RequestID={}) <0.000040>
Resolv::DNS.free_request_id(RequestID={["192.168.1.51", 53]=>{50696=>true}}) <0.000051>
Resolv::DNS.allocate_request_id(RequestID={}) <0.000040>
Resolv::DNS.free_request_id(RequestID={["192.168.1.51", 53]=>{35266=>true}}) <0.000044>
Resolv::DNS.allocate_request_id(RequestID={}) <0.000041>
Resolv::DNS.free_request_id(RequestID={["192.168.1.51", 53]=>{42458=>true}}) <0.000043>

Example output from rbtrace on 2.7.3:

$ rbtrace -p 42 -m 'allocate_request_id(RequestID.values.first.count)' 'free_request_id(RequestID.values.first.count)'
*** attached to process 42
Resolv::DNS.allocate_request_id(RequestID.values.first.count=544) <0.000161>
Resolv::DNS.allocate_request_id(RequestID.values.first.count=545) <0.000162>
Resolv::DNS.allocate_request_id(RequestID.values.first.count=546) <0.000136>
Resolv::DNS.allocate_request_id(RequestID.values.first.count=547) <0.000111>
Resolv::DNS.allocate_request_id(RequestID.values.first.count=548) <0.000166>
Resolv::DNS.allocate_request_id(RequestID.values.first.count=549) <0.000140>
Resolv::DNS.allocate_request_id(RequestID.values.first.count=550) <0.000118>
Resolv::DNS.allocate_request_id(RequestID.values.first.count=551) <0.000149>
Resolv::DNS.allocate_request_id(RequestID.values.first.count=552) <0.000113>
Resolv::DNS.allocate_request_id(RequestID.values.first.count=553) <0.000141>
Resolv::DNS.allocate_request_id(RequestID.values.first.count=554) <0.000180>
Resolv::DNS.allocate_request_id(RequestID.values.first.count=555) <0.000174>
Resolv::DNS.allocate_request_id(RequestID.values.first.count=556) <0.000155>
…

not reproducible on:

  • ruby 2.7.2p137 (2020-10-01 revision 5445e04352) [x86_64-linux-musl] (ruby:2.7.2-alpine docker image)
  • ruby 2.7.2p137 (2020-10-01 revision 5445e04352) [x86_64-linux]

reproducible on:

  • ruby 2.7.3p183 (2021-04-05 revision 6847ee089d) [x86_64-linux-musl] (ruby:2.7.3-alpine docker image)
  • ruby 2.7.3p183 (2021-04-05 revision 6847ee089d) [x86_64-linux]
  • ruby 3.0.0p0 (2020-12-25 revision 95aff21468) [x86_64-linux]
  • ruby 3.0.1p64 (2021-04-05 revision 0fb782ee38) [x86_64-linux]

Related issues

Has duplicate Ruby master - Bug #17783: Running sidekiq with ruby 3.0 (possible thread deadlock on resolv request_id allocation and free)ClosedActions

Updated by nagachika (Tomoyuki Chikanaga) 8 months ago

  • Backport changed from 2.5: UNKNOWN, 2.6: UNKNOWN, 2.7: UNKNOWN, 3.0: UNKNOWN to 2.5: DONTNEED, 2.6: REQUIRED, 2.7: REQUIRED, 3.0: REQUIRED

If the issue was introduced by 70c3a195f39763dccdf9367d0c9b7e815431a41a, ruby-2.6.7 might be affected too. Since the changeset was backported into ruby_2_6 too. https://bugs.ruby-lang.org/issues/17658#note-5

Updated by jeremyevans0 (Jeremy Evans) 8 months ago

  • Assignee set to akr (Akira Tanaka)
  • Status changed from Open to Assigned

This should be fixed by https://github.com/ruby/resolv/pull/9

Actions #3

Updated by jeremyevans0 (Jeremy Evans) 8 months ago

  • Has duplicate Bug #17783: Running sidekiq with ruby 3.0 (possible thread deadlock on resolv request_id allocation and free) added

Updated by bensullivan (Ben Sullivan) 7 months ago

It looks like we also ran into this problem in production with ruby 2.6.7 as well. Following the instructions given, I was able to recreate the issue on our ruby:2.6.7-slim based docker image.

Updated by mame (Yusuke Endoh) 6 months ago

  • Status changed from Assigned to Closed

Updated by nagachika (Tomoyuki Chikanaga) 6 months ago

  • Backport changed from 2.5: DONTNEED, 2.6: REQUIRED, 2.7: REQUIRED, 3.0: REQUIRED to 2.5: DONTNEED, 2.6: REQUIRED, 2.7: REQUIRED, 3.0: DONE

ruby_3_0 9c0df2e81c22e6e35f3c5d69a070c2a3cf67e320 merged revision(s) 9edc162583a4f685332239f6249745ad9b518cbe.

Updated by usa (Usaku NAKAMURA) 6 months ago

  • Backport changed from 2.5: DONTNEED, 2.6: REQUIRED, 2.7: REQUIRED, 3.0: DONE to 2.5: DONTNEED, 2.6: REQUIRED, 2.7: DONE, 3.0: DONE

ruby_2_7 87d02eacd26d0b2884016315baf2440d100f177e merged revision(s) 9edc162583a4f685332239f6249745ad9b518cbe.

Updated by usa (Usaku NAKAMURA) 5 months ago

  • Backport changed from 2.5: DONTNEED, 2.6: REQUIRED, 2.7: DONE, 3.0: DONE to 2.5: DONTNEED, 2.6: DONE, 2.7: DONE, 3.0: DONE

ruby_2_6 r67946 merged revision(s) 9edc1625.

Actions

Also available in: Atom PDF