Bug #20208

Net::HTTP errors with Errno::EAFNOSUPPORT when setting local_host with Addrinfo

Added by jprokop (Jarek Prokop) about 1 month ago. Updated 29 days ago.

Status: Closed
Target version: -
ruby -v: ruby 3.3.0 (2023-12-25 revision 5124f9ac75) [x86_64-linux]
[ruby-core:116399]

Description

We found a bug while running the Ruby test suite downstream. One of our builders has a specific networking configuration that causes Ruby to bind a socket incorrectly, which raises Errno::EAFNOSUPPORT even though localhost is IPv6 capable.

It is reproducible with Ruby 3.3, and reasonably current master (git hash a846d391d38b34fcc4f90adef967c166c923bd56).

Reproduction environment:
The networking configuration has to be in a specific state: the regular interface (such as eth0) has to have IPv6 disabled while localhost is IPv6 enabled.

I have tracked the problem to a commit adding AI_ADDRCONFIG flag: https://github.com/ruby/ruby/commit/d2ba8ea54a4089959afdeecdd963e3c4ff391748#diff-0a5f5e9afd3efff0444a367dd88aac41bb4de9765c8542b81c1ebcff60ab3b14R99
If I revert the commit or just simply set 2 ifdefs that are present in the diff with HAVE_CONST_AI_ADDRCONFIG to 0, the problem no longer occurs.

I have used vagrant with fedora/39-cloud-base box with the above mentioned git hash. However, I'd note that I reproduced it also on RHEL 8 and RHEL 9.

The VM has the following interfaces:

$ ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host noprefixroute
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 52:54:00:e3:aa:c1 brd ff:ff:ff:ff:ff:ff
    altname enp0s5
    altname ens5
    inet 192.168.122.209/24 brd 192.168.122.255 scope global dynamic noprefixroute eth0
       valid_lft 2099sec preferred_lft 2099sec
    inet6 fe80::f5fe:e8a4:8f83:4a8f/64 scope link tentative noprefixroute
       valid_lft forever preferred_lft forever

Disable IPv6 of eth0 and leave only lo with IPv6:

$ sudo sysctl "net.ipv6.conf.eth0.disable_ipv6=1"

Confirm the result:

$ ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host noprefixroute
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 52:54:00:e3:aa:c1 brd ff:ff:ff:ff:ff:ff
    altname enp0s5
    altname ens5
    inet 192.168.122.209/24 brd 192.168.122.255 scope global dynamic noprefixroute eth0
       valid_lft 3587sec preferred_lft 3587sec

inet6 is no longer present on eth0, but still present in lo.

Then we can copy what TestNetHTTPLocalBind does in its setup, as that is one of the failing tests, and use it as a reproducer:

$ ruby -rnet/http -e 'http = Net::HTTP.new("localhost", 8080); http.local_host = Addrinfo.tcp("localhost", 8080).ip_address; p http.get("/")'
/usr/share/ruby/net/http.rb:1603:in `initialize': Failed to open TCP connection to localhost:8080 (Address family not supported by protocol - bind(2) for "::1" port ) (Errno::EAFNOSUPPORT)
	from /usr/share/ruby/net/http.rb:1603:in `open'
	from /usr/share/ruby/net/http.rb:1603:in `block in connect'
	from /usr/share/ruby/timeout.rb:186:in `block in timeout'
	from /usr/share/ruby/timeout.rb:193:in `timeout'
	from /usr/share/ruby/net/http.rb:1601:in `connect'
	from /usr/share/ruby/net/http.rb:1580:in `do_start'
	from /usr/share/ruby/net/http.rb:1569:in `start'
	from /usr/share/ruby/net/http.rb:2297:in `request'
	from /usr/share/ruby/net/http.rb:1917:in `get'
	from -e:1:in `<main>'
/usr/share/ruby/net/http.rb:1603:in `initialize': Address family not supported by protocol - bind(2) for "::1" port  (Errno::EAFNOSUPPORT)
	from /usr/share/ruby/net/http.rb:1603:in `open'
	from /usr/share/ruby/net/http.rb:1603:in `block in connect'
	from /usr/share/ruby/timeout.rb:186:in `block in timeout'
	from /usr/share/ruby/timeout.rb:193:in `timeout'
	from /usr/share/ruby/net/http.rb:1601:in `connect'
	from /usr/share/ruby/net/http.rb:1580:in `do_start'
	from /usr/share/ruby/net/http.rb:1569:in `start'
	from /usr/share/ruby/net/http.rb:2297:in `request'
	from /usr/share/ruby/net/http.rb:1917:in `get'
	from -e:1:in `<main>'

The script:

require "net/http"

http = Net::HTTP.new("localhost", 8080)
http.local_host = Addrinfo.tcp("localhost", 8080).ip_address

p http.get("/")

Without setting the http.local_host attribute using Addrinfo, the reproducer does not fail with EAFNOSUPPORT. Whether the port is specified or nil makes no difference.
Whether or not a server is listening on 8080 makes no difference either; the script fails with the same errno regardless.

I have collected strace that points to a possible cause:

$ strace ruby -rnet/http -e 'http = Net::HTTP.new("localhost", 8080); http.local_host = Addrinfo.tcp("localhost", 8080).ip_address; p http.get("/")' 2>&1 | grep AF_INET
socket(AF_INET, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, IPPROTO_TCP) = 5
bind(5, {sa_family=AF_INET6, sin6_port=htons(0), sin6_flowinfo=htonl(0), inet_pton(AF_INET6, "::1", &sin6_addr), sin6_scope_id=0}, 28) = -1 EAFNOSUPPORT (Address family not supported by protocol)

A socket is created with AF_INET and is later bound to an AF_INET6 address, which is not correct behavior as far as I can tell.
Full strace is attached.
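
The same mismatch can be reproduced in isolation, without Net::HTTP. The following is a minimal sketch, assuming a Linux host; the helper name is hypothetical, and other platforms may report a different errno. It creates an AF_INET socket and tries to bind it to the IPv6 loopback address, mirroring the two syscalls in the strace output above.

```ruby
require "socket"

# Hypothetical helper (not from the report): bind an IPv4 socket to an IPv6
# address and return the error the kernel reports, or nil if bind succeeds.
def bind_ipv6_addr_on_ipv4_socket
  sock = Socket.new(:INET, :STREAM)   # AF_INET socket, as in the strace
  sock.bind(Addrinfo.tcp("::1", 0))   # AF_INET6 sockaddr with port 0
  nil
rescue SystemCallError => e
  e
ensure
  sock&.close
end

err = bind_ipv6_addr_on_ipv4_socket
puts err ? "#{err.class}: #{err.message}" : "bind unexpectedly succeeded"
```

On Linux this is expected to report Errno::EAFNOSUPPORT, which suggests the failure comes from the address-family mismatch itself rather than from anything specific to Net::HTTP.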

Observed failures in Ruby test suite related to this issue:

109) Error:
TestNetHTTPLocalBind#test_bind_to_local_port:
Errno::EAFNOSUPPORT: Failed to open TCP connection to localhost:37337 (Address family not supported by protocol - bind(2) for "::1" port 45395)
    /builddir/build/BUILD/ruby-3.3.0/lib/net/http.rb:1603:in `initialize'
    /builddir/build/BUILD/ruby-3.3.0/lib/net/http.rb:1603:in `open'
    /builddir/build/BUILD/ruby-3.3.0/lib/net/http.rb:1603:in `block in connect'
    /builddir/build/BUILD/ruby-3.3.0/lib/timeout.rb:186:in `block in timeout'
    /builddir/build/BUILD/ruby-3.3.0/lib/timeout.rb:193:in `timeout'
    /builddir/build/BUILD/ruby-3.3.0/lib/net/http.rb:1601:in `connect'
    /builddir/build/BUILD/ruby-3.3.0/lib/net/http.rb:1580:in `do_start'
    /builddir/build/BUILD/ruby-3.3.0/lib/net/http.rb:1569:in `start'
    /builddir/build/BUILD/ruby-3.3.0/lib/net/http.rb:2297:in `request'
    /builddir/build/BUILD/ruby-3.3.0/lib/net/http.rb:1917:in `get'
    /builddir/build/BUILD/ruby-3.3.0/test/net/http/test_http.rb:1282:in `test_bind_to_local_port'
110) Error:
TestNetHTTPLocalBind#test_bind_to_local_host:
Errno::EAFNOSUPPORT: Failed to open TCP connection to localhost:46329 (Address family not supported by protocol - bind(2) for "::1" port )
    /builddir/build/BUILD/ruby-3.3.0/lib/net/http.rb:1603:in `initialize'
    /builddir/build/BUILD/ruby-3.3.0/lib/net/http.rb:1603:in `open'
    /builddir/build/BUILD/ruby-3.3.0/lib/net/http.rb:1603:in `block in connect'
    /builddir/build/BUILD/ruby-3.3.0/lib/timeout.rb:186:in `block in timeout'
    /builddir/build/BUILD/ruby-3.3.0/lib/timeout.rb:193:in `timeout'
    /builddir/build/BUILD/ruby-3.3.0/lib/net/http.rb:1601:in `connect'
    /builddir/build/BUILD/ruby-3.3.0/lib/net/http.rb:1580:in `do_start'
    /builddir/build/BUILD/ruby-3.3.0/lib/net/http.rb:1569:in `start'
    /builddir/build/BUILD/ruby-3.3.0/lib/net/http.rb:2297:in `request'
    /builddir/build/BUILD/ruby-3.3.0/lib/net/http.rb:1917:in `get'
    /builddir/build/BUILD/ruby-3.3.0/test/net/http/test_http.rb:1267:in `test_bind_to_local_host'
111) Error:
TestNetHTTPForceEncoding#test_response_body_encoding_false:
Errno::EAFNOSUPPORT: Failed to open TCP connection to localhost:41749 (Address family not supported by protocol - bind(2) for "::1" port )
    /builddir/build/BUILD/ruby-3.3.0/lib/net/http.rb:1603:in `initialize'
    /builddir/build/BUILD/ruby-3.3.0/lib/net/http.rb:1603:in `open'
    /builddir/build/BUILD/ruby-3.3.0/lib/net/http.rb:1603:in `block in connect'
    /builddir/build/BUILD/ruby-3.3.0/lib/timeout.rb:186:in `block in timeout'
    /builddir/build/BUILD/ruby-3.3.0/lib/timeout.rb:193:in `timeout'
    /builddir/build/BUILD/ruby-3.3.0/lib/net/http.rb:1601:in `connect'
    /builddir/build/BUILD/ruby-3.3.0/lib/net/http.rb:1580:in `do_start'
    /builddir/build/BUILD/ruby-3.3.0/lib/net/http.rb:1569:in `start'
    /builddir/build/BUILD/ruby-3.3.0/lib/net/http.rb:2297:in `request'
    /builddir/build/BUILD/ruby-3.3.0/lib/net/http.rb:1917:in `get'
    /builddir/build/BUILD/ruby-3.3.0/test/net/http/test_http.rb:1308:in `fe_request'
    /builddir/build/BUILD/ruby-3.3.0/test/net/http/test_http.rb:1312:in `test_response_body_encoding_false'
112) Error:
TestNetHTTPForceEncoding#test_response_body_encoding_string_without_content_type:
Errno::EAFNOSUPPORT: Failed to open TCP connection to localhost:42775 (Address family not supported by protocol - bind(2) for "::1" port )
    /builddir/build/BUILD/ruby-3.3.0/lib/net/http.rb:1603:in `initialize'
    /builddir/build/BUILD/ruby-3.3.0/lib/net/http.rb:1603:in `open'
    /builddir/build/BUILD/ruby-3.3.0/lib/net/http.rb:1603:in `block in connect'
    /builddir/build/BUILD/ruby-3.3.0/lib/timeout.rb:186:in `block in timeout'
    /builddir/build/BUILD/ruby-3.3.0/lib/timeout.rb:193:in `timeout'
    /builddir/build/BUILD/ruby-3.3.0/lib/net/http.rb:1601:in `connect'
    /builddir/build/BUILD/ruby-3.3.0/lib/net/http.rb:1580:in `do_start'
    /builddir/build/BUILD/ruby-3.3.0/lib/net/http.rb:1569:in `start'
    /builddir/build/BUILD/ruby-3.3.0/lib/net/http.rb:2297:in `request'
    /builddir/build/BUILD/ruby-3.3.0/lib/net/http.rb:1917:in `get'
    /builddir/build/BUILD/ruby-3.3.0/test/net/http/test_http.rb:1308:in `fe_request'
    /builddir/build/BUILD/ruby-3.3.0/test/net/http/test_http.rb:1330:in `test_response_body_encoding_string_without_content_type'
113) Error:
TestNetHTTPForceEncoding#test_response_body_encoding_true_with_content_type:
Errno::EAFNOSUPPORT: Failed to open TCP connection to localhost:36895 (Address family not supported by protocol - bind(2) for "::1" port )
    /builddir/build/BUILD/ruby-3.3.0/lib/net/http.rb:1603:in `initialize'
    /builddir/build/BUILD/ruby-3.3.0/lib/net/http.rb:1603:in `open'
    /builddir/build/BUILD/ruby-3.3.0/lib/net/http.rb:1603:in `block in connect'
    /builddir/build/BUILD/ruby-3.3.0/lib/timeout.rb:186:in `block in timeout'
    /builddir/build/BUILD/ruby-3.3.0/lib/timeout.rb:193:in `timeout'
    /builddir/build/BUILD/ruby-3.3.0/lib/net/http.rb:1601:in `connect'
    /builddir/build/BUILD/ruby-3.3.0/lib/net/http.rb:1580:in `do_start'
    /builddir/build/BUILD/ruby-3.3.0/lib/net/http.rb:1569:in `start'
    /builddir/build/BUILD/ruby-3.3.0/lib/net/http.rb:2297:in `request'
    /builddir/build/BUILD/ruby-3.3.0/lib/net/http.rb:1917:in `get'
    /builddir/build/BUILD/ruby-3.3.0/test/net/http/test_http.rb:1308:in `fe_request'
    /builddir/build/BUILD/ruby-3.3.0/test/net/http/test_http.rb:1324:in `test_response_body_encoding_true_with_content_type'
114) Error:
TestNetHTTPForceEncoding#test_response_body_encoding_encoding_without_content_type:
Errno::EAFNOSUPPORT: Failed to open TCP connection to localhost:37115 (Address family not supported by protocol - bind(2) for "::1" port )
    /builddir/build/BUILD/ruby-3.3.0/lib/net/http.rb:1603:in `initialize'
    /builddir/build/BUILD/ruby-3.3.0/lib/net/http.rb:1603:in `open'
    /builddir/build/BUILD/ruby-3.3.0/lib/net/http.rb:1603:in `block in connect'
    /builddir/build/BUILD/ruby-3.3.0/lib/timeout.rb:186:in `block in timeout'
    /builddir/build/BUILD/ruby-3.3.0/lib/timeout.rb:193:in `timeout'
    /builddir/build/BUILD/ruby-3.3.0/lib/net/http.rb:1601:in `connect'
    /builddir/build/BUILD/ruby-3.3.0/lib/net/http.rb:1580:in `do_start'
    /builddir/build/BUILD/ruby-3.3.0/lib/net/http.rb:1569:in `start'
    /builddir/build/BUILD/ruby-3.3.0/lib/net/http.rb:2297:in `request'
    /builddir/build/BUILD/ruby-3.3.0/lib/net/http.rb:1917:in `get'
    /builddir/build/BUILD/ruby-3.3.0/test/net/http/test_http.rb:1308:in `fe_request'
    /builddir/build/BUILD/ruby-3.3.0/test/net/http/test_http.rb:1336:in `test_response_body_encoding_encoding_without_content_type'
115) Error:
TestNetHTTPForceEncoding#test_response_body_encoding_true_without_content_type:
Errno::EAFNOSUPPORT: Failed to open TCP connection to localhost:37799 (Address family not supported by protocol - bind(2) for "::1" port )
    /builddir/build/BUILD/ruby-3.3.0/lib/net/http.rb:1603:in `initialize'
    /builddir/build/BUILD/ruby-3.3.0/lib/net/http.rb:1603:in `open'
    /builddir/build/BUILD/ruby-3.3.0/lib/net/http.rb:1603:in `block in connect'
    /builddir/build/BUILD/ruby-3.3.0/lib/timeout.rb:186:in `block in timeout'
    /builddir/build/BUILD/ruby-3.3.0/lib/timeout.rb:193:in `timeout'
    /builddir/build/BUILD/ruby-3.3.0/lib/net/http.rb:1601:in `connect'
    /builddir/build/BUILD/ruby-3.3.0/lib/net/http.rb:1580:in `do_start'
    /builddir/build/BUILD/ruby-3.3.0/lib/net/http.rb:1569:in `start'
    /builddir/build/BUILD/ruby-3.3.0/lib/net/http.rb:2297:in `request'
    /builddir/build/BUILD/ruby-3.3.0/lib/net/http.rb:1917:in `get'
    /builddir/build/BUILD/ruby-3.3.0/test/net/http/test_http.rb:1308:in `fe_request'
    /builddir/build/BUILD/ruby-3.3.0/test/net/http/test_http.rb:1318:in `test_response_body_encoding_true_without_content_type'

Related failures from specs:

1)
An exception occurred during: before :each
TCPSocket#local_address using IPv6 using an implicit hostname the returned Addrinfo uses the correct IP address ERROR
Errno::ECONNREFUSED: Connection refused - connect(2) for nil port 37121
/builddir/build/BUILD/ruby-3.3.0/spec/ruby/library/socket/tcpsocket/local_address_spec.rb:59:in `initialize'
/builddir/build/BUILD/ruby-3.3.0/spec/ruby/library/socket/tcpsocket/local_address_spec.rb:59:in `new'
/builddir/build/BUILD/ruby-3.3.0/spec/ruby/library/socket/tcpsocket/local_address_spec.rb:59:in `block (4 levels) in <top (required)>'
/builddir/build/BUILD/ruby-3.3.0/spec/ruby/library/socket/tcpsocket/local_address_spec.rb:4:in `<top (required)>'
2)
An exception occurred during: before :each
TCPSocket#remote_address using IPv6 using an implicit hostname the returned Addrinfo uses the correct IP address ERROR
Errno::ECONNREFUSED: Connection refused - connect(2) for nil port 39823
/builddir/build/BUILD/ruby-3.3.0/spec/ruby/library/socket/tcpsocket/remote_address_spec.rb:58:in `initialize'
/builddir/build/BUILD/ruby-3.3.0/spec/ruby/library/socket/tcpsocket/remote_address_spec.rb:58:in `new'
/builddir/build/BUILD/ruby-3.3.0/spec/ruby/library/socket/tcpsocket/remote_address_spec.rb:58:in `block (4 levels) in <top (required)>'
/builddir/build/BUILD/ruby-3.3.0/spec/ruby/library/socket/tcpsocket/remote_address_spec.rb:4:in `<top (required)>'

Files

strace_log.txt (304 KB), jprokop (Jarek Prokop), 01/24/2024 09:49 AM

Updated by mame (Yusuke Endoh) about 1 month ago

  • Status changed from Open to Assigned
  • Assignee set to kjtsanaktsidis (KJ Tsanaktsidis)

Updated by kjtsanaktsidis (KJ Tsanaktsidis) about 1 month ago

Thanks for this report - it was super detailed and made it very easy for me to figure out what's going on!

Firstly, your bisection is right. The AI_ADDRCONFIG flag is what makes the difference here. The flag causes glibc to NOT return IPv6 addresses if the system doesn't have any IPv6 addresses of its own. The loopback device doesn't count: glibc ignores it when asking "does the system have IPv6 addresses?". This is normally what you want when using the result of getaddrinfo for an outbound connection; if you don't have an IPv6 connection to the world, performing AAAA DNS lookups that return results you can't possibly use is pointless, and AI_ADDRCONFIG skips them.

By default, Ruby uses AI_ADDRCONFIG for the DNS lookups it performs internally as a result of connecting to things, so TCPSocket.new etc. perform their DNS lookups with AI_ADDRCONFIG (since Ruby knows the point of the lookup is to make a connection). Other functions like Addrinfo.getaddrinfo do not use this flag by default, since you might be using the results to do something other than connect to them; maybe you're writing dig in Ruby, for example.

The problem with your reproduction is that you are actually trying to connect to localhost; so, your loopback ipv6 address is actually relevant here!

Now, on to your reproduction:

http = Net::HTTP.new("localhost", 8080)

This is going to end up calling into TCPSocket.open, which will perform DNS resolution with AI_ADDRCONFIG. Since your system has no non-loopback IPv6 addresses, this means that '127.0.0.1' gets returned. Whether or not AI_ADDRCONFIG should return IPv6 results for localhost if the loopback adapter has an IPv6 address is an interesting question, but the current implementation in glibc is that it does not:

irb(main):010:0> Addrinfo.getaddrinfo("localhost", 8080, nil, :STREAM, nil, Socket::AI_ADDRCONFIG)
=> [#<Addrinfo: 127.0.0.1:8080 TCP (localhost)>, #<Addrinfo: 127.0.0.1:8080 TCP (localhost)>]
irb(main):011:0> system 'ip addr list'
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host noprefixroute
       valid_lft forever preferred_lft forever
2: enp0s31f6: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc fq_codel state DOWN group default qlen 1000
    link/ether 84:a9:38:35:ea:56 brd ff:ff:ff:ff:ff:ff
3: wlp0s20f3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether a0:e7:0b:22:fc:ea brd ff:ff:ff:ff:ff:ff
    inet 192.168.2.249/24 brd 192.168.2.255 scope global dynamic noprefixroute wlp0s20f3
       valid_lft 83114sec preferred_lft 83114sec

So, because getaddrinfo returned '127.0.0.1', we proceed to create an IPv4 socket for the connection (this is the AF_INET socket you see in the strace output).

Then, the next line of your reproduction:

http.local_host = Addrinfo.tcp("localhost", 8080).ip_address

This calls getaddrinfo to resolve "localhost" so that we can use it as the local side of the connection. Because Ruby does not know what you intend to do with this IP address, it does not make the request with AI_ADDRCONFIG. Thus, you get an IPv6 result returned, since there is an IPv6 address for localhost!

This results in the call to bind(AF_INET6) in your strace output, and hence the error.
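
The two lookups can be compared side by side. This is a rough sketch only; the helper name is made up for illustration, and the output depends entirely on the host's interfaces and resolver configuration, so no particular result is claimed here.

```ruby
require "socket"

# Hypothetical helper: which address families does `host` resolve to
# when getaddrinfo is called with the given flags?
def resolved_families(host, flags)
  Addrinfo.getaddrinfo(host, nil, nil, :STREAM, nil, flags)
          .map { |ai| ai.ipv6? ? :ipv6 : :ipv4 }
          .uniq
rescue SocketError
  [] # name resolution failed entirely
end

# The lookup a connecting socket performs (per the explanation above):
puts "with AI_ADDRCONFIG:    #{resolved_families('localhost', Socket::AI_ADDRCONFIG).inspect}"
# The lookup Addrinfo.tcp performs by default:
puts "without AI_ADDRCONFIG: #{resolved_families('localhost', 0).inspect}"
```

On a machine configured like the reporter's VM, the first line would be expected to list only IPv4 while the second also lists IPv6, which is the mismatch described above.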


I think the problem here is that the test TestNetHTTPLocalBind#test_bind_to_local_host (and friends) is wrong. It should be performing the following sequence of actions (in pseudocode):

  • Do remote_addr = getaddrinfo("host to connect to", AF_UNSPEC, AI_ADDRCONFIG)
  • Then, do local_bind_addr = getaddrinfo("localhost", remote_addr.address_family)
  • Then, do socket(remote_addr), bind(local_bind_addr), and connect(remote_addr).

i.e. we should be explicitly specifying the address family when looking up the local address, so that it's the same as the address family we're going to use in remote_address.
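
The first two steps of the intended sequence could look roughly like this in Ruby. This is a sketch under assumptions, not the actual test fix: the helper name and its arguments are hypothetical, and it simply takes the first result of each lookup.

```ruby
require "socket"

# Hypothetical helper, not part of Net::HTTP or the Ruby test suite.
def remote_and_matching_local(remote_host, remote_port, local_name = "localhost")
  # Step 1: resolve the remote side the way a connecting socket would,
  # i.e. with any address family and AI_ADDRCONFIG.
  remote = Addrinfo.getaddrinfo(remote_host, remote_port, nil, :STREAM, nil,
                                Socket::AI_ADDRCONFIG).first
  # Step 2: resolve the local side, restricted to the remote's address family.
  local = Addrinfo.getaddrinfo(local_name, 0, remote.afamily, :STREAM).first
  [remote, local]
end

remote, local = remote_and_matching_local("localhost", 8080)
puts "remote=#{remote.inspect_sockaddr} local=#{local.inspect_sockaddr}"
```

Because the second lookup passes the remote's address family explicitly, the socket created for the remote address and the address it is bound to always agree, whichever family the first lookup happens to return.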

However what it's actually doing is

  • Do remote_addr = getaddrinfo("host to connect to", AF_UNSPEC, AI_ADDRCONFIG)
  • Then, do local_bind_addr = getaddrinfo("localhost", AF_UNSPEC)
  • Then, do socket(remote_addr), bind(local_bind_addr), and connect(remote_addr).

So there's no guarantee that the local_host it looks up is in the same address family as the address it's going to connect to.

Fortunately, #local_host= accepts a string, which will be looked up during the connection. So this program does work properly:

http = Net::HTTP.new("localhost", 8080)
http.local_host = "localhost"
p http.get("/")

If it connects to ::1 (for whatever reason), it will use ::1 as the local addr; and if it connects to 127.0.0.1, it will use 127.0.0.1 as the local addr.

So tl;dr: I'm going to fix the tests here; I think the implementation behaviour is correct.

Updated by kjtsanaktsidis (KJ Tsanaktsidis) about 1 month ago

I opened https://github.com/ruby/ruby/pull/9698 for the Net::HTTP tests.

The rubyspec tests are a little trickier. Essentially they do this:

['::1', '127.0.0.1'].each do |addr|
  server = TCPServer.new(addr, 0)
  conn = TCPSocket.new(nil, server.local_address.ip_port)
  server.close; conn.close
end

They assert that you can connect to an IPv4-only service or an IPv6-only service on localhost by running TCPSocket.new(nil). However, because of the AI_ADDRCONFIG thing, we don't get ::1 returned from getaddrinfo when connecting, so it will only try to connect to the IPv4 service both times.

I'm unsure what to do about this.

  • We could unset AI_ADDRCONFIG if nil is passed in, but people probably expect the same to work with "localhost" instead of nil.
  • We could see if a loopback address is returned from getaddrinfo, and re-run without AI_ADDRCONFIG if so. This sounds pretty daft, though.
  • We could revert the AI_ADDRCONFIG change entirely. glibc now has an option to disable AAAA lookups by setting the environment variable RES_OPTIONS=no-aaaa, if this is required in an environment. Maybe if you know you have the issue from https://bugs.ruby-lang.org/issues/19144, this is an acceptable workaround (it didn't exist at the time).
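
For illustration, the second option (retry without AI_ADDRCONFIG when a loopback address comes back) might look like the following in userland Ruby. This is only a sketch of the idea with a hypothetical helper name; Ruby's real resolution for TCPSocket happens in C inside the socket extension, and nothing here reflects an actual patch.

```ruby
require "socket"

# Hypothetical userland sketch of the "re-run on loopback" idea.
def resolve_for_connect(host, port)
  addrs = Addrinfo.getaddrinfo(host, port, nil, :STREAM, nil, Socket::AI_ADDRCONFIG)
  # If any result is a loopback address, the target is local, so
  # AI_ADDRCONFIG filtering may have hidden a usable ::1 result.
  if addrs.any? { |ai| ai.ipv4_loopback? || ai.ipv6_loopback? }
    addrs = Addrinfo.getaddrinfo(host, port, nil, :STREAM)
  end
  addrs
end

resolve_for_connect("localhost", 8080).each { |ai| puts ai.inspect_sockaddr }
```

The cost is a second lookup for every loopback target, which is part of why the option is described as daft.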

Updated by jprokop (Jarek Prokop) about 1 month ago

I'll address each suggestion from my POV, though this is not my usual area of work. It sure is tricky.

  1. I feel like this might bring more subtle bugs. I think you're right about also expecting that "localhost" should also work. Or expecting that "my_local" that is mapped on the host to the loopback will also work the same.
  2. It does sound a bit daft, but it might be more reasonable implementation than 1).
  3. On our side of Fedora, CentOS, ..., distros, the fix should be present in older glibc packages as a backport [0], so on downstream we should be able to revert the change and not hit that bug even without specifying the env variable. On upstream side? Not sure.

Hmm, perhaps Ruby could be smarter about the AF_* option? Not sure whether it could, or whether that would even be reasonable.

Reading the #19144 ticket, it seems the better solution would be imo for Ruby to not have to work around glibc bugs though.

[0] https://gitlab.com/redhat/centos-stream/rpms/glibc/-/blob/c8s/glibc-rh1868106-5.patch?ref_type=heads

Updated by kjtsanaktsidis (KJ Tsanaktsidis) about 1 month ago

Thanks for your thoughts.

I agree that ideally Ruby wouldn’t be carrying around hacks to work around glibc bugs that Canonical couldn’t be bothered backporting fixes for. That bug was certainly my original motivation for looking at this issue, but the reason I went ahead with getting the change merged into Ruby is that AI_ADDRCONFIG really does make a lot of sense: why make AAAA requests that are going to return results we can’t possibly use?

But the result of making it impossible to connect to an IPv6 only service on localhost really is wrong. For most applications I think I wouldn’t care, but in a programming language implementation people expect that all valid networking setups should work.

@akr (Akira Tanaka) do you have any opinions on what to do here, since you reviewed the original AI_ADDRCONFIG change? I’m honestly considering reverting, but I'm not sure.

Updated by kjtsanaktsidis (KJ Tsanaktsidis) 29 days ago

I discussed this a bit with @ioquatix (Samuel Williams), and we've decided to revert the change that made AI_ADDRCONFIG be used in DNS lookups by default. The interaction with services on localhost is just too surprising.

Updated by kjtsanaktsidis (KJ Tsanaktsidis) 29 days ago

  • Status changed from Assigned to Closed
  • Backport changed from 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: UNKNOWN to 3.0: DONTNEED, 3.1: DONTNEED, 3.2: DONTNEED, 3.3: REQUIRED

OK, I merged https://github.com/ruby/ruby/pull/9790 for the revert, and opened https://github.com/ruby/ruby/pull/9791 to backport to Ruby 3.3.

Sorry for the trouble, and thank you again for the report!
