Project

General

Profile

Actions

Bug #19412

open

Socket starts queueing and not responding after a certain amount of requests

Added by brodock (Gabriel Mazetto) almost 2 years ago. Updated almost 2 years ago.

Status:
Open
Assignee:
-
Target version:
-
ruby -v:
ruby 3.2.1 (2023-02-08 revision 31819e82c8) [arm64-darwin22]
[ruby-core:112213]

Description

Here is a an example code:

require 'socket'

PORT = 8080
BACKLOG = 50
LISTEN = '127.0.0.1'

def handle_connection(connection, _addrinfo)
  request = connection.gets

  puts request

  connection.write "HTTP/1.1 200\r\n"
  connection.write "Content-Type: text/html\r\n"
  connection.write "\r\n"
  connection.write "Hello world! Current time is #{Time.now}"

ensure
  connection.close
end

begin
  socket = Socket.new(:INET, :STREAM)
  socket.bind(Addrinfo.tcp(LISTEN, PORT))
  socket.listen(BACKLOG)

  puts "Requested queue size: #{BACKLOG} bigger then limit: #{Socket::SOMAXCONN}" if BACKLOG > Socket::SOMAXCONN

  loop do
    listening, = IO.select([socket])
    io, = listening
    connection, addrinfo = io.accept
    handle_connection(connection, addrinfo)
  end
ensure
  socket.close
end

This tries to simulate a TCP server that responds as if it was an HTTP server.

The amount of requests it can sustain seems to depend on the OS. On a Linux machine running ubuntu 20.04 I get something around 7.6K to 7.8K until it stops responding.

$ uname -a
Linux ... 5.17.5-x86_64-linode154 #1 SMP PREEMPT Mon May 2 15:07:22 EDT 2022 x86_64 x86_64 x86_64 GNU/Linux

$ ab -n 20000 -c 50 http://127.0.0.1:8080/
This is ApacheBench, Version 2.3 <$Revision: 1843412 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking 127.0.0.1 (be patient)
Completed 2000 requests
Completed 4000 requests
Completed 6000 requests
apr_pollset_poll: The timeout specified has expired (70007)
Total of 7883 requests completed

on MacOS Ventura I get around 16K:

$ uname -a
Darwin ... 22.2.0 Darwin Kernel Version 22.2.0: Fri Nov 11 02:04:44 PST 2022; root:xnu-8792.61.2~4/RELEASE_ARM64_T8103 arm64

$ ab -n 20000 -c 50 http://127.0.0.1:8080/     
                                                                                                                            
This is ApacheBench, Version 2.3 <$Revision: 1901567 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking 127.0.0.1 (be patient)
Completed 2000 requests
Completed 4000 requests
Completed 6000 requests
Completed 8000 requests
Completed 10000 requests
Completed 12000 requests
Completed 14000 requests
Completed 16000 requests
apr_socket_recv: Operation timed out (60)
Total of 16375 requests completed

in both cases when that limit reaches, if I abort the program (CTRL-C) and try to run it again it takes a while until the port is released:

...
GET / HTTP/1.0
GET / HTTP/1.0
GET / HTTP/1.0
^Csocketserver.rb:29:in `select': Interrupt
	from socketserver.rb:29:in `block in <main>'
	from socketserver.rb:28:in `loop'
	from socketserver.rb:28:in `<main>'
$ ruby socketserver.rb                                                                                                                                                  
socketserver.rb:23:in `bind': Address already in use - bind(2) for 127.0.0.1:8080 (Errno::EADDRINUSE)
	from socketserver.rb:23:in `<main>'

After killing the process it seems no process is holding the port:

lsof -wni tcp:8080

Running the command above does not return anything (it does when the program is still running).

I think we may be failing to release something when interacting with the Socket on syscalls/kernel level and we endup filling up some queue/buffer that eventually gets freed.


Files

clipboard-202303161214-zhq7o.png (1.02 MB) clipboard-202303161214-zhq7o.png brodock (Gabriel Mazetto), 03/16/2023 11:14 AM
clipboard-202303161214-7xx91.png (683 KB) clipboard-202303161214-7xx91.png brodock (Gabriel Mazetto), 03/16/2023 11:14 AM
clipboard-202303161215-5wjbo.png (1.02 MB) clipboard-202303161215-5wjbo.png brodock (Gabriel Mazetto), 03/16/2023 11:15 AM

Updated by jeremyevans0 (Jeremy Evans) almost 2 years ago

To try to narrow down the issue, does it still occur if you change the main loop section to?:

begin
  socket = TCPServer.new(LISTEN, PORT)
  socket.listen(BACKLOG)

  puts "Requested queue size: #{BACKLOG} bigger then limit: #{Socket::SOMAXCONN}" if BACKLOG > Socket::SOMAXCONN

  loop do
    handle_connection(socket.accept, nil)
  end
ensure
  socket.close
end

Updated by brodock (Gabriel Mazetto) almost 2 years ago

@jeremyevans0 (Jeremy Evans) yes, it behaves the same regarding the max amount of requests it can handle:

$ ab -n 20000 -c 50 http://127.0.0.1:8080/

This is ApacheBench, Version 2.3 <$Revision: 1901567 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking 127.0.0.1 (be patient)
Completed 2000 requests
Completed 4000 requests
Completed 6000 requests
Completed 8000 requests
Completed 10000 requests
Completed 12000 requests
Completed 14000 requests
Completed 16000 requests
apr_socket_recv: Operation timed out (60)
Total of 16438 requests completed

the one difference is that I can bind right after I abort the program, but it takes a while for it to run the requests again, and when it does, it get limitar 16k requests...

$ ab -n 20000 -c 50 http://127.0.0.1:8080/

This is ApacheBench, Version 2.3 <$Revision: 1901567 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking 127.0.0.1 (be patient)
Completed 2000 requests
Completed 4000 requests
Completed 6000 requests
Completed 8000 requests
Completed 10000 requests
Completed 12000 requests
Completed 14000 requests
Completed 16000 requests
apr_socket_recv: Operation timed out (60)
Total of 16398 requests completed

Updated by jeremyevans0 (Jeremy Evans) almost 2 years ago

Using the server given, I cannot reproduce using the following Ruby code:

require 'socket'

PORT = 8080
LISTEN = '127.0.0.1'

20000.times do
  socket = TCPSocket.new(LISTEN, PORT)
  socket.write("HTTP/1.1 GET /\n")
  socket.shutdown(Socket::SHUT_WR)
  socket.read
ensure
  socket&.close
end

Is there a way you can reproduce this using ruby as the client instead of ApacheBench? That would allow for easier testing. Also, does this reproduce when using ApacheBench with -c 1?

Updated by brodock (Gabriel Mazetto) almost 2 years ago

  • ruby -v set to ruby 3.2.1 (2023-02-08 revision 31819e82c8) [arm64-darwin22]

@jeremyevans0 (Jeremy Evans)

running apache benchmark with -c1 does show some of the behavior. It will go from 0 to somethign around 16000 in 4 seconds and get stuck there, see a 3x run log:

ab -n 20000 -c 1 http://127.0.0.1:8080/                                                                                                         
This is ApacheBench, Version 2.3 <$Revision: 1901567 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking 127.0.0.1 (be patient)
Completed 2000 requests
Completed 4000 requests
Completed 6000 requests
Completed 8000 requests
Completed 10000 requests
Completed 12000 requests
Completed 14000 requests
Completed 16000 requests
apr_socket_recv: Operation timed out (60)
Total of 16346 requests completed
ab -n 20000 -c 1 http://127.0.0.1:8080/                                                                                                         
This is ApacheBench, Version 2.3 <$Revision: 1901567 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking 127.0.0.1 (be patient)
Completed 2000 requests
Completed 4000 requests
Completed 6000 requests
Completed 8000 requests
Completed 10000 requests
Completed 12000 requests
Completed 14000 requests
Completed 16000 requests
apr_socket_recv: Operation timed out (60)
Total of 16347 requests completed
ab -n 20000 -c 1 http://127.0.0.1:8080/                                                                                                         
This is ApacheBench, Version 2.3 <$Revision: 1901567 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking 127.0.0.1 (be patient)
Completed 2000 requests
Completed 4000 requests
Completed 6000 requests
Completed 8000 requests
Completed 10000 requests
Completed 12000 requests
Completed 14000 requests
Completed 16000 requests
apr_socket_recv: Operation timed out (60)
Total of 16349 requests completed

I've also run a version of your client and got to reproduce the error (with similar output):

require 'socket'

PORT = 8080
LISTEN = '127.0.0.1'
NTIMES = 20000

NTIMES.times do |t|
  socket = TCPSocket.new(LISTEN, PORT)
  socket.write("HTTP/1.1 GET /\n")
  socket.shutdown(Socket::SHUT_WR)
  socket.read

  puts "Completed #{t} requests" if t > 0 && t % (NTIMES / 10) == 0
rescue Exception => e
  puts "Total of #{t-1} requests completed"

  raise e
ensure
  socket&.close
end
ruby --version                                                                                                                                  
Could not locate Gemfile or .bundle/ directory
ruby 3.1.2p20 (2022-04-12 revision 4491bb740a) [arm64-darwin21]

ruby client.rb                                                                                                                                  
Could not locate Gemfile or .bundle/ directory
Completed 2000 requests
Completed 4000 requests
Completed 6000 requests
Completed 8000 requests
Completed 10000 requests
Completed 12000 requests
Completed 14000 requests
Completed 16000 requests
Total of 16344 requests completed
client.rb:8:in `initialize': Operation timed out - connect(2) for "127.0.0.1" port 8080 (Errno::ETIMEDOUT)
	from client.rb:8:in `new'
	from client.rb:8:in `block in <main>'
	from client.rb:7:in `times'
	from client.rb:7:in `<main>'

and with ruby 3.2.1:

rvm use 3.2.1
Using /Users/brodock/.rvm/gems/ruby-3.2.1

ruby --version
Could not locate Gemfile or .bundle/ directory
ruby 3.2.1 (2023-02-08 revision 31819e82c8) [arm64-darwin22]

ruby client.rb                                                                                                                              
Could not locate Gemfile or .bundle/ directory
Completed 2000 requests
Completed 4000 requests
Completed 6000 requests
Completed 8000 requests
Completed 10000 requests
Completed 12000 requests
Completed 14000 requests
Completed 16000 requests
Total of 16342 requests completed
client.rb:8:in `initialize': connect(2) for "127.0.0.1" port 8080 (IO::TimeoutError)
	from client.rb:8:in `new'
	from client.rb:8:in `block in <main>'
	from client.rb:7:in `times'
	from client.rb:7:in `<main>'

Updated by jeremyevans0 (Jeremy Evans) almost 2 years ago

brodock (Gabriel Mazetto) wrote in #note-4:

I've also run a version of your client and got to reproduce the error (with similar output):

require 'socket'

PORT = 8080
LISTEN = '127.0.0.1'
NTIMES = 20000

NTIMES.times do |t|
  socket = TCPSocket.new(LISTEN, PORT)
  socket.write("HTTP/1.1 GET /\n")
  socket.shutdown(Socket::SHUT_WR)
  socket.read

  puts "Completed #{t} requests" if t > 0 && t % (NTIMES / 10) == 0
rescue Exception => e
  puts "Total of #{t-1} requests completed"

  raise e
ensure
  socket&.close
end

I tried that client program in my environment (OpenBSD/amd64), and was able to run it without failure with both Ruby 3.1 and 3.2. In terms of further testing, it looks like you are using rvm. While it seems unlikely that that is the problem, can you try using a system-provided Ruby? Also, did you test the above client program on Linux (the output shows MacOS)?

Actions #6

Updated by brodock (Gabriel Mazetto) almost 2 years ago

  • Description updated (diff)

Updated by brodock (Gabriel Mazetto) almost 2 years ago

I've went back to my Linux machine and tried to reproduce the problem.

Using Ruby 3.0 (from the package manager), I got to make it crash only using some decent concurrency (100) and running it for longer time:

brodock@pop-os:~/Projetos/ractor$ neofetch 
             /////////////                brodock@pop-os 
         /////////////////////            -------------- 
      ///////*767////////////////         OS: Pop!_OS 22.04 LTS x86_64 
    //////7676767676*//////////////       Kernel: 6.2.0-76060200-generic 
   /////76767//7676767//////////////      Uptime: 13 mins 
  /////767676///*76767///////////////     Packages: 2031 (dpkg), 25 (flatpak) 
 ///////767676///76767.///7676*///////    Shell: bash 5.1.16 
/////////767676//76767///767676////////   Resolution: 3840x2160 
//////////76767676767////76767/////////   DE: GNOME 42.5 
///////////76767676//////7676//////////   WM: Mutter 
////////////,7676,///////767///////////   WM Theme: Pop 
/////////////*7676///////76////////////   Theme: Pop-dark [GTK2/3] 
///////////////7676////////////////////   Icons: Pop [GTK2/3] 
 ///////////////7676///767////////////    Terminal: gnome-terminal 
  //////////////////////'////////////     CPU: AMD Ryzen 9 5900X (24) @ 3.700GHz 
   //////.7676767676767676767,//////      GPU: NVIDIA GeForce GTX 1660 SUPER 
    /////767676767676767676767/////       Memory: 3709MiB / 32002MiB 
      ///////////////////////////
         /////////////////////                                    
             /////////////                                        
ruby --version
ruby 3.0.2p107 (2021-07-07 revision 0db68f0233) [x86_64-linux-gnu]
ab -n 90000 -c 200 http://localhost:8080/
This is ApacheBench, Version 2.3 <$Revision: 1879490 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking localhost (be patient)
Completed 9000 requests
Completed 18000 requests
Completed 27000 requests
Completed 36000 requests
Completed 45000 requests
Completed 54000 requests
Completed 63000 requests
Completed 72000 requests
Completed 81000 requests
apr_pollset_poll: The timeout specified has expired (70007)
Total of 89888 requests completed

It seems Linux can handle the issue better than Mac OS, but whatever is breaking is still there.

This is what I got running the server part:

GET / HTTP/1.0
GET / HTTP/1.0
GET / HTTP/1.0
GET / HTTP/1.0
GET / HTTP/1.0
GET / HTTP/1.0

bug-19412.rb:13:in `write': Broken pipe (Errno::EPIPE)
	from bug-19412.rb:13:in `handle_connection'
	from bug-19412.rb:28:in `block in <main>'
	from bug-19412.rb:27:in `loop'
	from bug-19412.rb:27:in `<main>'

When I just run the client.rb it doesn´t break as there is not enough concurrency, so it never reaches a point where it can exhaust whatever resource is getting exhausted.

Weirdly enough I can´t reproduce the problem when using hey: https://github.com/rakyll/hey

hey -n 180000 -c 300 http://localhost:8080/

Summary:
  Total:	14.8814 secs
  Slowest:	14.5462 secs
  Fastest:	0.0002 secs
  Average:	0.0112 secs
  Requests/sec:	12095.6387
  

Response time histogram:
  0.000 [1]	|
  1.455 [179742]	|■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  2.909 [36]	|
  4.364 [77]	|
  5.819 [16]	|
  7.273 [88]	|
  8.728 [23]	|
  10.182 [0]	|
  11.637 [0]	|
  13.092 [0]	|
  14.546 [17]	|


Latency distribution:
  10% in 0.0006 secs
  25% in 0.0015 secs
  50% in 0.0019 secs
  75% in 0.0020 secs
  90% in 0.0022 secs
  95% in 0.0023 secs
  99% in 0.0025 secs

Details (average, fastest, slowest):
  DNS+dialup:	0.0066 secs, 0.0002 secs, 14.5462 secs
  DNS-lookup:	0.0001 secs, 0.0000 secs, 0.0491 secs
  req write:	0.0000 secs, 0.0000 secs, 0.0025 secs
  resp wait:	0.0045 secs, 0.0000 secs, 13.5324 secs
  resp read:	0.0000 secs, 0.0000 secs, 0.0033 secs

Status code distribution:
  [200]	180000 responses

Updated by brodock (Gabriel Mazetto) almost 2 years ago

I've also tried to build something similar to "hey" in ruby and couldn´t make it crash (this is likely because of the GVL, we can´t get real concurrency):

require 'bundler/inline'

gemfile do
  source 'https://rubygems.org'
  gem 'thread'
end

require 'socket'
require 'thread/pool'
require 'thread/channel'

PORT = 8080
LISTEN = '127.0.0.1'
NTIMES = 80000
CONCURRENCY = 512

pool = Thread.pool(CONCURRENCY)
channel = Thread.channel

NTIMES.times do |t|
  pool.process do
    socket = TCPSocket.new(LISTEN, PORT)
    socket.write("HTTP/1.1 GET /\n")
    socket.shutdown(Socket::SHUT_WR)
    socket.read

    channel.send([:success])
  rescue Exception => e
    channel.send([:error, e])
  ensure
    socket&.close
  end
end

output_thread = Thread.new {
  success = 0
  errors = []

  while data = channel.receive
    case data[0]
    when :success
      success += 1

      puts "Completed #{success} requests" if success > 0 && (success % (NTIMES / 10) == 0)
    when :error
      errors << data[1]
    when :eof
      puts "Finished with #{errors.size} errors"
      errors.each { puts _1 }
      break
    end
  end
}

pool.shutdown
channel.send([:eof])
output_thread.join

Updated by brodock (Gabriel Mazetto) almost 2 years ago

@jeremyevans0 (Jeremy Evans) I've tested again in MacOS system provided ruby version:

ruby --version                                                                                                                                     
Could not locate Gemfile or .bundle/ directory
ruby 2.6.10p210 (2022-04-12 revision 67958) [universal.arm64e-darwin22]

And that is consistent. It seems the problem is specific to MacOS, or worst when in MacOS.

I've tried running the code above to make sure this is not apache/hey thing.

The behavior is that it runs a couple of thousand requests, then stop for 15-20s then runs another batch and stops and continue doing that...

Because of how the code above handles exceptions, I can see then only at the end:

ruby client-pool.rb                                                                                                                                16:54:52
Could not locate Gemfile or .bundle/ directory
Completed 8000 requests
Completed 16000 requests
Completed 24000 requests
Completed 32000 requests
Completed 40000 requests
Completed 48000 requests
Completed 56000 requests
Completed 64000 requests
Completed 72000 requests
Finished with 310 errors
Connection reset by peer
Connection reset by peer
Connection reset by peer
Connection reset by peer
Connection reset by peer
Connection reset by peer
Connection reset by peer
Connection reset by peer
Connection reset by peer
Connection reset by peer
Connection reset by peer
Connection reset by peer
Connection reset by peer
Connection reset by peer
Connection reset by peer
Connection reset by peer
Connection reset by peer
Connection reset by peer
Connection reset by peer
Connection reset by peer
Connection reset by peer
Connection reset by peer
Connection reset by peer
Connection reset by peer
Connection reset by peer
Connection reset by peer
Connection reset by peer
Connection reset by peer
Connection reset by peer
Connection reset by peer
Connection reset by peer
Connection reset by peer
Connection reset by peer
Connection reset by peer
Connection reset by peer
Connection reset by peer
Connection reset by peer
Connection reset by peer
Connection reset by peer
Connection reset by peer
Connection reset by peer
Connection reset by peer
Connection reset by peer
Connection reset by peer
Connection reset by peer
Connection reset by peer
Connection reset by peer
Connection reset by peer
Connection reset by peer
Connection reset by peer
Connection reset by peer
Connection reset by peer
Connection reset by peer
Connection reset by peer
Connection reset by peer
Connection reset by peer
Connection reset by peer
Connection reset by peer
Connection reset by peer
Connection reset by peer
Connection reset by peer
Connection reset by peer
Socket is not connected - shutdown(2)
Broken pipe
Broken pipe
Broken pipe
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
Connection reset by peer
Connection reset by peer - connect(2) for "127.0.0.1" port 8080
Socket is not connected - shutdown(2)
Socket is not connected - shutdown(2)
Connection reset by peer - connect(2) for "127.0.0.1" port 8080
Socket is not connected - shutdown(2)
Socket is not connected - shutdown(2)
Socket is not connected - shutdown(2)
Connection reset by peer
Connection reset by peer
Connection reset by peer
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
Connection reset by peer
Socket is not connected - shutdown(2)
Socket is not connected - shutdown(2)
Socket is not connected - shutdown(2)
Connection reset by peer
Connection reset by peer
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080

Updated by jeremyevans0 (Jeremy Evans) almost 2 years ago

brodock (Gabriel Mazetto) wrote in #note-9:

@jeremyevans0 (Jeremy Evans) I've tested again in MacOS system provided ruby version:

ruby --version                                                                                                                                     
Could not locate Gemfile or .bundle/ directory
ruby 2.6.10p210 (2022-04-12 revision 67958) [universal.arm64e-darwin22]

And that is consistent. It seems the problem is specific to MacOS, or worst when in MacOS.

I've tried running the code above to make sure this is not apache/hey thing.

In your pure Ruby hey-example, you have CONCURRENCY = 512, but your server has BACKLOG = 50. You would want BACKLOG to be higher than CONCURRENCY, otherwise I think at least some failures may be expected. Note that Ruby doesn't deal well with a high number of threads (and 512 threads would definitely be considered high in this case). For the client part, if you want to test concurrency, it would be better to fork/spawn separate processes and not use threads.

I don't have any other ideas about what could be going wrong. You should try to see if either your client or server is leaking file descriptors. You could also try increasing the timeout. Unfortunately, not being able to reproduce the problem, I can only guess as to what the root cause might be.

Updated by brodock (Gabriel Mazetto) almost 2 years ago

@jeremyevans0 (Jeremy Evans) I'm new to debugging C code. Your idea on filedescriptors seems to to be somewhat spot on.

I'm not sure if this is the right way to debug this or not, but I managed to capture "Filesystem activity" through MacOS instrumentor.

As you can see, there is a big gap between a chunk of request it processed which can illustrate what I was observing.

looking at the description of what is happening you can see that the code couldn't get "accept" to return with success after some number of executions.

What is interesting is that I can't see any close calls, only accept and writev which is weird. Could it be that somehow in MacOS we are not issuing close calls and it is just "timing out"?

Updated by brodock (Gabriel Mazetto) almost 2 years ago

when the pause happens it seems the socket is still kept alive, based on netstat:

(there is a huge list of those):

tcp4       0      0  localhost.http-alt     localhost.56818        TIME_WAIT
tcp4       0      0  localhost.http-alt     localhost.56819        TIME_WAIT
tcp4       0      0  localhost.http-alt     localhost.56820        TIME_WAIT
tcp4       0      0  localhost.http-alt     localhost.56821        TIME_WAIT
tcp4       0      0  localhost.http-alt     localhost.56822        TIME_WAIT
tcp4       0      0  localhost.http-alt     localhost.56823        TIME_WAIT
tcp4       0      0  localhost.http-alt     localhost.56825        TIME_WAIT
tcp4       0      0  localhost.http-alt     localhost.56824        TIME_WAIT
tcp4       0      0  localhost.http-alt     localhost.56826        TIME_WAIT
tcp4       0      0  localhost.http-alt     localhost.56827        TIME_WAIT
tcp4       0      0  localhost.http-alt     localhost.56828        TIME_WAIT
tcp4       0      0  localhost.http-alt     localhost.56829        TIME_WAIT
tcp4       0      0  localhost.http-alt     localhost.56830        TIME_WAIT
tcp4       0      0  localhost.http-alt     localhost.56831        TIME_WAIT
tcp4       0      0  localhost.http-alt     localhost.56832        TIME_WAIT
tcp4       0      0  localhost.http-alt     localhost.56833        TIME_WAIT
tcp4       0      0  localhost.http-alt     localhost.56834        TIME_WAIT
tcp4       0      0  localhost.http-alt     localhost.56835        TIME_WAIT

based on https://stackoverflow.com/questions/2187889/how-can-i-make-a-socket-close-immediately-bypassing-the-timeout-period it seems that there is a way to opt for socket reuse which would bypass this.

It still weird that there is such a small limit on MacOS for how many of those we keep waiting before we can't accept anymore

Updated by brodock (Gabriel Mazetto) almost 2 years ago

Some other updates. I've been trying to undestand the TIME_WAIT and why it does behave the way it doesn on MacOS (but not on Linux).

This response got me ideas to try setting linger on:
https://stackoverflow.com/questions/3757289/when-is-tcp-option-so-linger-0-required

require 'socket'

PORT = 8080
BACKLOG = 64
LISTEN = '127.0.0.1'
LINGER = 0

@processed = 0

def handle_connection(connection, _addrinfo)
  request = connection.gets

  connection.write "HTTP/1.1 200\r\n"
  connection.write "Content-Type: text/html\r\n"
  connection.write "\r\n"
  connection.write "Hello world! Current time is #{Time.now}"

  @processed += 1
  puts "#{request} - #{@processed}"

ensure
  connection.setsockopt(Socket::Option.linger(true, LINGER))
  connection.close
end

begin
  socket = TCPServer.new(LISTEN, PORT)
  socket.setsockopt(:SOCKET, :REUSEADDR, true)
  socket.setsockopt(Socket::IPPROTO_TCP, Socket::TCP_NODELAY, 1)
  socket.listen(BACKLOG)

  puts "Requested queue size: #{BACKLOG} bigger then limit: #{Socket::SOMAXCONN}" if BACKLOG > Socket::SOMAXCONN

  loop do
    handle_connection(socket.accept, nil)
  end
ensure
  socket.close
end

This modified code now does "release" the socket immediately, which allows me to respond to that huge ammount of requests.

My understanding is that if I set linger to 1 that would mean 1 tick/cycle or 1 second depending on the OS (worst case would be 1 second) but setting that on MacOS brings the old behavior back of having TIME_WAIT all over the place.

Actions

Also available in: Atom PDF

Like0
Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0