Bug #14011
closedAppveyor failure - svn 60172 - ‘webrick: do not hang acceptor on slow TLS connections’
Description
I noticed the recent Appveyor failure from ‘webrick: do not hang acceptor on slow TLS connections’ svn 60172.
Coincidentally, I recently tried a MinGW build with the gem puma using SSL. It builds fine with ruby 2.2 thru trunk, but also has several Appveyor failures similar to the above, as all involve SSL servers and nonblock connections.
I’m happy to help with testing, but I suspect this is beyond my current knowledge in terms of a fix…
FYI, I did run up puma locally with a static rack, http, and trunk; everything works fine. I did not write code to connect it with multiple concurrent clients (I’ve done that before back when I was coding in C#). I also understand that some things may ‘just not work’ on windows.
Thanks again, Greg
Files
Updated by normalperson (Eric Wong) about 7 years ago
Greg.mpls@gmail.com wrote:
Maybe usa can help...
Or maybe sprinkle `warn "#{LINE} ...\n"' in the test code and
webrick source to see where it stops?
I noticed the recent Appveyor failure from ‘webrick: do not hang acceptor on slow TLS connections’ svn 60172.
Is that appveyor link supposed to show test output? I don't use
JavaScript or GUIs. Is there plain text or simple HTML log output?
I’m happy to help with testing, but I suspect this is beyond
my current knowledge in terms of a fix…
Mainly, it is breaking down steps and putting warn/fprintf
statements everywhere to figure out what's running or stuck.
There's probably platform-specific tracing tools you can use,
too.
FYI, I did run up puma locally with a static rack, http, and
Did you mean https?
trunk; everything works fine. I did not write code to connect
it with multiple concurrent clients (I’ve done that before
back when I was coding in C#). I also understand that some
things may ‘just not work’ on windows.
If you meant "https" above, try only making one TCP connection
without the SSL/TLS handshake; then sleep the process:
c = TCPSocket.new(host, port)
# no OpenSSL stuff
sleep # forever
Then, try to make normal HTTPS connections from a different
terminal or browser and see if it works or gets stuck.
Updated by MSP-Greg (Greg L) about 7 years ago
- File trunk_mingw_puma.txt trunk_mingw_puma.txt added
- File trunk_mswin_webrick.txt trunk_mswin_webrick.txt added
Eric,
Thanks for the quick response. I've attached the two logs. I really hate the posts where people have a gazillion lines of a seg fault report...
Re this issue, it seems that it (mostly) involves code in ruby openssl or net/*, not code that's part of either webrick or puma.
I did mean http. IOW, puma set as a non ssl server works fine run on a mingw trunk build with a Chrome client. When I try to change it over to ssl (set verify_mode to none on client and server, provide key and cert files, etc), the client I set up in code can't connect. Similar 'host disconnected' errors.
Hope this helps. As to debugging, as it seems that the server is failing/closing, I haven't looked for how I can trap that. Reminder, I'm not a c type...
Thanks, Greg
Updated by normalperson (Eric Wong) about 7 years ago
Greg.mpls@gmail.com wrote:
Re this issue, it seems that it (mostly) involves code in ruby
openssl ot net/*, not code that's part of either webrick or
puma.
OK, so the puma failures are independent of my webrick change
(r60172), correct? Also, keep in mind puma uses it's own
"minissl" C extension which could introduce more variables.
But the ECONNRESET failure in
TestNetHTTPS#test_certificate_verify_failure is new and probably
the result of my webrick change?
Does it still happen if you remove the WEBRick::Utils.timeout
call I added and just call `sock.accept'?
I did mean http. IOW, puma set as a non ssl server works fine
run on a mingw trunk build with a Chrome client. When I try
to change it over to ssl (set verify_mode to none on client
and server, provide key and cert files, etc), the client I set
up in code can't connect. Similar 'host disconnected' errors.
Weird, I was about to suggest multi-threading problems with the
client and server sharing process memory; but Chrome and Ruby
are already different processes...
Hope this helps. As to debugging, as it seems that the server
is failing/closing, I haven't looked for how I can trap that.
Reminder, I'm not a c type...
Mainly, it's down to being stubborn and persistent :>
Nobody starts off knowing C. I started out putting *printf
statements in various places and editing code the same way I'd
edit a config file until I got the results or info I wanted.
But I don't even think you'd have to deal with C, since I
only changed the Ruby code for webrick.
Hopefully this gets figured out soon or usa or someone else
can chime in. I'll be offline for a few days soonish.
Updated by MSP-Greg (Greg L) about 7 years ago
normalperson (Eric Wong) wrote:
OK, so the puma failures are independent of my webrick change
(r60172), correct? Also, keep in mind puma uses it's own
"minissl" C extension which could introduce more variables.
Yes. They happened before your commit. I mentioned them just because 'nonblock' in ruby std-lib code seems common to both...
But the ECONNRESET failure in
TestNetHTTPS#test_certificate_verify_failure is new and probably
the result of my webrick change?
Yes. It's failing both on the Appveyor mswin build and my local MinGW build.
Does it still happen if you remove the WEBRick::Utils.timeout
call I added and just call `sock.accept'?
I've think you're referring to:
if sock.respond_to?(:sync_close=) && @config[:SSLStartImmediately]
WEBrick::Utils.timeout(@config[:RequestTimeout]) do
sock.accept # OpenSSL::SSL::SSLSocket#accept
end
end
Commented out 2 & 4 (start from 1), still have the failure, then commented out all but 3, still the failure.
Hopefully this gets figured out soon or usa or someone else
can chime in. I'll be offline for a few days soonish.
Offline is good sometimes... Thanks.
Updated by MSP-Greg (Greg L) about 7 years ago
From reading normalperson's (Eric Wong's) description in 60172, it sounds like this change may have shown an issue involving 'per-client threading' on Windows builds.
From my testing with Puma, this issue seems to exist in 2.3 and 2.4 also. Hence, if a fix is possible, please consider backporting to those versions.
Updated by MSP-Greg (Greg L) about 7 years ago
- ruby -v changed from ruby 2.5.0dev (2017-10-12 trunk 60170) [x64-mingw32] to ruby 2.5.0dev (2017-10-13 trunk 60172) [x64-mingw32]
My rolling Appveyor trunk build completed. It had the same failure as the mswin builds currently have. So the next builds pass, I may reverse this until it's resolved.
Summary:
1 Total Failures/Errors Build No 44 Job Id e7s5dw24gghj56hb
ruby 2.5.0dev (2017-10-13 trunk 60175) [x64-mingw32]
2017-10-13 03:31:29 UTC
test-all 16954 tests, 2236388 assertions, 1 failures, 0 errors, 117 skips, 85 skips shown
test-spec 3554 files, 26057 examples, 203541 expectations, 0 failures, 0 errors, 0 tagged
mspec 3554 files, 26057 examples, 203471 expectations, 0 failures, 0 errors, 0 tagged
test-basic test succeeded
btest PASS all 1194 tests
Failure:
86) Failure:
TestNetHTTPS#test_certificate_verify_failure [C:/projects/ruby-loco/src/ruby/test/net/http/utils.rb:48]:
<[]> expected but was
<["[2017-10-13 03:22:01] ERROR Errno::ECONNRESET: An existing connection was forcibly closed by the remote host. - SSL_accept\n" +
"\tC:/projects/ruby-loco/src/ruby/lib/webrick/server.rb:298:in `accept'\n" +
"\tC:/projects/ruby-loco/src/ruby/lib/webrick/server.rb:298:in `block (2 levels) in start_thread'\n" +
"\tC:/projects/ruby-loco/src/ruby/lib/webrick/utils.rb:263:in `timeout'\n" +
"\tC:/projects/ruby-loco/src/ruby/lib/webrick/server.rb:297:in `block in start_thread'\n"]>.
Thanks, Greg
Updated by nagachika (Tomoyuki Chikanaga) about 7 years ago
- Related to Bug #14005: [PATCH] webrick: do not hang acceptor on slow TLS connections added
Updated by hsbt (Hiroshi SHIBATA) about 7 years ago
- Status changed from Open to Assigned
- Assignee set to normalperson (Eric Wong)
Updated by MSP-Greg (Greg L) about 7 years ago
Hiroshi and Eric - I apologize for allowing this issue to remain open. I cannot find any indication that this is a problem with any current builds.
Okay to close.
Updated by hsbt (Hiroshi SHIBATA) about 7 years ago
- Status changed from Assigned to Rejected