Bug #7584

Ruby hangs when shutting down an ssl connection in gc finalization

Added by Bob Potter over 1 year ago. Updated about 1 year ago.

[ruby-core:50951]
Status:Closed
Priority:Normal
Assignee:Martin Bosslet
Category:ext
Target version:2.1.0
ruby -v:ruby 1.9.3p327 (2012-11-10) [i686-linux]
Backport:

Description

It appears that during garbage collection Ruby will sometimes attempt to shut down an SSL connection that has already been shut down. In at least some cases this causes the SSL_shutdown call to block forever.

(gdb) bt
#0 0xb76fa424 in __kernel_vsyscall ()
#1 0xb740d17b in read () from /lib/i386-linux-gnu/tls/i686/nosegneg/libc.so.6
#2 0xb6faabae in ?? () from /lib/i386-linux-gnu/libcrypto.so.1.0.0
#3 0xb6fa8412 in BIO_read () from /lib/i386-linux-gnu/libcrypto.so.1.0.0
#4 0xb70c3157 in ?? () from /lib/i386-linux-gnu/libssl.so.1.0.0
#5 0xb70c42d5 in ?? () from /lib/i386-linux-gnu/libssl.so.1.0.0
#6 0xb70c1371 in ?? () from /lib/i386-linux-gnu/libssl.so.1.0.0
#7 0xb70dc931 in SSL_shutdown () from /lib/i386-linux-gnu/libssl.so.1.0.0
#8 0xb711ba9d in ossl_ssl_shutdown (ssl=0xb21c1f18) at ossl_ssl.c:987
#9 0xb711bb31 in ossl_ssl_free (ssl=0xb21c1f18) at ossl_ssl.c:998
#10 0xb753bba5 in run_final (obj=229474180, objspace=) at gc.c:2932
#11 finalize_list (objspace=0x98805c0, p=0xdad7f84) at gc.c:1946
(gdb) f 9
#9 0xb711bb31 in ossl_ssl_free (ssl=0xb21c1f18) at ossl_ssl.c:998
998 in ossl_ssl.c
(gdb) p (*ssl).shutdown
$8 = 1

I'm not sure exactly what is causing this condition, but it seems like we should check whether the connection is already shut down (SSL_get_shutdown()) before trying to shut it down in ossl_ssl_free.

info.txt (8.4 KB) Bob Potter, 03/23/2013 10:32 AM


Related issues

Related to Backport200 - Backport #8240: SSLSocket breaks other connections or files on GC Closed 04/09/2013

History

#1 Updated by Eric Hodel over 1 year ago

  • Category set to ext
  • Assignee set to Martin Bosslet
  • Target version set to 2.0.0

#2 Updated by Usaku NAKAMURA over 1 year ago

  • Status changed from Open to Assigned

#3 Updated by Yusuke Endoh about 1 year ago

I'm not sure how significant this is.
Martin, what do you think?

Yusuke Endoh mame@tsg.ne.jp

#4 Updated by Bob Potter about 1 year ago

For what it's worth, we were able to work around this by explicitly shutting down ssl connections instead of letting it happen during GC.
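
A minimal sketch of that kind of workaround, in Ruby. The helper names `open_ssl` and `with_explicit_close` are hypothetical and do not come from the report; the idea is only that the connection is closed in an `ensure` block, so no shutdown is left for the finalizer to perform.

```ruby
require "socket"
require "openssl"

# Hypothetical helper (not from the report): open a TLS connection.
# It is defined for illustration only; nothing below calls it.
def open_ssl(host, port)
  tcp = TCPSocket.new(host, port)
  ssl = OpenSSL::SSL::SSLSocket.new(tcp, OpenSSL::SSL::SSLContext.new)
  ssl.sync_close = true # closing the SSLSocket also closes the TCPSocket
  ssl.connect
  ssl
rescue StandardError
  tcp&.close # do not leave a half-open socket behind for the GC
  raise
end

# Hypothetical helper: run the block, then always close the connection
# explicitly, whether the block returned normally or raised.
def with_explicit_close(conn)
  yield conn
ensure
  conn.close unless conn.closed?
end

# Usage (needs network access, so it is left as a comment):
#   with_explicit_close(open_ssl("example.com", 443)) do |ssl|
#     ssl.write("...")
#   end
```

Setting `sync_close` means a single `close` on the SSLSocket tears down both layers in a known order, rather than relying on two independent finalizers.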

#5 Updated by Yusuke Endoh about 1 year ago

  • Target version changed from 2.0.0 to 2.1.0

Thank you Bob!

I'll postpone this to 2.1.0, then.
If there is a good patch available soon, we may backport it to ruby20_0.

Yusuke Endoh mame@tsg.ne.jp

#6 Updated by Bob Potter about 1 year ago

Digging into this some more because it cropped up in another project.

I've had some luck reproducing this issue with a small script: https://gist.github.com/bpot/5225920

Sometimes it reproduces easily, sometimes it doesn't. I've been testing with "ruby 1.9.3p374". I haven't tested whether it happens with the latest trunk.

I've attached a file with some gdb logs of the state of the SSL struct and lsof output for the process after it hangs. One oddity is that the SSL connection appears to be for FD 11, but lsof has no output for such a file descriptor. You would expect the hanging read to just receive EBADF in this case, right?

I don't know if this is related to the hanging issue, but I've noticed that during GC the underlying TCPSocket object's finalizer is often called before the SSLSocket's finalizer, preventing us from closing down the SSL connection correctly:

19:28:56.520556 close(13) = 0
19:28:56.520969 write(13, "\25\3\3\0\26\177\n\232\255{9iWfz4\255\240 \276\254!\222\374\10C{", 27) = -1 EBADF (Bad file descriptor)

Should SSLSocket be setting a custom finalizer on the underlying rb_io_t?
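
The close-then-write sequence in the strace output can be imitated in plain Ruby without any SSL involved. This is only an illustrative sketch: the method name `write_after_close` is hypothetical, and a pipe with two IO objects wrapping the same descriptor stands in for the TCPSocket/SSLSocket pair.

```ruby
# Hypothetical demo (not from the report): two Ruby IO objects share one
# descriptor number, like a TCPSocket and the SSLSocket layered on it.
def write_after_close
  reader, writer = IO.pipe
  # Second wrapper around the same fd. autoclose: false keeps this object
  # from ever closing the descriptor itself.
  late = IO.for_fd(writer.fileno, "w", autoclose: false)
  writer.close                  # plays the role of the TCPSocket finalizer
  late.syswrite("close_notify") # plays the role of the later SSL shutdown
  :written
rescue Errno::EBADF => e
  e.class
ensure
  reader&.close
end

p write_after_close
```

The second wrapper has no way of knowing that the descriptor was closed behind its back, so the failure only shows up when the write reaches the kernel.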

#7 Updated by Bob Potter about 1 year ago

It looks like this is a race condition: the TCPSocket finalizer closes an fd, that fd is reused by another thread, and then we try to do SSL_shutdown on this new connection and bad things happen.

Would it make sense to have the TCPSocket finalizer be a no-op (when it's used for ssl) and have SSLSocket's free operation take care of shutting everything down? Is there a better way to solve this?
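
The descriptor-reuse part of this race can be shown in a single thread. The sketch below is an assumption-laden illustration, not the reproduction script from the gist: the method name `stale_write_hits_new_file` is hypothetical, a pipe stands in for the socket, and a temporary file stands in for the unrelated connection that happens to receive the recycled descriptor number.

```ruby
require "tmpdir"

# Hypothetical demo (not from the report): a stale wrapper writes to a
# descriptor number that has since been handed to an unrelated file.
def stale_write_hits_new_file
  Dir.mktmpdir do |dir|
    reader, writer = IO.pipe
    fd = writer.fileno
    stale = IO.for_fd(fd, "w", autoclose: false) # still remembers the number
    writer.close                                 # the number is now free

    # POSIX hands out the lowest free descriptor, so this open usually
    # receives the number that was just released.
    victim = File.open(File.join(dir, "victim"), "w+")
    reused = (victim.fileno == fd)

    # Only write when reuse really happened, otherwise fd is closed.
    stale.syswrite("close_notify") if reused

    victim.rewind
    contents = victim.read
    victim.close
    reader.close
    [reused, contents]
  end
end

reused, contents = stale_write_hits_new_file
puts "fd reused: #{reused}, victim file now contains: #{contents.inspect}"
```

When the number is reused, the bytes intended for the old connection land in the new file, which matches the symptom named in the related issue #8240 ("SSLSocket breaks other connections or files on GC").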

#8 Updated by Nobuyoshi Nakada about 1 year ago

The order of finalizers is indeterminate.

You'd be able to leave FDs open with IO#autoclose.
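
A small sketch of what `IO#autoclose` does, using a pipe rather than a socket. How this would be wired into SSLSocket is not spelled out in this thread, so the method name `handoff_descriptor` and the hand-off pattern are assumptions made for illustration.

```ruby
# Hypothetical demo (not from the report): with autoclose disabled, a Ruby
# IO object gives up ownership of its descriptor.
def handoff_descriptor
  reader, writer = IO.pipe
  fd = writer.fileno
  writer.autoclose = false   # this object will no longer close the fd
  writer.close               # the Ruby object is closed, the fd stays open
  owner = IO.for_fd(fd, "w") # a new wrapper that does own the fd
  owner.syswrite("still open")
  owner.close                # the one real close of the descriptor
  reader.read
ensure
  reader&.close
end

p handoff_descriptor
```

The trade-off is that a descriptor whose wrapper has `autoclose` disabled must be closed by some other owner, or it leaks.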

#9 Updated by Martin Bosslet about 1 year ago

Bob, could you please check if the patch applied in https://bugs.ruby-lang.org/issues/8240 fixed this? Thanks!

#10 Updated by Bob Potter about 1 year ago

Looks like that fixes it! I was unable to reproduce with "ruby 2.1.0dev (2013-04-15 trunk 40308) [x86_64-linux]".

#11 Updated by Martin Bosslet about 1 year ago

  • Status changed from Assigned to Closed

Great, thanks for checking! I'll close this, too, then. If anyone still runs into this or a similar issue, please let me know!
