Bug #7584

Ruby hangs when shutting down an ssl connection in gc finalization

Added by Bob Potter over 3 years ago. Updated about 3 years ago.

ruby -v:
ruby 1.9.3p327 (2012-11-10) [i686-linux]


It appears that during garbage collection ruby will sometimes attempt to shutdown an ssl connection that has already been shutdown. At least in certain cases this causes the SSL_shutdown call to block forever.

(gdb) bt
#0 0xb76fa424 in _kernel_vsyscall ()
#1 0xb740d17b in read () from /lib/i386-linux-gnu/tls/i686/nosegneg/
#2 0xb6faabae in ?? () from /lib/i386-linux-gnu/
#3 0xb6fa8412 in BIO_read () from /lib/i386-linux-gnu/
#4 0xb70c3157 in ?? () from /lib/i386-linux-gnu/
#5 0xb70c42d5 in ?? () from /lib/i386-linux-gnu/
#6 0xb70c1371 in ?? () from /lib/i386-linux-gnu/
#7 0xb70dc931 in SSL_shutdown () from /lib/i386-linux-gnu/
#8 0xb711ba9d in ossl
ssl_shutdown (ssl=0xb21c1f18) at ossl_ssl.c:987
#9 0xb711bb31 in ossl_ssl_free (ssl=0xb21c1f18) at ossl_ssl.c:998
#10 0xb753bba5 in run_final (obj=229474180, objspace=) at gc.c:2932
#11 finalize_list (objspace=0x98805c0, p=0xdad7f84) at gc.c:1946
(gdb) f 9
#9 0xb711bb31 in ossl_ssl_free (ssl=0xb21c1f18) at ossl_ssl.c:998
998 in ossl_ssl.c
(gdb) p (*ssl).shutdown
$8 = 1

I'm not sure exactly what is causing this condition, but it seems like we should be checking if the connection is already shutdown (ssl_get_shutdown()) before trying to shut it down in ossl_ssl_free.

info.txt Magnifier (8.4 KB) Bob Potter, 03/23/2013 10:32 AM

Related issues

Related to Backport200 - Backport #8240: SSLSocket breaks other connections or files on GC Closed 04/09/2013


#1 [ruby-core:50954] Updated by Eric Hodel over 3 years ago

  • Category set to ext
  • Assignee set to Martin Bosslet
  • Target version set to 2.0.0

#2 [ruby-core:51040] Updated by Usaku NAKAMURA over 3 years ago

  • Status changed from Open to Assigned

#3 [ruby-core:52481] Updated by Yusuke Endoh over 3 years ago

I'm not sure how significant this is.
Martin, what do you think?

Yusuke Endoh

#4 [ruby-core:52502] Updated by Bob Potter over 3 years ago

For what it's worth, we were able to work around this by explicitly shutting down ssl connections instead of letting it happen during GC.

#5 [ruby-core:52568] Updated by Yusuke Endoh over 3 years ago

  • Target version changed from 2.0.0 to 2.1.0

Thank you Bob!

Then I postpone this to 2.1.0.
If there is a good patch available soon, we may backport it to ruby_2_0_0.

Yusuke Endoh

#6 [ruby-core:53655] Updated by Bob Potter about 3 years ago

Digging into this some more because it cropped up in another project.

I've had some luck reproducing this issue with a small script:

Sometimes it reproduces easily, sometimes it doesn't. I've been testing with "ruby 1.9.3p374". I haven't tested to see if it happens with the latest trunk.

I've attached a file with some gdb logs of the state of the SSL struct and lsof output for the process after it hangs. One oddity is that the SSL connection appears to be for FD 11 but lsof doesn't have output for such a file descriptor. You would expect the hanging read to just receive EBADF in this case, right?

I don't know if this is related to the hanging issue but I've noticed that often during GC the underlying TCPSocket object's finalizer will be called before the SSLSocket's finalizer preventing us from closing down SSL connection correctly:

19:28:56.520556 close(13) = 0
19:28:56.520969 write(13, "\25\3\3\0\26\177\n\232\255{9iWfz4\255\240 \276\254!\222\374\10C{", 27) = -1 EBADF (Bad file descriptor)

Should SSLSocket be setting a custom finalizer on the underlying rb_io_t?

#7 [ruby-core:53656] Updated by Bob Potter about 3 years ago

It looks like it is a race condition where the TCPSocket finalizer closes a fd, that fd is reused by another thread and then we try to do SSL_shutdown on this new connection and bad things happen.

Would it make sense to have the TCPSocket finalizer be a no-op (when it's used for ssl) and have SSLSocket's free operation take care of shutting everything down? Is there a better way to solve this?

#8 [ruby-core:53673] Updated by Nobuyoshi Nakada about 3 years ago

The order of finalizers is indeterminate.

You'd be able to leave FDs open with IO#autoclose.

#9 [ruby-core:54275] Updated by Martin Bosslet about 3 years ago

Bob, could you please check if the patch applied in fixed this? Thanks!

#10 [ruby-core:54298] Updated by Bob Potter about 3 years ago

Looks like that fixes it! I was unable to reproduce with "ruby 2.1.0dev (2013-04-15 trunk 40308) [x86_64-linux]".

#11 [ruby-core:54300] Updated by Martin Bosslet about 3 years ago

  • Status changed from Assigned to Closed

Great, thanks for checking! I'll close this, too, then. If anyone still runs into this or a similar issue, please let me know!

Also available in: Atom PDF