Bug #9525: Stuck with Socket.pack_sockaddr_in - Ruby - Ruby Issue Tracking System

Actions

Copy link

Bug #9525

closed

Stuck with Socket.pack_sockaddr_in

Bug #9525: Stuck with Socket.pack_sockaddr_in

Added by sonots (Naotoshi Seo) almost 12 years ago. Updated over 9 years ago.

Status:

Closed

Assignee:

Target version:

ruby -v:

1.9.3p194

Backport:

2.1: DONE

[ruby-core:60801]

Description

We met this trouble with Fluentd https://github.com/fluent/fluentd.

Fluentd is sometimes stuck at Socket.pack_sockaddr_in line on shutdown.
Here is the gist https://gist.github.com/sonots/9047653 to explain details.

Updated by akr (Akira Tanaka) almost 12 years ago Actions
Copy link
#1 [ruby-core:60802]

Status changed from Open to Feedback

Socket.pack_sockaddr_in can block with a name resolution.
If you think the name resolution doesn't block, please explain why.

Updated by sonots (Naotoshi Seo) almost 12 years ago Actions
Copy link
#2 [ruby-core:60803]

Are there any ways to timeout Socket.pack_sockaddr_in?
If so, this problem should be resolved.

EDIT: And, I was pointing an IP address at the concerned line. Will it still block for the name resolution?

Updated by akr (Akira Tanaka) almost 12 years ago Actions
Copy link
#3 [ruby-core:60804]

I think it is very difficult to add timeout to getaddrinfo() function in C.

Updated by akr (Akira Tanaka) almost 12 years ago Actions
Copy link
#4 [ruby-core:60805]

I guess getaddrinfo() doesn't block if an IP address is given.

At least, following command on Debian GNU/Linux (jessie) doesn't show anything.

% strace -e socket ruby -rsocket -e 'Socket.pack_sockaddr_in(80, "192.168.1.1")'

It means the command create no socket and no communication to another site.

Updated by sonots (Naotoshi Seo) almost 12 years ago Actions
Copy link
#5 [ruby-core:60806]

Following command on CentOS release 6.2 x86_64 returned a line

$ strace -e socket /usr/lib64/fluent/ruby/bin/ruby -rsocket -e 'Socket.pack_sockaddr_in(80, "192.168.1.1")'
socket(PF_NETLINK, SOCK_RAW, 0)         = 5

/usr/lib64/fluent/ruby/bin/ruby -v #=> ruby 1.9.3p194 (2012-04-20 revision 35410) [x86_64-linux]

Updated by akr (Akira Tanaka) almost 12 years ago Actions
Copy link
#6 [ruby-core:60807]

I think that PF_NETLINK is not a socket to communicate to another host.

Updated by naruse (Yui NARUSE) almost 12 years ago Actions
Copy link
#7 [ruby-core:60809]

Status changed from Feedback to Third Party's Issue

gdb says __check_pf is guilty.
https://gist.github.com/sonots/9047653#file-gistfile3-txt

(gdb) bt
#0  0x00000032f2ce659d in recvmsg () from /lib64/libc.so.6
#1  0x00000032f2d0c2c5 in make_request () from /lib64/libc.so.6
#2  0x00000032f2d0c6fa in __check_pf () from /lib64/libc.so.6
#3  0x00000032f2ccfb47 in getaddrinfo () from /lib64/libc.so.6
#4  0x00007fb3139d3818 in nogvl_getaddrinfo (arg=<value optimized out>) at raddrinfo.c:161
#5  0x00000000004f600c in rb_thread_blocking_region (func=0x7fb3139d3800 <nogvl_getaddrinfo>, data1=0x7fb0d5dd0140, ubf=<value optimized out>, data2=<value optimized out>) at thread.c:1129
#6  0x00007fb3139d2eaf in rb_getaddrinfo (node=<value optimized out>, service=<value optimized out>, hints=<value optimized out>, res=<value optimized out>) at raddrinfo.c:181
#7  0x00007fb3139d2fcb in rsock_getaddrinfo (host=<value optimized out>, port=22017, hints=0x7fb0d5dd0600, socktype_hack=1) at raddrinfo.c:359
#8  0x00007fb3139d37ed in rsock_addrinfo (host=0, port=140397479001552, socktype=<value optimized out>, flags=659) at raddrinfo.c:379
#9  0x00007fb3139c939a in sock_s_pack_sockaddr_in (self=<value optimized out>, port=140397479001552, host=35158400) at socket.c:1307

By glibc's commits, once __check_pf is always called 6f3914d5a3269c00e70506bd95f816fef6b635ce
(it is fixed at fa3fc0fe5f452d0aa7e435d8f32e992958683819)

The difference between akr's and sonots' seems because of this.

Therefore this is glibc's old bug and RHEL/CentOS's backport issue.

Updated by sonots (Naotoshi Seo) almost 12 years ago Actions
Copy link
#8 [ruby-core:60810]

Thank you! Let me write a note here.

On my current environment, the glibc version was glibc-2.12-1.47.el6.x86_64.

I checked the latest centos rpm http://vault.centos.org/6.5/os/Source/SPackages/glibc-2.12-1.132.el6.src.rpm,
but it looked RHED/CentOS is not backporting it yet (cf. https://gist.github.com/sonots/9065060)

Updated by akr (Akira Tanaka) almost 12 years ago Actions
Copy link
#9 [ruby-core:60829]

I think we can accept a workaround if there is a good patch.

Updated by kosaki (Motohiro KOSAKI) almost 12 years ago Actions
Copy link
#10 [ruby-core:60830]

I checked a glibc code. I don't think fa3fc0fe5f fixed this issue. It is a mere optimization patch.
I'm not sure why kernel's netlink doesn't reply anything. But it may be worth to try upgrade your kernel.

Thanks.

Updated by kosaki (Motohiro KOSAKI) almost 12 years ago Actions
Copy link
#11 [ruby-core:60831]

Akr-san,

I'm not an expert this area. But I guess we don't need to call getaddrinfo() in this case
because the name was already resolved.
How about bypass to call getaddrinfo when 'host' is given by IP address?

Updated by akr (Akira Tanaka) almost 12 years ago Actions
Copy link
#12 [ruby-core:60846]

Motohiro KOSAKI wrote:

I'm not an expert this area. But I guess we don't need to call getaddrinfo() in this case
because the name was already resolved.
How about bypass to call getaddrinfo when 'host' is given by IP address?

Yes. It is the workaround I said.

Updated by akr (Akira Tanaka) almost 12 years ago Actions
Copy link
#13 [ruby-core:60890]

I tried to workaround this issue at r45047.
However I don't have an environment to reproduce the problem.

Would anyone test the problem at latest trunk?

Updated by normalperson (Eric Wong) almost 12 years ago Actions
Copy link
#14 [ruby-core:60891]

akr@fsij.org wrote:

I tried to workaround this issue at r45047.
However I don't have an environment to reproduce the problem.

I don't know how to reproduce the problem, either. netlink sockets
should not block like that.

r45047 looks correct. A few minor comments:

backwards for loop for list is confusing to me,
any reason for not reversing list declaration?
why xmalloc + MEMZERO? xcalloc is shorter and generates smaller code

Updated by kosaki (Motohiro KOSAKI) almost 12 years ago Actions
Copy link
#17 [ruby-core:60898]

Status changed from Third Party's Issue to Closed
Backport changed from 1.9.3: UNKNOWN, 2.0.0: UNKNOWN, 2.1: UNKNOWN to 1.9.3: REQUIRED, 2.0.0: REQUIRED, 2.1: REQUIRED

Updated by akr (Akira Tanaka) almost 12 years ago Actions
Copy link
#18 [ruby-core:60907]

Eric Wong wrote:

backwards for loop for list is confusing to me,
any reason for not reversing list declaration?

I prefer ordering consistency between "list" and "res".

why xmalloc + MEMZERO? xcalloc is shorter and generates smaller code

xcalloc is good idea.

Updated by sonots (Naotoshi Seo) almost 12 years ago Actions
Copy link
#19 [ruby-core:61276]

Akira Tanaka wrote:

Would anyone test the problem at latest trunk?

I had never ran fluentd with ruby 2.1.1, but now I am trying it.
If it works well, I will try ruby-trunk next.

PS. Sorry, I struggled, but could not create a small subset to reproduce this issue so that anyone can check.
What I can tell here is only that this issue would sometimes occur on an environment which is heavy-loaded, and uses many threads, but I am not sure.

Updated by sonots (Naotoshi Seo) almost 12 years ago Actions
Copy link
#20 [ruby-core:61334]

I applied following consecutive patches for ruby 2.1.1, and tried.

https://github.com/ruby/ruby/commit/948ce9decb97e5ff0833e53a392aa9f1d42c9b0d
https://github.com/ruby/ruby/commit/dd1c3a75096b97c1ebcb8597c001761ddfb3c1bf
https://github.com/ruby/ruby/commit/2e6b97a45d077979121b29484a8831034d47ef50

Before (ruby 2.1.1):

The stuck was able to be reproduced with ruby 2.1.1.
I restarted my entire fluentd cluster 12 times, and it occurred at 4th and 12th, that is, 2 times out of 12.

After (ruby 2.1.1 + patch)

I restarted the fluentd cluster 36 times, and got stuck 3 times, but the stuck point was changed.
See https://gist.github.com/sonots/9392668
I did not see that Fluentd was stuck at Socket.pack_sockaddr_in anymore.

I will continue to work for the new stuck point at Fluentd issue https://github.com/fluent/fluentd/pull/257. I think the problem of stuck at Socket.pack_sockaddr_in was resolved.

Updated by usa (Usaku NAKAMURA) over 11 years ago Actions
Copy link
#21 [ruby-core:62241]

memo: related commits: 45046,45047,45063,45087,45150,45151,45152

Updated by nagachika (Tomoyuki Chikanaga) over 11 years ago Actions
Copy link
#22 [ruby-core:64545]

Is there any progress?

Updated by nagachika (Tomoyuki Chikanaga) over 11 years ago Actions
Copy link
#23 [ruby-core:64792]

I have confirmed that this issue is already fixed.

https://twitter.com/sonots/status/507804708153483264

Add related commits: r45146

Updated by nagachika (Tomoyuki Chikanaga) over 11 years ago Actions
Copy link
#24 [ruby-core:64793]

Backport changed from 1.9.3: REQUIRED, 2.0.0: REQUIRED, 2.1: REQUIRED to 1.9.3: REQUIRED, 2.0.0: REQUIRED, 2.1: DONE

r45046, r45047, r45063, r45087, r45146, r45150, r45151 and r45152 were backported into ruby_2_1 branch at r47415.

Updated by mariakatosvich (maria katosvich) over 9 years ago Actions
Copy link
#25 [ruby-core:76359]

Backport deleted (~~1.9.3: REQUIRED, 2.0.0: REQUIRED, 2.1: DONE~~)

Related issue: https://bugs.ruby-lang.org/issues/9536

This problem is a big issue for Fluentd.
We hope to backport this commit to Latest Rubies.

Thanks,
sky contact number uk@fixithere.net

Updated by nobu (Nobuyoshi Nakada) over 9 years ago Actions
Copy link
#26 [ruby-core:76363]

Backport set to 2.1: DONE

Actions

Copy link

Also available in: PDF Atom

Project

General

Profile

Ruby

Tags

Custom queries

Bug #9525

Stuck with Socket.pack_sockaddr_in

Updated by akr (Akira Tanaka) almost 12 years ago Actions
Copy link
#1 [ruby-core:60802]

Updated by sonots (Naotoshi Seo) almost 12 years ago Actions
Copy link
#2 [ruby-core:60803]

Updated by akr (Akira Tanaka) almost 12 years ago Actions
Copy link
#3 [ruby-core:60804]

Updated by akr (Akira Tanaka) almost 12 years ago Actions
Copy link
#4 [ruby-core:60805]

Updated by sonots (Naotoshi Seo) almost 12 years ago Actions
Copy link
#5 [ruby-core:60806]

Updated by akr (Akira Tanaka) almost 12 years ago Actions
Copy link
#6 [ruby-core:60807]

Updated by naruse (Yui NARUSE) almost 12 years ago Actions
Copy link
#7 [ruby-core:60809]

Updated by sonots (Naotoshi Seo) almost 12 years ago Actions
Copy link
#8 [ruby-core:60810]

Updated by akr (Akira Tanaka) almost 12 years ago Actions
Copy link
#9 [ruby-core:60829]

Updated by kosaki (Motohiro KOSAKI) almost 12 years ago Actions
Copy link
#10 [ruby-core:60830]

Updated by kosaki (Motohiro KOSAKI) almost 12 years ago Actions
Copy link
#11 [ruby-core:60831]

Updated by akr (Akira Tanaka) almost 12 years ago Actions
Copy link
#12 [ruby-core:60846]

Updated by akr (Akira Tanaka) almost 12 years ago Actions
Copy link
#13 [ruby-core:60890]

Updated by normalperson (Eric Wong) almost 12 years ago Actions
Copy link
#14 [ruby-core:60891]

Updated by kosaki (Motohiro KOSAKI) almost 12 years ago Actions
Copy link
#17 [ruby-core:60898]

Updated by akr (Akira Tanaka) almost 12 years ago Actions
Copy link
#18 [ruby-core:60907]

Updated by sonots (Naotoshi Seo) almost 12 years ago Actions
Copy link
#19 [ruby-core:61276]

Updated by sonots (Naotoshi Seo) almost 12 years ago Actions
Copy link
#20 [ruby-core:61334]

Updated by usa (Usaku NAKAMURA) over 11 years ago Actions
Copy link
#21 [ruby-core:62241]

Updated by nagachika (Tomoyuki Chikanaga) over 11 years ago Actions
Copy link
#22 [ruby-core:64545]

Updated by nagachika (Tomoyuki Chikanaga) over 11 years ago Actions
Copy link
#23 [ruby-core:64792]

Updated by nagachika (Tomoyuki Chikanaga) over 11 years ago Actions
Copy link
#24 [ruby-core:64793]

Updated by mariakatosvich (maria katosvich) over 9 years ago Actions
Copy link
#25 [ruby-core:76359]

Updated by nobu (Nobuyoshi Nakada) over 9 years ago Actions
Copy link
#26 [ruby-core:76363]

Project

General

Profile

Ruby

Tags

Custom queries

Bug #9525

Stuck with Socket.pack_sockaddr_in

Updated by akr (Akira Tanaka) almost 12 years ago ActionsCopy link #1 [ruby-core:60802]

Updated by sonots (Naotoshi Seo) almost 12 years ago ActionsCopy link #2 [ruby-core:60803]

Updated by akr (Akira Tanaka) almost 12 years ago ActionsCopy link #3 [ruby-core:60804]

Updated by akr (Akira Tanaka) almost 12 years ago ActionsCopy link #4 [ruby-core:60805]

Updated by sonots (Naotoshi Seo) almost 12 years ago ActionsCopy link #5 [ruby-core:60806]

Updated by akr (Akira Tanaka) almost 12 years ago ActionsCopy link #6 [ruby-core:60807]

Updated by naruse (Yui NARUSE) almost 12 years ago ActionsCopy link #7 [ruby-core:60809]

Updated by sonots (Naotoshi Seo) almost 12 years ago ActionsCopy link #8 [ruby-core:60810]

Updated by akr (Akira Tanaka) almost 12 years ago ActionsCopy link #9 [ruby-core:60829]

Updated by kosaki (Motohiro KOSAKI) almost 12 years ago ActionsCopy link #10 [ruby-core:60830]

Updated by kosaki (Motohiro KOSAKI) almost 12 years ago ActionsCopy link #11 [ruby-core:60831]

Updated by akr (Akira Tanaka) almost 12 years ago ActionsCopy link #12 [ruby-core:60846]

Updated by akr (Akira Tanaka) almost 12 years ago ActionsCopy link #13 [ruby-core:60890]

Updated by normalperson (Eric Wong) almost 12 years ago ActionsCopy link #14 [ruby-core:60891]

Updated by kosaki (Motohiro KOSAKI) almost 12 years ago ActionsCopy link #17 [ruby-core:60898]

Updated by akr (Akira Tanaka) almost 12 years ago ActionsCopy link #18 [ruby-core:60907]

Updated by sonots (Naotoshi Seo) almost 12 years ago ActionsCopy link #19 [ruby-core:61276]

Updated by sonots (Naotoshi Seo) almost 12 years ago ActionsCopy link #20 [ruby-core:61334]

Updated by usa (Usaku NAKAMURA) over 11 years ago ActionsCopy link #21 [ruby-core:62241]

Updated by nagachika (Tomoyuki Chikanaga) over 11 years ago ActionsCopy link #22 [ruby-core:64545]

Updated by nagachika (Tomoyuki Chikanaga) over 11 years ago ActionsCopy link #23 [ruby-core:64792]

Updated by nagachika (Tomoyuki Chikanaga) over 11 years ago ActionsCopy link #24 [ruby-core:64793]

Updated by mariakatosvich (maria katosvich) over 9 years ago ActionsCopy link #25 [ruby-core:76359]

Updated by nobu (Nobuyoshi Nakada) over 9 years ago ActionsCopy link #26 [ruby-core:76363]

Updated by akr (Akira Tanaka) almost 12 years ago Actions
Copy link
#1 [ruby-core:60802]

Updated by sonots (Naotoshi Seo) almost 12 years ago Actions
Copy link
#2 [ruby-core:60803]

Updated by akr (Akira Tanaka) almost 12 years ago Actions
Copy link
#3 [ruby-core:60804]

Updated by akr (Akira Tanaka) almost 12 years ago Actions
Copy link
#4 [ruby-core:60805]

Updated by sonots (Naotoshi Seo) almost 12 years ago Actions
Copy link
#5 [ruby-core:60806]

Updated by akr (Akira Tanaka) almost 12 years ago Actions
Copy link
#6 [ruby-core:60807]

Updated by naruse (Yui NARUSE) almost 12 years ago Actions
Copy link
#7 [ruby-core:60809]

Updated by sonots (Naotoshi Seo) almost 12 years ago Actions
Copy link
#8 [ruby-core:60810]

Updated by akr (Akira Tanaka) almost 12 years ago Actions
Copy link
#9 [ruby-core:60829]

Updated by kosaki (Motohiro KOSAKI) almost 12 years ago Actions
Copy link
#10 [ruby-core:60830]

Updated by kosaki (Motohiro KOSAKI) almost 12 years ago Actions
Copy link
#11 [ruby-core:60831]

Updated by akr (Akira Tanaka) almost 12 years ago Actions
Copy link
#12 [ruby-core:60846]

Updated by akr (Akira Tanaka) almost 12 years ago Actions
Copy link
#13 [ruby-core:60890]

Updated by normalperson (Eric Wong) almost 12 years ago Actions
Copy link
#14 [ruby-core:60891]

Updated by kosaki (Motohiro KOSAKI) almost 12 years ago Actions
Copy link
#17 [ruby-core:60898]

Updated by akr (Akira Tanaka) almost 12 years ago Actions
Copy link
#18 [ruby-core:60907]

Updated by sonots (Naotoshi Seo) almost 12 years ago Actions
Copy link
#19 [ruby-core:61276]

Updated by sonots (Naotoshi Seo) almost 12 years ago Actions
Copy link
#20 [ruby-core:61334]

Updated by usa (Usaku NAKAMURA) over 11 years ago Actions
Copy link
#21 [ruby-core:62241]

Updated by nagachika (Tomoyuki Chikanaga) over 11 years ago Actions
Copy link
#22 [ruby-core:64545]

Updated by nagachika (Tomoyuki Chikanaga) over 11 years ago Actions
Copy link
#23 [ruby-core:64792]

Updated by nagachika (Tomoyuki Chikanaga) over 11 years ago Actions
Copy link
#24 [ruby-core:64793]

Updated by mariakatosvich (maria katosvich) over 9 years ago Actions
Copy link
#25 [ruby-core:76359]

Updated by nobu (Nobuyoshi Nakada) over 9 years ago Actions
Copy link
#26 [ruby-core:76363]