Backport #8080

Segfault in rb_fd_set

Added by Jon Leighton about 1 year ago. Updated about 1 year ago.

[ruby-core:53349]
Status:Closed
Priority:High
Assignee:Usaku NAKAMURA

Description

I have experienced a segfault with Ruby 2 during an IO.select call:

See https://travis-ci.org/jonleighton/spring/jobs/5393025 or https://gist.github.com/jonleighton/5147785 to see the crash output.

I cannot reproduce on a different version of Linux (Fedora). However I was able to reproduce by downloading a VM image of the Travis CI environment and running the code on there (see http://pivotallabs.com/debugging-travis-builds/ for how to do that).

I tried to produce a simple script to reproduce, but without success. I also tried to build Ruby 2 with debugging symbols, but this did not produce the crash. I'm not sure why - perhaps related to compiler optimisations.

I found a workaround for the crash with https://github.com/jonleighton/spring/commit/c8a7afdd3238ef88bffc2c8f56baa21042400e15.


Related issues

Duplicated by ruby-trunk - Bug #6653: 1.9.2/1.9.3 exhibit SEGV with many threads+tcp connections Closed 06/27/2012

Associated revisions

Revision 39985
Added by Usaku NAKAMURA about 1 year ago

merge revision(s) 39772,39773,39774,39775,39777,39779,39781,39783: [Backport #8080]

* configure.in: check struct timeval exist or not.

* include/ruby/missing.h (struct timeval): check HAVE_STRUCT_TIMEVAL
  properly. and don't include sys/time.h if struct timeval exist.

* file.c: include sys/time.h explicitly.

* random.c: ditto.

* thread_pthread.c: ditto.

* time.c: ditto.

* ext/date/date_strftime.c: ditto.

* include/ruby/missing.h (struct timespec): include <sys/time.h>

* include/ruby/missing.h (__syscall): moved to...

* io.c: here. because __syscall() is only used from io.c.

* include/ruby/missing.h: move "#include <sys/type.h>" to ....

* include/ruby/intern.h: here. because it was introduced for
  fixing NFDBITS issue. .

* thread.c: disabled _FORTIFY_SOURCE for avoid to hit glibc bug.
  [Bug #8080] 

* test/ruby/test_io.rb (TestIO#test_io_select_with_many_files):
  test for the above.

* include/ruby/missing.h: removed __linux__. it's unnecessary.

Revision 39995
Added by Usaku NAKAMURA about 1 year ago

  • include/ruby/missing.h: fixed merge mistake of r39985. [Backport #8080]

History

#1 Updated by Eric Wong about 1 year ago

"jonleighton (Jon Leighton)" j@jonathanleighton.com wrote:

I have experienced a segfault with Ruby 2 during an IO.select call:

See https://travis-ci.org/jonleighton/spring/jobs/5393025 or https://gist.github.com/jonleighton/5147785 to see the crash output.

I cannot reproduce on a different version of Linux (Fedora). However I was able to reproduce by downloading a VM image of the Travis CI environment and running the code on there (see http://pivotallabs.com/debugging-travis-builds/ for how to do that).

I tried to produce a simple script to reproduce, but without success. I also tried to build Ruby 2 with debugging symbols, but this did not produce the crash. I'm not sure why - perhaps related to compiler optimisations.

I found a workaround for the crash with https://github.com/jonleighton/spring/commit/c8a7afdd3238ef88bffc2c8f56baa21042400e15.

Looking at your workaround, I think a better one is "watcher.to_io"
method needs to memoize its return value.

I think Ruby expects the return value of obj.to_io to be persistent
for the lifetime of obj.

Making IO.select cache the value of to_io internally might be alright...

#2 Updated by Motohiro KOSAKI about 1 year ago

  • Category set to core
  • Status changed from Open to Assigned
  • Assignee set to Motohiro KOSAKI
  • Target version set to 2.1.0

#3 Updated by Jon Leighton about 1 year ago

normalperson (Eric Wong) wrote:

Looking at your workaround, I think a better one is "watcher.to_io"
method needs to memoize its return value.

I think Ruby expects the return value of obj.to_io to be persistent
for the lifetime of obj.

Making IO.select cache the value of to_io internally might be alright...

It seems that IO.select does cache the value of to_io internally. At least that seems to be the case from https://gist.github.com/jonleighton/5172263.

Running the script prints "to_io" once and then hangs.

#4 Updated by Eric Wong about 1 year ago

"jonleighton (Jon Leighton)" j@jonathanleighton.com wrote:

Issue #8080 has been updated by jonleighton (Jon Leighton).
normalperson (Eric Wong) wrote:

Looking at your workaround, I think a better one is "watcher.to_io"
method needs to memoize its return value.

I think Ruby expects the return value of obj.to_io to be persistent
for the lifetime of obj.

Making IO.select cache the value of to_io internally might be alright...

It seems that IO.select does cache the value of to_io internally. At least that seems to be the case from https://gist.github.com/jonleighton/5172263.

Running the script prints "to_io" once and then hangs.

That's because nothing else is running while IO.select is running.
IO.select does not cache, it accesses once the data once.
See comments below:

---------------------------------8<-------------------------------
class Foo
def toio
puts "to
io"
IO.pipe.first
end
end
n = 0

trap(:USR1) { n += 1 }

# Generate garbage to trigger GC, and then EINTR for select()
Thread.new do
loop do
# make some garbage here
(1..1000000).each { |z| z.to_s.dup }
puts "KILL #{n}"
Process.kill("USR1", $$)
end
end

# This will now return empty arrays
# If you swap Foo.new for $stdin, this will never return (as expected)
p IO.select([Foo.new])
---------------------------------8<-------------------------------

#5 Updated by Motohiro KOSAKI about 1 year ago

  • Status changed from Assigned to Closed
  • % Done changed from 0 to 100

This issue was solved with changeset r39775.
Jon, thank you for reporting this issue.
Your contribution to Ruby is greatly appreciated.
May Ruby be with you.


  • thread.c: disabled FORTIFYSOURCE for avoid to hit glibc bug. [Bug #8080]
  • test/ruby/testio.rb (TestIO#testioselectwithmanyfiles): test for the above.

#6 Updated by Motohiro KOSAKI about 1 year ago

  • Tracker changed from Bug to Backport
  • Project changed from ruby-trunk to Backport200
  • Category deleted (core)
  • Status changed from Closed to Assigned
  • Assignee changed from Motohiro KOSAKI to Tomoyuki Chikanaga
  • Priority changed from Normal to High
  • Target version deleted (2.1.0)

The root cause is, Linux's select implementation supports >1024 files, but glibc doesn't. glibc doesn't correctly understand linux select(2) spec.
This issue is only happen when FORTIFYSOURCE >= 1. That said, 2.0 and 1.9.x on Ubuntu. (because Ubuntu enable FORTIFY_SOURCE by default).

I think this fix is important for server usecase. Please backport r39772-r39775.

#7 Updated by Motohiro KOSAKI about 1 year ago

The root cause is, Linux's select implementation supports >1024 files, but glibc doesn't. glibc doesn't correctly understand linux select(2) spec.
This issue is only happen when FORTIFYSOURCE >= 1. That said, 2.0 and 1.9.x on Ubuntu. (because Ubuntu enable FORTIFY_SOURCE by default).

I think this fix is important for server usecase. Please backport r39772-r39775.

#8 Updated by Motohiro KOSAKI about 1 year ago

Hi nagachika-sanm

You need backport 39777, 39779, 39781 and 39783 too.

#9 Updated by Tomoyuki Chikanaga about 1 year ago

  • Status changed from Assigned to Closed

This issue was solved with changeset r39838.
Jon, thank you for reporting this issue.
Your contribution to Ruby is greatly appreciated.
May Ruby be with you.


merge revision(s) 39772,39773: [Backport #8080]

* configure.in: check struct timeval exist or not.

* include/ruby/missing.h (struct timeval): check HAVE_STRUCT_TIMEVAL
  properly. and don't include sys/time.h if struct timeval exist.

* file.c: include sys/time.h explicitly.

* random.c: ditto.

* thread_pthread.c: ditto.

* time.c: ditto.

* ext/date/date_strftime.c: ditto.

* include/ruby/missing.h (struct timespec): include <sys/time.h>

#10 Updated by Tomoyuki Chikanaga about 1 year ago

r39772-r39775, r39777, r39779, r39781 and r39783 are backported.
I also merged r39160, r39162, r39174, r39198 and r39200 for clean merge of configure.in.

#11 Updated by Motohiro KOSAKI about 1 year ago

  • Project changed from Backport200 to Backport93
  • Status changed from Closed to Assigned
  • Assignee changed from Tomoyuki Chikanaga to Usaku NAKAMURA

#12 Updated by Usaku NAKAMURA about 1 year ago

  • Status changed from Assigned to Closed

This issue was solved with changeset r39985.
Jon, thank you for reporting this issue.
Your contribution to Ruby is greatly appreciated.
May Ruby be with you.


merge revision(s) 39772,39773,39774,39775,39777,39779,39781,39783: [Backport #8080]

* configure.in: check struct timeval exist or not.

* include/ruby/missing.h (struct timeval): check HAVE_STRUCT_TIMEVAL
  properly. and don't include sys/time.h if struct timeval exist.

* file.c: include sys/time.h explicitly.

* random.c: ditto.

* thread_pthread.c: ditto.

* time.c: ditto.

* ext/date/date_strftime.c: ditto.

* include/ruby/missing.h (struct timespec): include <sys/time.h>

* include/ruby/missing.h (__syscall): moved to...

* io.c: here. because __syscall() is only used from io.c.

* include/ruby/missing.h: move "#include <sys/type.h>" to ....

* include/ruby/intern.h: here. because it was introduced for
  fixing NFDBITS issue. .

* thread.c: disabled _FORTIFY_SOURCE for avoid to hit glibc bug.
  [Bug #8080] 

* test/ruby/test_io.rb (TestIO#test_io_select_with_many_files):
  test for the above.

* include/ruby/missing.h: removed __linux__. it's unnecessary.

#13 Updated by Kazuhiro NISHIYAMA about 1 year ago

  • Status changed from Closed to Assigned

This backport to ruby19_3 is not enough.
Build failed on Mac.
see https://gist.github.com/hsbt/5263190

#14 Updated by Usaku NAKAMURA about 1 year ago

  • Status changed from Assigned to Closed

This issue was solved with changeset r39995.
Jon, thank you for reporting this issue.
Your contribution to Ruby is greatly appreciated.
May Ruby be with you.


  • include/ruby/missing.h: fixed merge mistake of r39985. [Backport #8080]

Also available in: Atom PDF