Bug #8100

Segfault in trunk

Added by Magnus Holm about 1 year ago. Updated 11 months ago.

[ruby-core:53439]
Status:Closed
Priority:Normal
Assignee:Narihiro Nakamura
Category:core
Target version:2.1.0
ruby -v:ruby 2.1.0dev (2013-03-18 trunk 39805) [x86_64-linux]
Backport:

Description

=begin
The full backtrace (VM, C, and Ruby levels) is attached and also available at https://travis-ci.org/rtomayko/tilt/jobs/5479138

I haven't been able to reproduce it (and thus I can't create a reduced test case).

This is the test that fails: https://github.com/rtomayko/tilt/blob/581230cbb3b314e88cf5ec9167a24ebb8acc7a93/test/tilt_compilesite_test.rb#L31

The code in question will do these steps in several threads at the same time:

The method does some funky class << self (to ensure that it gets evaluated under a proper constant scope). It also caches the methods, so it won't always define a new method, but might re-use an UnboundMethod from a previous compilation (which might have happened on a different thread).

I know it's not much to go on, but at least the backtrace seems to suggest that the error happened in rb_ary_fill in array.c.

I've also had another report of segfault in Tilt + Ruby 2.0.0, but I don't have the full backtrace yet: https://github.com/rtomayko/tilt/issues/179. Might this be related?

Let me know if you need more details.
=end
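
The pattern described above (define a method, capture it as an UnboundMethod, cache it, and reuse it from several threads) can be sketched roughly as follows. This is an illustrative assumption, not Tilt's actual code: FakeTemplate, compiled_method, CACHE and LOCK are hypothetical names, and removing the temporary method after capturing it is a guess at how an "unlinked" method entry comes about.

```ruby
# Hypothetical stand-in for a template class; NOT Tilt's real API.
class FakeTemplate
  CACHE = {}          # source string => UnboundMethod
  LOCK  = Mutex.new   # guards CACHE across threads

  # Compile `source` into an UnboundMethod, reusing a cached one if present.
  def self.compiled_method(source)
    LOCK.synchronize do
      CACHE[source] ||= begin
        name = "__compiled_#{CACHE.size}"
        # Define a temporary method, capture it, then remove it again.
        # (Assumption: the captured UnboundMethod now refers to a method
        # entry that is no longer linked from the class's method table.)
        class_eval("def #{name}; #{source}; end")
        unbound = instance_method(name)
        remove_method(name)
        unbound
      end
    end
  end

  # Bind the cached UnboundMethod to this instance and call it.
  def render(source)
    self.class.compiled_method(source).bind(self).call
  end
end

# Several threads render the same source; only the first one compiles it.
results = 4.times.map do
  Thread.new { FakeTemplate.new.render('"hello" * 3') }
end.map(&:value)
```

On an interpreter without the bug, every thread should simply return the rendered string; the sketch only shows the shape of the code path, and is not claimed to trigger the crash.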

seglog.txt (104 KB) Magnus Holm, 03/15/2013 08:58 PM

segfault_spec.tar.gz (3.01 KB) Zachary Scott, 03/18/2013 10:51 AM

seg.txt (63.4 KB) Davide D'Agostino, 03/18/2013 04:14 PM

fail.rb - Reduced script (604 Bytes) Magnus Holm, 03/22/2013 06:38 PM


Related issues

Related to Backport93 - Backport #8163: Backport r39919 Assigned 03/25/2013
Duplicated by ruby-trunk - Bug #8336: Segfault in :=~ Closed 04/27/2013
Duplicated by ruby-trunk - Bug #8353: segfault with puma-1.6.3 Closed 05/02/2013
Duplicated by ruby-trunk - Bug #8056: Random segmentation faults in Tempfile Closed 03/09/2013

Associated revisions

Revision 39883
Added by Nobuyoshi Nakada about 1 year ago

  • KNOWNBUGS.rb: test for [Bug #8100].

Revision 39894
Added by Yui NARUSE about 1 year ago

Add timeout to infinite loop [Bug #8100]

On FreeBSD, it doesn't SEGV.
http://fbsd.rubyci.org/~chkbuild/ruby-trunk/log/20130323T170203Z.log.html.gz

Revision 39919
Added by nari about 1 year ago

  • proc.c (bm_free): need to clean up the mark flag of a free and unlinked method entry. [Bug #8100]

Revision 39925
Added by Nobuyoshi Nakada about 1 year ago

  • test/ruby/test_method.rb (test_unlinked_method_entry_in_method_object_bug): move from KNOWNBUGS.rb. [Bug #8100]

History

#1 Updated by Zachary Scott about 1 year ago

  • File segfault_spec.tar.gz added
  • Subject changed from Segfault in ruby-2.0.0p0 to Segfault in trunk
  • Target version set to 2.1.0
  • ruby -v changed from ruby 2.0.0p0 (2013-02-24 revision 39474) [x86_64-linux] to ruby 2.1.0dev (2013-03-18 trunk 39805) [x86_64-linux]

I've updated the description of this ticket, because I'm able to reproduce a similar bug. Only similar in that we're using a lot of the same dependencies.

I also went ahead and created a reproducible script (as small as possible). Here are the instructions for reproducing the segfault:

1) git clone git://github.com/zzak/segfault_spec.rb.git
2) bundle install
3) bundle exec rspec segfault_spec.rb
4) Repeat step 3 until it segfaults. This may take a few tries.

I will also attach an archive of the script.

#2 Updated by Davide D'Agostino about 1 year ago

#3 Updated by Zachary Scott about 1 year ago

Forgot to add a link to the repo on github: https://github.com/zzak/segfault_spec.rb

#4 Updated by Tom Wardrop about 1 year ago

I'm also getting segfaults on Ruby 2.0.0. It seems to be related to threading or forking, but I can't quite put my finger on it. All I can say is that I don't get it when running my web app in WEBrick on my Mac, but when running it on my CentOS server with Phusion Passenger using the smart spawn method, I get it all the time: about every 10th request segfaults. Setting Passenger to the conservative spawn method (one request per process) reduces the segfault rate considerably, but segfaults still occur.

Here's a stack overflow thread about it, with a response I left on there with a bit more information about my experiences: http://stackoverflow.com/questions/15315809/segfault-error-in-sinatra-after-upgrading-to-ruby-2-0-beta/15492401#15492401

I also reported this to the Phusion Passenger Google Group before realising it's a problem with ruby 2.0.0: https://groups.google.com/forum/?fromgroups=#!topic/phusion-passenger/iEOE4shl_jE

Here's a log including numerous segfaults from my CentOS server running Phusion Passenger: https://gist.github.com/Wardrop/5179380

Either way, it looks like something common to web applications is causing this, or perhaps web application frameworks are so far the most common cases in which Ruby 2.0.0 is being used.

#5 Updated by Magnus Holm about 1 year ago

I've managed to reduce the script down to 30 lines (with no dependencies) that segfaults in both 2.0.0-p0 and trunk (39875). It doesn't segfault every time though so if it takes more than a few seconds to run it, simply Ctrl-C and try again.

#6 Updated by Magnus Holm about 1 year ago

Here's a backtrace I got in gdb: http://pastie.org/7064676. rb_gc_mark_unlinked_live_method_entries seems suspicious and related to what the script does.

#7 Updated by Tom Wardrop about 1 year ago

They've obviously done work on the garbage collector for Ruby 2.0. This is likely a bug introduced as a result of that. Good work tracking it down, judofyr.

#8 Updated by Magnus Holm about 1 year ago

After working with charliesome we've now found an even simpler test case:

http://eval.in/13339

This always segfaults for me on trunk.

#9 Updated by Charlie Somerville about 1 year ago

=begin
Magnus and I reduced this down to an even simpler test case:

loop do
  def x
    "hello" * 1000
  end

  method(:x).call
end

http://eval.in/13344
=end

#10 Updated by Motohiro KOSAKI about 1 year ago

  • Category set to core
  • Status changed from Open to Assigned
  • Assignee set to Narihiro Nakamura

#11 Updated by Nobuyoshi Nakada about 1 year ago

  • Status changed from Assigned to Closed
  • % Done changed from 0 to 100

This issue was solved with changeset r39883.
Magnus, thank you for reporting this issue.
Your contribution to Ruby is greatly appreciated.
May Ruby be with you.


  • KNOWNBUGS.rb: test for [Bug #8100].

#12 Updated by Nobuyoshi Nakada about 1 year ago

  • Status changed from Closed to Assigned
  • % Done changed from 100 to 0

#13 Updated by Charlie Somerville about 1 year ago

nobu-san, this will loop forever when the bug is fixed. Perhaps change it to 100_000.times?
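
A bounded variant along the lines of this suggestion might look like the sketch below. The helper name redefine_and_call and the result check are assumptions added for illustration; they are not taken from KNOWNBUGS.rb or from the actual commit.

```ruby
# Redefine `x` and call it through a Method object a fixed number of times,
# so the script terminates even once the interpreter no longer crashes.
def redefine_and_call(iterations)
  calls = 0
  iterations.times do
    # Each pass replaces the previous definition of `x`.
    def x
      "hello" * 1000
    end
    # Count the call only if the result has the expected length.
    calls += 1 if method(:x).call.size == 5000
  end
  calls
end

redefine_and_call(100_000)
```

The iteration count trades run time against the chance of hitting the crash on an affected build, which is why a time-based limit was proposed as an alternative.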

#14 Updated by Tom Wardrop about 1 year ago

I'd set it to a duration rather than a set number of iterations. I've seen it go for 2 seconds on my machine before segfaulting. 3 seconds should fail almost every time.

start_time = Time.now
while (Time.now - start_time) < 3
  def x
    "hello" * 1000
  end
  method(:x).call
end

#15 Updated by Nobuyoshi Nakada about 1 year ago

charliesome (Charlie Somerville) wrote:

nobu-san, this will loop forever when the bug is fixed. Perhaps change it to 100_000.times?

Sure, I forgot that before the commit.

#16 Updated by Yui NARUSE about 1 year ago

  • Status changed from Assigned to Closed
  • % Done changed from 0 to 100

This issue was solved with changeset r39894.
Magnus, thank you for reporting this issue.
Your contribution to Ruby is greatly appreciated.
May Ruby be with you.


Add timeout to infinite loop [Bug #8100]

On FreeBSD, it doesn't SEGV.
http://fbsd.rubyci.org/~chkbuild/ruby-trunk/log/20130323T170203Z.log.html.gz
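
For illustration, wrapping the reduced loop in a timeout could be sketched with the standard timeout library as below. This is an assumption about the general approach, not the content of r39894; run_reduced_case and the one-second limit are hypothetical.

```ruby
require "timeout"

# Run the reduced test case until `seconds` have elapsed.
# Returns :timed_out when the loop was interrupted without crashing.
def run_reduced_case(seconds)
  Timeout.timeout(seconds) do
    loop do
      def x
        "hello" * 1000
      end
      method(:x).call
    end
  end
rescue Timeout::Error
  :timed_out
end

puts run_reduced_case(1)
```

On a build where the loop does not crash (as reported for FreeBSD above), the call returns once the limit is reached instead of hanging the test run.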

#17 Updated by Yui NARUSE about 1 year ago

  • Status changed from Closed to Assigned

#18 Updated by Narihiro Nakamura about 1 year ago

  • Status changed from Assigned to Closed

This issue was solved with changeset r39919.
Magnus, thank you for reporting this issue.
Your contribution to Ruby is greatly appreciated.
May Ruby be with you.


  • proc.c (bm_free): need to clean up the mark flag of a free and unlinked method entry. [Bug #8100]

#19 Updated by Zachary Scott about 1 year ago

Thank you nari-san and everyone who helped with this.

Should this be backported as well?

#20 Updated by Narihiro Nakamura about 1 year ago

zzak (Zachary Scott) wrote:

Thank you nari-san and everyone who helped with this.

Should this be backported as well?

Yeah, this fix should be backported to 1.9.3 and 2.0.0.

#21 Updated by Tom Wardrop about 1 year ago

Eagerly awaiting the backport. Can someone please leave a comment when it's back-ported to ruby-2.0.0 head?

#22 Updated by Narihiro Nakamura about 1 year ago

wardrop (Tom Wardrop) wrote:

Eagerly awaiting the backport. Can someone please leave a comment when it's back-ported to ruby-2.0.0 head?

The backport request ticket is here.
https://bugs.ruby-lang.org/issues/8163
You might want to watch this ticket for your purpose.

#23 Updated by Tom Wardrop about 1 year ago

Thanks for that. By the way, I've applied the patch to my production server. Write me down as another happy customer :-)
