Bug #6246

1.9.3-p125 intermittent segfault

Added by Jodi Showers about 2 years ago. Updated over 1 year ago.

[ruby-core:44084]
Status:Closed
Priority:Normal
Assignee:Narihiro Nakamura
Category:-
Target version:-
ruby -v:ruby 1.9.3p125 (2012-02-16 revision 34643) [x86_64-linux] Backport:

Description

I've seen this behavior in 1.9.3p0 and 1.9.3p125

Intermittently, a rails 3.2 rake task will break while compiling assets - about 1 in 10 runs will segfault. Our code is 10K's of lines of proprietary code. I am working on a simplified version you can use.

RAILSENV=newstaging bundle exec rake assets:precompile --trace

command: https://gist.github.com/2254457 (stacktrace and gem list)

backtrace : https://gist.github.com/2286260

Operating System : Debian Squeeze 6.0.4
Ruby installed through RVM 1.11.3

noname (500 Bytes) Anonymous, 04/03/2012 07:53 AM

noname (500 Bytes) Anonymous, 04/04/2012 02:23 AM

noname (500 Bytes) Anonymous, 04/04/2012 11:53 AM

noname (500 Bytes) Anonymous, 04/04/2012 11:53 AM

valgrind_full.txt.zip - valgrind - verbose (781 KB) Jodi Showers, 04/05/2012 12:36 AM

History

#1 Updated by Anonymous about 2 years ago

On Tue, Apr 03, 2012 at 04:22:04AM +0900, jshow (Jodi Showers) wrote:

Issue #6246 has been reported by jshow (Jodi Showers).


Bug #6246: 1.9.3-p125 intermittent segfault
https://bugs.ruby-lang.org/issues/6246

Author: jshow (Jodi Showers)
Status: Open
Priority: Normal
Assignee:
Category:
Target version:
ruby -v: ruby 1.9.3p125 (2012-02-16 revision 34643) [x86_64-linux]

I've seen this behavior in 1.9.3p0 and 1.9.3p125

Intermittently, a rails 3.2 rake task will break while compiling assets - about 1 in 10 runs will segfault. Our code is 10K's of lines of proprietary code. I am working on a simplified version you can use.

RAILSENV=newstaging bundle exec rake assets:precompile --trace

command: https://gist.github.com/2254457 (stacktrace and gem list)

backtrace : https://gist.github.com/2286260

Operating System : Debian Squeeze 6.0.4
Ruby installed through RVM 1.11.3

Can you either provide the application so that we can try to reproduce
the issue, somehow reduce the list of moving parts, or run valgrind against
your app?

The size of your gem list is daunting, it could be anything in there.

If you need, I can give you the valgrind command to use. :-)

--
Aaron Patterson
http://tenderlovemaking.com/

#2 Updated by Jodi Showers about 2 years ago

yes please - never used valgrind but will learn.

thanks

#3 Updated by Jodi Showers about 2 years ago

Aaron, I ran a valgrind as follows:

valgrind /home/deploy/.rvm/rubies/ruby-1.9.3-p125/bin/ruby -Iext:test /home/deploy/.rvm/gems/ruby-1.9.3-p125@boogala/bin/bundle exec rake RAILSENV=newstaging assets:precompile

the results are here: https://gist.github.com/2288812

Long before the actual startup of the rake task (rails boot) there were a great number of valgrind outputs -

==11285== Use of uninitialised value of size 8
==11285== at 0x4EA3203: gcmarkchildren (gc.c:1867)
==11285== by 0x4EA491C: markcurrentmachinecontext (gc.c:1650)
==11285== by 0x4EA4A3D: gc
marks (gc.c:2474)
==11285== by 0x4EA5DBF: rbnewobj (gc.c:2195)
==11285== by 0x4F46115: str
alloc (string.c:373)
==11285== by 0x4FA0132: vmcall0 (vmeval.c:79)
==11285== by 0x4FA0CC9: rbfuncall (vmeval.c:236)
==11285== by 0x4ED926E: rbobjalloc (object.c:1603)
==11285== by 0x4EDA835: rbobjdup (object.c:315)
==11285== by 0xDE7034D: pathinitialize (pathname.c:41)
==11285== by 0x4FA0132: vm
call0 (vmeval.c:79)
==11285== by 0x4ED931F: rb
classnewinstance (object.c:1636)
==11285==

I can send all of these if you feel they'll help. None of these were outputted after the actual rake task began.

#4 Updated by Anonymous about 2 years ago

On Tue, Apr 03, 2012 at 11:15:14AM +0900, jshow (Jodi Showers) wrote:

Issue #6246 has been updated by jshow (Jodi Showers).

Aaron, I ran a valgrind as follows:

valgrind /home/deploy/.rvm/rubies/ruby-1.9.3-p125/bin/ruby -Iext:test /home/deploy/.rvm/gems/ruby-1.9.3-p125@boogala/bin/bundle exec rake RAILSENV=newstaging assets:precompile

Try running valgrind with these flags:

--num-callers=50 --error-limit=no --partial-loads-ok=yes --undef-value-errors=no

Post all of the errors in a gist if you don't mind.

Can you also try your code against edge 1.9.3? Nari-san has fixed a
segv error in that branch. If we're lucky it's the same issue, and we
don't have to debug anymore. ;)

--
Aaron Patterson
http://tenderlovemaking.com/

#5 Updated by Jodi Showers about 2 years ago

Aaron,

attached is the full output from valgrind using your options - https://gist.github.com/2296847

I'm presently looking for the edge branch - not seeing it. Do you mean head?

thanks

#6 Updated by Anonymous about 2 years ago

On Wed, Apr 04, 2012 at 10:18:26AM +0900, jshow (Jodi Showers) wrote:

Issue #6246 has been updated by jshow (Jodi Showers).

Aaron,

attached is the full output from valgrind using your options - https://gist.github.com/2296847

I'm presently looking for the edge branch - not seeing it. Do you mean head?

I guess? I mean the ruby19_3 branch:

https://github.com/ruby/ruby/tree/ruby_1_9_3

:-)

--
Aaron Patterson
http://tenderlovemaking.com/

#7 Updated by Anonymous about 2 years ago

On Wed, Apr 04, 2012 at 10:18:26AM +0900, jshow (Jodi Showers) wrote:

Issue #6246 has been updated by jshow (Jodi Showers).

Aaron,

attached is the full output from valgrind using your options - https://gist.github.com/2296847

I'm thinking that your segv is happening in a subshell (or not the
process on which you're running valgrind).

Are there hours when I can meet you on IRC? It might be faster than
mail exchange here. :-)

--
Aaron Patterson
http://tenderlovemaking.com/

#8 Updated by Jodi Showers about 2 years ago

I'm in the north american eastern time zone

I can meet today if you can

11am eastern is doable for me - just over an hour away

In the meantime I'll start trying ruby-head

much thanks Aaron

#9 Updated by Jodi Showers about 2 years ago

Aaron, I've updated this gist with a run that uses a more direct command line - which I think was hiding some of the valgrind output

valgrind --num-callers=50 --error-limit=no --partial-loads-ok=yes --undef-value-errors=no rake RAILSENV=newstaging assets:precompile

https://gist.github.com/2296847

still it seems we're not getting much value from valgrind - it's reporting 0 errors from 0 contexts

I'm re-running now with -v and --leak-check=full for more

#10 Updated by Jodi Showers about 2 years ago

attached is a more verbose valgrind

$ valgrind --num-callers=50 --error-limit=no --partial-loads-ok=yes --undef-value-errors=no -v --leak-check=full rake RAILSENV=newstaging assets:precompile

#11 Updated by Dennis de Reus about 2 years ago

I have a similar problem when creating a new rails (3.2.3) app. It segfaults on the bundle install step. I think this is related, so I won't create a new bugreport.

Ruby -v: ruby 1.9.3p125 (2012-02-16 revision 34643) [x86_64-darwin11.3.0]

I found this when I created a new rails (3.2.3) project. Ruby gave a segfault when running bundle install. The details are in this gist: https://gist.github.com/2330262. I'm not sure when the issue arises exactly, but I've also had trouble running the tests from active_admin using ruby 1.9.3-p125, going back to 1.9.2 didn't gave any segmentation faults.

I'm sorry for not giving a more detailed bugreport, but I'm not that well versed in ruby yet. I hope that other with the same problem can add to this report.

#12 Updated by Jodi Showers about 2 years ago

Dennis, looking at your ruby trace, it looks like you've run into something else

It appears that you are on osx, and there are some documented (try google) about ssl and ruby - seems that the ssl libs shipped with osx are not compatible with the latest builds of ruby

#13 Updated by Yusuke Endoh almost 2 years ago

  • Status changed from Open to Assigned
  • Assignee set to Narihiro Nakamura

Hello,

Perhaps I could reproduce this issue with small example:

class Foo
def Foo.shutdown(*args)
p Bar.inspect.size
end
Bar = (0..300).map do |x|
x = x.tos
ObjectSpace.define
finalizer(x, Foo.method(:shutdown))
x
end
end

This code causes segfault in my system.
The number (300) may vary depending on a system.

This looks like a bug of GC finalizer.
A finalizer invoked at process termination, seems to refer
GC'ed object, which lead to segfault. I guess.

In fact, the "logging" gem, which is included in OP's trace,
seems to use definefinalizer as a substitute for atexit.

https://github.com/TwP/logging/blob/master/lib/logging.rb#L552

Nari, please check it.

Yusuke Endoh mame@tsg.ne.jp

#14 Updated by Jodi Showers almost 2 years ago

Yusuke Endoh,

I have tried your example, but unfortunately couldn't cause a crash. I experimented with various values of as high as 3000.

I would appreciate whatever effort can be applied - I am literally stuck as 125 crashes so often it's unusable.

If I can provide more details I would be happy to do so

OT, but I don't receive any emails from redmine - I'm watching this issue, and have the general mail settings turned on. Hard to follow - I presently come back to visit regularly.

thanks.

#15 Updated by Eric Wong almost 2 years ago

"jshow (Jodi Showers)" jodi@homestars.com wrote:

I would appreciate whatever effort can be applied - I am literally
stuck as 125 crashes so often it's unusable.

As Aaron mentioned, did you get a chance to try the ruby19_3 branch?
Now that Ruby 1.9.3-p194 is out, maybe it's easier to give that a try.
Again, if we're lucky we it's the same issue we encountered with
Nokogiri and fixed in 1.9.3-p194.

OT, but I don't receive any emails from redmine - I'm watching this
issue, and have the general mail settings turned on. Hard to follow -
I presently come back to visit regularly.

I'm not sure if this redmine is configured emails users directly, but
subscribing to the ruby-core list works.

#16 Updated by Yusuke Endoh almost 2 years ago

Hello,

2012/4/24, jshow (Jodi Showers) jodi@homestars.com:

I have tried your example, but unfortunately couldn't cause a crash. I
experimented with various values of as high as 3000.

It is difficult to reproduce a problem about GC with a small code.
Indeed, my example does not cause SEGV on my another system.

Can you try the following patch?
This is an ad-hoc patch that stops finalizer call at exit.
SEGV should disappear if my guess is right.
If it still causes SEGV, my example is another issue.

diff --git a/eval.c b/eval.c
index 4dddcaa..b2358f8 100644
--- a/eval.c
+++ b/eval.c
@@ -98,7 +98,6 @@ rubyfinalize1(void)
{
rubysigfinalize();
GETTHREAD()->errinfo = Qnil;
- rb
gccallfinalizeratexit();
}

void

--
Yusuke Endoh mame@tsg.ne.jp

#17 Updated by Jodi Showers almost 2 years ago

I've been using p194 for about a day now - no crashes yet

I'll notify you here over the next while

if the crashing resumes, I'll try the patch Yusuke.

much thanks to you both.

#18 Updated by Paul Brannan almost 2 years ago

/home/deploy/.rvm/rubies/ruby-1.9.3-p125/lib/libruby.so.1.9(rbgccallfinalizeratexit+0x219) [0x7f9f37246359] gc.c:3044
/home/deploy/.rvm/rubies/ruby-1.9.3-p125/lib/libruby.so.1.9(ruby
cleanup+0x1ec) [0x7f9f37233fac] eval.c:101
/home/deploy/.rvm/rubies/ruby-1.9.3-p125/lib/libruby.so.1.9(rubyrunnode+0x25) [0x7f9f37234205] eval.c:244

So we're cleaning up, and:

#0 stlookup (table=0x0, key=792, value=0x7fffffffc858) at st.c:330
#1 0x00007ffff7b1629f in search
method (recv=,
mid=792) at vmmethod.c:374
#2 rb
methodentrygetwithoutcache (recv=,
mid=792) at vm_method.c:393

Someone called a method on an object that had already been GC'd.

That someone is in the logging gem:

ObjectSpace.define_finalizer self, Logging.method(:shutdown)

Ruby makes no guarantees about the order in which finalizers are called
at exit. This can be problematic, because the finalizer may need to
perform operations on objects that have already been GC'd.

I have avoided this before in C code by explicitly checking for a GC'd
object. But since the logging gem is ruby (not C), I don't know what it
should do.

IMO, it should have one finalizer per appender, rather than a single
finalizer that cleans up all the appenders. But I'm not sure that would
fix the crash.

Paul

#19 Updated by Jodi Showers almost 2 years ago

Since my last post, I've continued to see the crash

But today I came across this bug report for the logging gem

https://github.com/TwP/logging/issues/30

which may very well be the source

I'll update my experiences here

#20 Updated by Narihiro Nakamura over 1 year ago

  • Status changed from Assigned to Closed

Hi!

I think ruby-1.9.3-p194 fixed this bug.
if you see the crash with ruby-1.9.3-p194, please reopen this ticket.

Thanks.

Also available in: Atom PDF