Bug #8483

SEGV under high concurrency

Added by Diego Plentz 11 months ago. Updated 8 months ago.

[ruby-core:55282]
Status:Feedback
Priority:Normal
Assignee:-
Category:core
Target version:-
ruby -v:ruby 2.0.0p195 (2013-05-14 revision 40734) [x86_64-linux]
Backport:1.9.3: UNKNOWN, 2.0.0: UNKNOWN

Description

Here are a few segfaults from /var/log/messages: https://gist.github.com/plentz/5701752

I'm using Sidekiq on my production servers, and the Ruby process keeps dying with a segfault (this has been happening constantly for the last 2 weeks). I'm running Sidekiq with a concurrency of 50, which means a lot of threads are running.

I'm using ruby-2.0.0p195, but the problem also happens with ruby-1.9.3-p392, ruby-1.9.3-p429 and ruby-2.0.0p0. I have already rolled back all our gems, which probably means the problem is caused by our own Ruby code and not by one of the gems we use.

Here's what I managed to get using gdb:

https://gist.github.com/plentz/5630854
https://gist.github.com/plentz/5632256

I can't find which line of code is triggering the problem: right after the segfault, I can't run (gdb) call rb_backtrace() to get the Ruby stack trace (or I just don't know how).

If someone can give me some directions, I can gather more info, since the problem happens very often in our environment.
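For readers in a similar situation, one way to see what each thread is doing at the Ruby level is to dump the backtraces from inside the process while it is still alive. The following is only a minimal sketch, not something taken from this report: `dump_thread_backtraces` is a hypothetical helper name, and it will not help once the interpreter has already crashed.

```ruby
require "stringio"

# Hypothetical helper: writes the Ruby-level backtrace of every live thread to `io`.
def dump_thread_backtraces(io = $stderr)
  Thread.list.each do |thread|
    io.puts "Thread #{thread.object_id} (status: #{thread.status.inspect})"
    # Thread#backtrace returns nil for a thread that has already finished.
    (thread.backtrace || ["<no backtrace available>"]).each do |frame|
      io.puts "  #{frame}"
    end
  end
  io
end

# Capture the dump in memory here; in a real process it would go to a log.
output = dump_thread_backtraces(StringIO.new).string
puts output
```

With 50 worker threads, a dump like this taken shortly before a crash might show whether one thread's backtrace is unusually deep.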

History

#1 Updated by Tomoyuki Chikanaga 10 months ago

Do you use any gem packages that contain extension libraries?
Can you run the process in a terminal and get a backtrace, etc.?

#2 Updated by Yui NARUSE 8 months ago

  • Status changed from Open to Feedback
  • Priority changed from High to Normal

Could you provide code that reproduces the problem?

#3 Updated by Diego Plentz 8 months ago

@nagachika Yes, but I really think the problem isn't related to that.
@naruse Not yet. I'm still trying to reproduce the problem.

I've now found some more info. After an increase in server memory (we added 4 GB of RAM), we stopped seeing the segfaults and started seeing "stack level too deep" errors instead. All the errors we get are listed below (curiously, they all point to lines containing a "raise"):
- https://github.com/mperham/sidekiq/blob/v2.12.4/lib/sidekiq/processor.rb#L112
- https://github.com/mperham/sidekiq/blob/v2.12.4/lib/sidekiq/middleware/server/logging.rb#L15
- https://github.com/mperham/sidekiq/blob/v2.12.4/lib/sidekiq/middleware/server/retry_jobs.rb#L54
- https://github.com/rails/rails/blob/v3.2.13/activerecord/lib/active_record/connection_adapters/abstract/database_statements.rb#L206

#4 Updated by Diego Plentz 8 months ago

We found the problem: it was a recursive call. I still think it's a bug that Ruby segfaults on a recursive call when we had less memory, but I can't reproduce the error in a more restricted test case, so I think we can close this. Thanks anyway.
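For illustration, the "stack level too deep" symptom from a runaway recursive call can be sketched as below. This is not the code from the report (which was never posted) and it does not reproduce the segfault; `recurse_forever` and `run_in_thread` are hypothetical names, and the thread only loosely stands in for a Sidekiq worker thread.

```ruby
# Hypothetical method: recurses with no base case, so it can never return normally.
def recurse_forever(depth = 0)
  recurse_forever(depth + 1)
end

# Run the recursion inside a thread and hand the exception back to the caller
# instead of letting the thread die with it.
def run_in_thread
  Thread.new do
    begin
      recurse_forever
      nil
    rescue SystemStackError => e
      e
    end
  end.value
end

error = run_in_thread
puts "#{error.class}: #{error.message}"
```

When the interpreter detects the overflow it raises SystemStackError, which is the error class behind the "stack level too deep" message mentioned in comment #3.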
