Bug #11692

[PATCH] Re-enable GC if stack overflow was caught from signal handler

Added by gazay (Alex Gaziev) about 9 years ago. Updated over 8 years ago.

Status: Closed
Target version: -
[ruby-core:71497]

Description

We have a Ruby application running on our production server and noticed that it regularly crashes with out-of-memory errors.
After months of investigation, I narrowed the case down to the two attached examples (example1.rb and example2.rb).
After digging through the Ruby sources and running test code, I found that GC stops working after recovering from a native (machine) stack overflow error.
The relevant code probably appeared in 2.2 with https://github.com/ruby/ruby/commit/0c391a55d3ed4637e17462d9b9b8aa21e64e2340,
where ruby_disable_gc_stress became ruby_disable_gc.

Patches for the trunk and ruby_2_2 branches are attached below.


Files

example1.rb (311 Bytes): Example with puts. gazay (Alex Gaziev), 11/15/2015 05:21 PM
example2.rb (333 Bytes): Example with json. gazay (Alex Gaziev), 11/15/2015 05:21 PM
re-enable-gc-after-stackoverflow-trunk.patch (985 Bytes): Patch for trunk. gazay (Alex Gaziev), 11/15/2015 05:21 PM
re-enable-gc-after-stackoverflow-ruby-2-2.patch (985 Bytes): Patch for ruby_2_2. gazay (Alex Gaziev), 11/15/2015 05:21 PM
gc_threads_issue.rb (1.53 KB): ebeigarts (Edgars Beigarts), 11/17/2015 07:53 PM
re-enable-GC-if-stack-overflow-was-caught-from.patch (1.5 KB): Patch for ruby 2.2.4. gazay (Alex Gaziev), 12/18/2015 10:08 AM

Updated by ebeigarts (Edgars Beigarts) about 9 years ago

I'm having a similar issue when running tests with Capybara, which starts an additional server in a new thread.
If my Rails app has a problem that raises SystemStackError in the server thread, I am left without GC and memory just keeps growing. I have tried calling GC.start manually after that, but it doesn't help: GC.stat displays the same major/minor GC counts.

I was trying to create a simple example, but I ran into other issues too.
Here is the example: https://gist.github.com/ebeigarts/933f1601332609ed33a8

Do I need to open a new issue for this?

Updated by gazay (Alex Gaziev) about 9 years ago

Edgars Beigarts wrote:

Do I need to open a new issue for this?

I think it is the same problem. I wrote a small explanation of what is happening: https://gist.github.com/gazay/54da61919b85eb2e0d42

Updated by ko1 (Koichi Sasada) about 9 years ago

  • Status changed from Open to Closed

Applied in changeset r52668.


  • signal.c: should also clear ruby_disable_gc.
    [Bug #11692]

Updated by ko1 (Koichi Sasada) about 9 years ago

  • Assignee set to ko1 (Koichi Sasada)

Thank you for your great investigation!

Updated by ebeigarts (Edgars Beigarts) about 9 years ago

I tried Ruby HEAD. It solves the GC problem, but it doesn't solve the other problem that is visible if you run my example:
https://gist.github.com/ebeigarts/44648eb7b2773e102335

In Ruby 2.2 it looks like if a stack overflow is raised in a thread, the thread just dies. I was running 2.2.0, not 2.2.3.

In Ruby trunk (and also 2.2.3) it looks like if a stack overflow is raised in a thread twice, then on the second time the whole process just hangs and the only way to stop it is kill -9.

Updated by ko1 (Koichi Sasada) about 9 years ago

Unfortunately, this is a known problem (we cannot correctly capture a second machine stack overflow).

  • 1st machine stack overflow
    • SEGV
    • check for machine stack overflow
    • raise an error from the signal handler (*1) by longjmp.
  • 2nd machine stack overflow
    • SEGV
    • the signal state is still "handling SIGSEGV" (the signal remains blocked), so the OS cannot deliver the signal correctly...

The correct fix is to restore the signal state by using sigsetjmp/siglongjmp at (*1).
However, on Linux 2.x siglongjmp is much slower than longjmp, so we kept using longjmp, at least as of the last time we discussed this issue. We can't slow down the Ruby interpreter for such a corner case.

However, Linux 2.x is an old OS now, so we may be able to change this.
(BTW, I'm using Linux 2.6 on several machines.)

Updated by gazay (Alex Gaziev) about 9 years ago

Koichi, do you plan to backport this patch to the 2.2 branch?

Updated by gazay (Alex Gaziev) about 9 years ago

As far as I can see, the problem wasn't fixed in Ruby 2.2.4. A patch for it is attached.

Updated by vo.x (Vit Ondruch) about 9 years ago

  • Backport changed from 2.0.0: UNKNOWN, 2.1: UNKNOWN, 2.2: UNKNOWN to 2.0.0: UNKNOWN, 2.1: UNKNOWN, 2.2: REQUIRED

A backport should probably be requested.

Updated by gazay (Alex Gaziev) about 9 years ago

Thank you, I didn't know that.

Updated by nagachika (Tomoyuki Chikanaga) over 8 years ago

  • Backport changed from 2.0.0: UNKNOWN, 2.1: UNKNOWN, 2.2: REQUIRED to 2.0.0: UNKNOWN, 2.1: UNKNOWN, 2.2: DONE

Backported into ruby_2_2 branch at r54340.
