Bug #15362: [PATCH] Avoid GCing dead stack after switching away from a fiber - Ruby - Ruby Issue Tracking System

Actions

Copy link

Bug #15362

closed

[PATCH] Avoid GCing dead stack after switching away from a fiber

Added by alanwu (Alan Wu) over 6 years ago. Updated over 6 years ago.

Status:

Closed

Assignee:

ioquatix (Samuel Williams)

Target version:

2.6

ruby -v:

Backport:

2.5: DONE

[ruby-core:90193]

Description

Hello! I have a patch that fixes Bug #14561. It's not a platform specific issue but
it affects the default build configuration for MacOS and is causing segfaults on 2.5.x.
I've put the test for this in a separate patch because I'm not sure if we want to have
a 5 second test that only matters for non-default build configs and doesn't catch things reliably on Linux.
I tested this on both trunk and ruby_2_5, on MacOS and on Linux, on various build configs.

Please let me know if anything in my understanding is wrong. I've pasted my commit message below.

Fibers save execution contextes, and execution contexts include a native
stack pointer. It may happen that a Fiber outlive the native thread
it executed on. Consider the following code adapted from Bug #14561:

enum = Enumerator.new { |y| y << 1 }
thread = Thread.new { enum.peek }  # fiber constructed inside the
                                   # block and saved inside `enum`
thread.join
sleep 5      # thread finishes and thread cache wait time runs out.
             # Native thread exits, possibly freeing its stack.
GC.start     # segfault because GC tires to mark the dangling stack pointer
             # inside `enum`'s fiber

The problem is masked by FIBER_USE_COROUTINE and FIBER_USE_NATIVE,
as those implementations already do what this commit does.
Generally on Linux systems, FIBER_USE_NATIVE is 1 even when
one uses ./configure --disable-fiber-coroutine, since most
Linux systems have getcontext() and setcontext() which
turns on FIBER_USE_NATIVE. (compile with `make
DEFS="-DFIBER_USE_NATIVE=0" to explicitly disable it)

Furthermore, when both FIBER_USE_COROUTINE and FIBER_USE_NATIVE
are off, and the GC reads from the stack of a dead native
thread, MRI does not segfault on Linux. This is probably due to
libpthread not marking the page where the dead stack lives as
unreadable. Nevertheless, this use-after-free is visible through
Valgrind.

On ruby_2_5, this is an acute problem, since it doesn't have FIBER_USE_COROUTINE.
Thread cache is also unavailable for 2.5.x, triggering this issue
more often. (thread cache gives this bug a grace period since
it makes native threads wait a little before exiting)

This issue is very visible on MacOS on 2.5.x since libpthread marks
the dead stack as unreadable, consistently turning this use-after-free
into a segfault.

Fixes Bug #14561

cont.c: Set saved_ec.machine.stack_end to NULL when switching away from a
fiber to keep the GC marking it. saved_ec gets rehydrated with a
stack pointer if/when the fiber runs again.

Files

Download all files

0001-Avoid-GCing-dead-stack-after-switching-away-from-a-f.patch (2.63 KB) 0001-Avoid-GCing-dead-stack-after-switching-away-from-a-f.patch		alanwu (Alan Wu), 11/30/2018 08:22 PM
0001-Add-a-test-for-Bug-14561.patch (1.21 KB) 0001-Add-a-test-for-Bug-14561.patch		alanwu (Alan Wu), 11/30/2018 08:22 PM

Related issues 3 (0 open — 3 closed)

Actions

Copy link

Updated by alanwu (Alan Wu) over 6 years ago

Description updated (diff)

Actions

Copy link

#2 [ruby-core:90196]

Updated by aselder (Andrew Selder) over 6 years ago

Alan,

You're amazing... We're completely blocked on upgrading by this bug. You're a savior

Thanks,

Andrew

Actions

Copy link

#3 [ruby-core:90199]

Updated by ioquatix (Samuel Williams) over 6 years ago

Assignee set to ioquatix (Samuel Williams)
Target version set to 2.6
Backport changed from 2.4: UNKNOWN, 2.5: UNKNOWN to 2.5: REQUIRED

I will take look at it.

Actions

Copy link

#4 [ruby-core:90200]

Updated by ioquatix (Samuel Williams) over 6 years ago

I looked at the patch it seemed okay.

I committed it to trunk r66111

Once we've confirmed it solves the issue, I will backport 2.5 branch, hopefully it can be included 2.5.4 or next patch release.

Actions

Copy link

#5 [ruby-core:90201]

Updated by ioquatix (Samuel Williams) over 6 years ago

After I merged this patch, https://travis-ci.org/ruby/ruby/jobs/462071967 fails. I am testing locally in more detail.

Actions

Copy link

#6 [ruby-core:90202]

Updated by ioquatix (Samuel Williams) over 6 years ago

Okay, it was random failure. That concerns me too. But, it passed now. I'll merge to ruby_2_5 too.

Actions

Copy link

#7 [ruby-core:90203]

Updated by normalperson (Eric Wong) over 6 years ago

samuel@oriontransfer.net wrote:

After I merged this patch,
https://travis-ci.org/ruby/ruby/jobs/462071967 fails. I am
testing locally in more detail.

Alan + Sam: thanks for investigating this.

Perhaps an alternative fix would be to avoid calling
rb_gc_mark_machine_stack() if the owner thread is dead.

I couldn't reproduce the original problem on FreeBSD 11.2
with both FIBER_USE_NATIVE and FIBER_USE_COROUTINE disabled;
so I can't verify my hypothesis
(Didn't bother testing on Linux)

https://bugs.ruby-lang.org/issues/15362#change-75323

Actions

Copy link

#8 [ruby-core:90204]

Updated by ioquatix (Samuel Williams) over 6 years ago

@normalperson (Eric Wong) I am open to ideas. I don't know a lot about how thread and GC interact within Ruby, unfortunately it's not an area I spent much time on. So, if you have better proposal, I suggest you implement it. This change included a spec, so I was happy to merge it even thought I didn't fully understand what is going wrong. Confirmed green build matrix so I am happy to close. If you want to make a change, let me know, otherwise I will close this and it would be backported to 2.5.

Actions

Copy link

#9 [ruby-core:90205]

Updated by ioquatix (Samuel Williams) over 6 years ago

@normalperson (Eric Wong) I thought about it a bit more, perhaps both ideas should be implemented for maximum protection. We could add some notes in the source code too about the issue/invariants.

Actions

Copy link

#10 [ruby-core:90206]

Updated by normalperson (Eric Wong) over 6 years ago

samuel@oriontransfer.net wrote:

@normalperson (Eric Wong) I thought about it a bit more, perhaps both
ideas should be implemented for maximum protection. We could
add some notes in the source code too about the
issue/invariants.

NAK. I'm against belt-and-suspenders fixes because it favors
lack-of-understanding and makes Ruby slower with more overhead.

Looking at Alan's original fix (which you committed), it seems
to make sense. Anyways, my hypothesis is here:

https://80x24.org/spew/20181201063852.30438-1-e@80x24.org/raw

It may allow Alan's simple one-line fix to be reverted.
However, I prefer Alan's one-liner fix since it's smaller
with instructions with no extra branches.

Actions

Copy link

#11 [ruby-core:90207]

Updated by ioquatix (Samuel Williams) over 6 years ago

Status changed from Open to Closed

Okay, I think we are in agreement then.

Actions

Copy link

#12 [ruby-core:90216]

Updated by alanwu (Alan Wu) over 6 years ago

A comment for future maintainers:
Since &fib->cont.saved_ec == ruby_current_execution_context_ptr in the place my patch makes the change, if GC starts below where I set stack_end to NULL, the stack doesn't get marked.
This is not a problem right now since there is no call to any function that could trigger GC (correct me if I'm wrong), but future changes should be careful to not introduce such calls.
I wish there was some way to codify this constraint and have the compiler statically check this for us.

Actions

Copy link

#13 [ruby-core:90534]

Updated by dazuma (Daniel Azuma) over 6 years ago

Did this get backported to ruby_2_5? I don't see a corresponding commit in the github mirror https://github.com/ruby/ruby/commits/ruby_2_5

Actions

Copy link

#14 [ruby-core:90840]

Updated by aselder (Andrew Selder) over 6 years ago

This still looks like it's waiting on a backport to Ruby 2.5. Also, does anyone know when the next release of the Ruby 2.5 branch will be done?

Thanks!

Actions

Copy link

#15

Updated by nagachika (Tomoyuki Chikanaga) over 6 years ago

Has duplicate Bug #14561: Consistent 2.5.0 seg fault in GC, related to accessing an enumerator in a thread added

Actions

Copy link

#16 [ruby-core:90998]

Updated by nagachika (Tomoyuki Chikanaga) over 6 years ago

Backport changed from 2.5: REQUIRED to 2.5: DONE

ruby_2_5 r66777 merged revision(s) 66111.

Actions

Copy link

#17

Updated by wanabe (_ wanabe) over 6 years ago

Related to Bug #14714: Ruby 2.5.1 Segmentation Fault in GC added

Actions

Copy link

#18

Updated by wanabe (_ wanabe) over 6 years ago

Related to Bug #15375: Crash report for Ruby 2.5.3p105 added

Actions

Copy link

Also available in: Atom PDF

Like0

Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0

Project

General

Profile

Ruby

Tags

Custom queries

Bug #15362

[PATCH] Avoid GCing dead stack after switching away from a fiber

Updated by alanwu (Alan Wu) over 6 years ago

Updated by aselder (Andrew Selder) over 6 years ago

Updated by ioquatix (Samuel Williams) over 6 years ago

Updated by ioquatix (Samuel Williams) over 6 years ago

Updated by ioquatix (Samuel Williams) over 6 years ago

Updated by ioquatix (Samuel Williams) over 6 years ago

Updated by normalperson (Eric Wong) over 6 years ago

Updated by ioquatix (Samuel Williams) over 6 years ago

Updated by ioquatix (Samuel Williams) over 6 years ago

Updated by normalperson (Eric Wong) over 6 years ago

Updated by ioquatix (Samuel Williams) over 6 years ago

Updated by alanwu (Alan Wu) over 6 years ago

Updated by dazuma (Daniel Azuma) over 6 years ago

Updated by aselder (Andrew Selder) over 6 years ago

Updated by nagachika (Tomoyuki Chikanaga) over 6 years ago

Updated by nagachika (Tomoyuki Chikanaga) over 6 years ago

Updated by wanabe (_ wanabe) over 6 years ago

Updated by wanabe (_ wanabe) over 6 years ago

Related to Ruby - Bug #14714: Ruby 2.5.1 Segmentation Fault in GC	Closed		Actions
Related to Ruby - Bug #15375: Crash report for Ruby 2.5.3p105	Closed		Actions
Has duplicate Ruby - Bug #14561: Consistent 2.5.0 seg fault in GC, related to accessing an enumerator in a thread	Closed	ioquatix (Samuel Williams)	Actions