Bug #4612
closedSegmentation fault in fiber GC mark cycle
Description
=begin
((|Fiber.current|)) can cause segfault on GC cycle when used in threads. Please find attached ruby sample which should help to pinpoint the problem.
The coredump shows the following backtrace:
....
#18
#19 0x000000010004cce5 in mark_locations_array (objspace=0x100838000, x=0x101dc8ab0, n=649) at gc.c:1315
#20 0x000000010004cea6 in gc_mark_locations (objspace=0x100838000, start=0x101dc8ab0, end=0x101dc9f00) at gc.c:1331
#21 0x000000010004f3dc in rb_gc_mark_machine_stack (th=0x1008a1048) at gc.c:2235
#22 0x0000000100177a4f in rb_thread_mark (ptr=0x1008a1048) at vm.c:1683
#23 0x000000010018179a in cont_mark (ptr=0x1008a1000) at cont.c:88
#24 0x0000000100181947 in fiber_mark (ptr=0x1008a1000) at cont.c:168
#25 0x000000010004dbb9 in gc_mark_children (objspace=0x100838000, ptr=4303907200, lev=1) at gc.c:1719
#26 0x000000010004d3fb in gc_mark (objspace=0x100838000, ptr=4303907200, lev=0) at gc.c:1514
#27 0x000000010004d428 in rb_gc_mark (ptr=4303907200) at gc.c:1520
#28 0x000000010017797f in rb_thread_mark (ptr=0x10049fa00) at vm.c:1673
....
It looks like in frame 28 ((|rb_thread_mark|)) is called on terminated thread, which is ok, but then it goes down to fiber's continuation member ((|saved_thread|)) in frame 22. I think saved_thread holds an earlier snapshot (still running) of the same thread that we see in 28 (thread_id are equal), and because its machine stack pointers are stale, ((|rb_gc_mark_machine_stack|)) starts marking inaccessible memory.
One possible quick-fix is to set machine stack pointers to 0 in ((|cont_init()|)), given the original thread will take care of that stuff and free as needed. It cures segfaults, but I wonder if that doesn't break some other code.
I was also wondering why continuation holds a copy of thread struct instead of a pointer to it. It's hard to correctly follow real thread life cycle with ((|saved_thread|)). So can it harm in other cases like the above?
BTW, while following the code around fibers and continuations, I've found another curious thing: ((|fiber_free()|)) calls ((|cont_free(&fib->cont)|)) on its cont member, and ((|cont_free()|)) calls ((|ruby_xfree()|)) on it. Is that ok, given cont was allocated as part of fiber, and don't we need to ((|ruby_xfree|)) the ((|fib|)) itself?
The attached patch is made against ruby_1_9_2 branch; trunk seems to have the segfault behavior too.
=end
Files
Updated by serge_balyuk (Serge Balyuk) over 13 years ago
I'd like to bring attention to the issue again. Can I get some feedback? Maybe I haven't set some important property like target version or category, but unfortunately I can't change those now, given current permissions.
Updated by nagachika (Tomoyuki Chikanaga) over 13 years ago
Hi,
Sorry for late reply, and thank you for your reporting bug :)
I've checked I can reproduce segv and your patch fix the problem.
And make test-all reports no extra error.
I think your patch is reasonable. I'll check in it to trunk.
BTW, while following the code around fibers and continuations,
I've found another curious thing: fiber_free() calls cont_free(&fib->cont)
on its cont member, and cont_free() calls ruby_xfree() on it. Is that ok,
given cont was allocated as part of fiber, and don't we need to ruby_xfree the fib itself?
'cont' is a first member of structure 'rb_fiber_t', and
(void *)(&fib->cont) == (void *)&fib
That's why we don't need (in fact, must not) to call ruby_xfree() to fib itself.
Updated by serge_balyuk (Serge Balyuk) over 13 years ago
Awesome, thank you very much!
And thanks for the explanation on fiber deallocation, now it makes much more sense to me.
Updated by nagachika (Tomoyuki Chikanaga) over 13 years ago
- Status changed from Open to Closed
- % Done changed from 0 to 100
This issue was solved with changeset r31577.
Serge, thank you for reporting this issue.
Your contribution to Ruby is greatly appreciated.
May Ruby be with you.
- cont.c (cont_init): clear macihne_stack_start/end of saved thread to
prevent mark machine stack of GC'ed Thread. root Fiber is not initialized by
fiber_init(). based on a patch by Serge Balyuk [ruby-core:35891] fixes #4612 - test/ruby/test_fiber.rb (test_gc_root_fiber): add test for it.