Backport #4339

Segmentation fault during Marshal.load

Added by Sean Bradly about 3 years ago. Updated about 2 years ago.

[ruby-core:34955]
Status:Closed
Priority:Normal
Assignee:Shyouhei Urabe

Description

=begin
Conditions that seem to have to be satisfied:

1) A call to Marshal.load must be interrupted by a context switch.
2) Another thread calls GC.start manually.
3) The object being marshaled needs to have a lot of child objects to be reliable.

This appears to be a thread-safety issue affecting Ruby 1.8.x only. It triggers reliably for me on:

Ubuntu -> ruby 1.8.7 (2010-06-23 patchlevel 299) [i686-linux]
Ubuntu-rvm -> ruby 1.8.7 (2010-12-23 patchlevel 330) [i686-linux]
MacRuby -> ruby 1.8.7 (2009-06-12 patchlevel 174) [universal-darwin10.0]
Mac-rvm -> ruby 1.8.7 (2010-08-16 patchlevel 302) [i686-darwin10.4.0]
=end

REPRO4.rb Magnifier - Reproduces the segfault (906 Bytes) Sean Bradly, 01/29/2011 09:09 AM

ruby-1.8.7-trac22417.patch Magnifier - Patch file (3.49 KB) Sean Bradly, 02/01/2011 08:56 AM

ruby-1.8.7-marshal.patch Magnifier - backport of marshall patch from upstream (6.09 KB) Vit Ondruch, 11/15/2011 11:16 PM

Associated revisions

Revision 34867
Added by Nobuyoshi Nakada about 2 years ago

merge revision(s) 25230,34866:

* marshal.c (struct {dump,load}_arg): manage with dfree, instead
  of using local variable which may be moved by context switch.
  

* marshal.c (mark_dump_arg): mark destination string.  patch by
  Vit Ondruch.  [Bug #4339]

* marshal.c (clear_dump_arg, clear_load_arg): clean up also data
  tables as same as symbols tables.

History

#1 Updated by Sean Bradly about 3 years ago

=begin
Including a gdb trace. This issue happens in the markloadarg function when stforeach encounters a corrupt bin ptr. I have seen similar failures occur in marshaldumparg. Also, _very rarely the sttableentry list of one of the bins will become circularly linked, causing an infinite loop.

Program received signal SIGSEGV, Segmentation fault.
0x001c86b6 in stforeach (table=0x3cb3c0, func=0x178f40 <markentry>, arg=0) at st.c:486
486 for(ptr = table->bins[i]; ptr != 0;) {
(gdb) bt
#0 0x001c86b6 in stforeach (table=0x3cb3c0, func=0x178f40 <markentry>, arg=0) at st.c:486
#1 0x001786c3 in marktbl (tbl=0x0) at gc.c:716
#2 rb
marktbl (tbl=0x0) at gc.c:723
#3 0x00189c04 in mark
loadarg (ptr=0xbfffca3c) at marshal.c:841
#4 0x00178d7e in gc
markchildren (ptr=3086829960, lev=1) at gc.c:1025
#5 0x00178a71 in mark
locationsarray (x=, n=) at gc.c:684
#6 0x00157b95 in thread
mark (th=0x8087300) at eval.c:10466
#7 0x00178d7e in gcmarkchildren (ptr=3086830120, lev=3) at gc.c:1025
#8 0x00178d06 in gcmarkchildren (ptr=, lev=) at gc.c:1006
#9 0x00178e18 in gcmarkchildren (ptr=, lev=) at gc.c:1057
#10 0x001791de in garbagecollect () at gc.c:1465
#11 0x00179be7 in rb
gc () at gc.c:1530
#12 0x00179c17 in rbgcstart () at gc.c:1547
#13 0x00159a9d in callcfunc (func=0x179c00 <rbgcstart>, recv=3978176, len=0, argc=0, argv=0x0) at eval.c:5781
#14 0x00164a09 in rb
call0 (klass=, recv=, id=5313, oid=5313, argc=0, argv=0x0, body=0xb7fd88bc, flags=) at eval.c:5928
#15 0x00164baa in rbcall (klass=3086846160, recv=3086846180, mid=5313, argc=0, argv=0x0, scope=0, self=3086911820) at eval.c:6176
#16 0x00161e7b in rb
eval (self=, n=) at eval.c:3506
#17 0x00163404 in rbyield0 (val=, self=, klass=0, flags=, avalue=0) at eval.c:5095
#18 0x0016e657 in rbyield (val=3) at eval.c:5179
#19 0x0018e641 in int
dotimes (num=201) at numeric.c:2960
#20 0x00159a9d in callcfunc (func=0x18e5f0 <intdotimes>, recv=3978176, len=0, argc=0, argv=0x0) at eval.c:5781
#21 0x00164a09 in rbcall0 (klass=, recv=, id=5753, oid=5753, argc=0, argv=0x0, body=0xb7fe4c5c, flags=) at eval.c:5928
#22 0x00164baa in rb
call (klass=3086896500, recv=201, mid=5753, argc=0, argv=0x0, scope=0, self=3086911820) at eval.c:6176
#23 0x00161e7b in rbeval (self=, n=) at eval.c:3506
#24 0x0016292e in rb
eval (self=, n=) at eval.c:3236
#25 0x00163404 in rbyield0 (val=, self=, klass=0, flags=, avalue=2) at eval.c:5095
#26 0x0016374a in rbthreadyield (arg=3086829920, th=0x8087658) at eval.c:12553
#27 0x0016c2f9 in rbthreadstart0 (fn=, arg=, th=0x8087658) at eval.c:12471
#28 0x00159ade in call
cfunc (func=0x16c420 , recv=3978176, len=-2, argc=0, argv=0x0) at eval.c:5775
#29 0x00164a09 in rbcall0 (klass=, recv=, id=2961, oid=2961, argc=0, argv=0x0, body=0xb7fe59b8, flags=) at eval.c:5928
#30 0x00164baa in rb
call (klass=3086899740, recv=3086829940, mid=2961, argc=0, argv=0x0, scope=1, self=6) at eval.c:6176
#31 0x00165459 in rbfuncall2 (recv=3978176, mid=2961, argc=0, argv=0x0) at eval.c:6312
#32 0x001654f7 in rb
objcallinit (obj=3086829940, argc=0, argv=0x0) at eval.c:7825
#33 0x00165552 in rbthreadsnew (argc=0, argv=0x0, klass=3086899740) at eval.c:12584
#34 0x00159ab8 in call
cfunc (func=0x165510 , recv=3978176, len=-1, argc=0, argv=0x0) at eval.c:5778
#35 0x00164a09 in rbcall0 (klass=, recv=, id=3361, oid=3361, argc=0, argv=0x0, body=0xb7fe59e0, flags=) at eval.c:5928
#36 0x00164baa in rb
call (klass=3086899720, recv=3086899740, mid=3361, argc=0, argv=0x0, scope=0, self=3086911820) at eval.c:6176
#37 0x00161e7b in rbeval (self=, n=) at eval.c:3506
#38 0x0016292e in rb
eval (self=, n=) at eval.c:3236
#39 0x0015f659 in rbeval (self=, n=) at eval.c:3501
#40 0x00170d66 in ruby
execinternal () at eval.c:1654
#41 0x00170db2 in ruby
exec () at eval.c:1674
#42 0x00170df5 in ruby_run () at eval.c:1684
#43 0x0804869d in main (argc=2, argv=0xbfffed24, envp=0xbfffed30) at main.c:48

=end

#2 Updated by Nobuyoshi Nakada about 3 years ago

  • Category set to core
  • Status changed from Open to Assigned
  • Assignee set to Shyouhei Urabe

=begin

=end

#3 Updated by Sean Bradly about 3 years ago

=begin
Fwiw, this resolves the issue for me. The general problem is that "struct loadarg arg" is on the stack AND wrapped by an RData. When GC starts, the RData refers to arg (a stack pointer) but the current (GC) thread's stack has clobbered this address. My fix simply switches arg to be dynamically allocated, and the same for markdump_arg. However, it appears this will leak a small amount of memory if Marshal throws an exception. This seems ok for my application, so I'm not going to look into it any more.
=end

#4 Updated by Nobuyoshi Nakada about 3 years ago

=begin
r25230 has fixed this bug in trunk and 1.8.
=end

#5 Updated by Shyouhei Urabe almost 3 years ago

  • Assignee changed from Shyouhei Urabe to Nobuyoshi Nakada

... and r25230 has a bug. cf: #2386

#6 Updated by Shyouhei Urabe almost 3 years ago

Also,

#7 Updated by Vit Ondruch over 2 years ago

Hi Shyouhei,

Could you please translate for me what is wrong with this patch? I'd like to apply this patch for Ruby in RHEL 6, since it fixes one bug we are facing when using mcollective.

#8 Updated by Shyouhei Urabe over 2 years ago

  • Assignee changed from Nobuyoshi Nakada to Shyouhei Urabe

It's a rather simple story. Instead of applying the proposed patch Nobu wrote something different (against trunk) for his own and told me to backport that. But I found Nobu's introduced a test failure. That's all.

So we had not take a close look at the patch attatched here (I read this one then, but didn't evaluated). Sorry. I tested it today and it seems OK. If you plan to fix this issue use the one attatched here instead of applying Nobu's. I'll also apply this in the next patchlevel.

#9 Updated by Nobuyoshi Nakada over 2 years ago

This patch seems to leak memory if an exception raises during dump/load.
You should use DataMakeStruct() with dfree -1, I guess.

#10 Updated by Motohiro KOSAKI over 2 years ago

  • Status changed from Assigned to Feedback

Vit, please send us an updated patch.

thank you.

#11 Updated by Vit Ondruch over 2 years ago

Hi,

Could you please review the attached patch? It is basically backported the upstream marshal.c, i.e. it contains slightly more stuff then just r25230. "make test-all" passes as well as the original reproducer. Please apply this patch to 1.8

#12 Updated by Vit Ondruch about 2 years ago

Bump.

Can somebody take look on the ruby-1.8.7-marshal.patch patch, please?

#13 Updated by Jim Meyering about 2 years ago

Hello,
I encountered this bug on RHEL-6's ruby-1.8.7.299 and with .352 and
reported it against RHEL-6 via http://bugzilla.redhat.com/781561
Applying the patch,
http://bugs.ruby-lang.org/attachments/2243/ruby-1.8.7-marshal.patch
solved our problem nicely. Please consider it for upstream ruby 1.8.7.

The reproducer, http://bugs.ruby-lang.org/attachments/1446/REPRO4.rb
also triggers a segmentation fault with Fedora 16's ruby:

$ ruby /t/REPRO4.rb
/t/REPRO4.rb:35: [BUG] Segmentation fault
ruby 1.8.7 (2011-12-28 patchlevel 357) [x86_64-linux]

zsh: abort (core dumped)  ruby /t/REPRO4.rb

#14 Updated by Nobuyoshi Nakada about 2 years ago

  • Tracker changed from Backport to Bug
  • Project changed from Backport87 to Ruby 1.8
  • Category changed from core to core

#15 Updated by Nobuyoshi Nakada about 2 years ago

  • Status changed from Feedback to Closed
  • % Done changed from 0 to 100

This issue was solved with changeset r34866.
Sean, thank you for reporting this issue.
Your contribution to Ruby is greatly appreciated.
May Ruby be with you.


  • marshal.c (markdumparg): mark destination string. patch by Vit Ondruch. [Bug #4339]
  • marshal.c (cleardumparg, clearloadarg): clean up also data tables as same as symbols tables.

#16 Updated by Nobuyoshi Nakada about 2 years ago

  • Tracker changed from Bug to Backport
  • Project changed from Ruby 1.8 to Backport87
  • Category changed from core to core

#17 Updated by Vit Ondruch about 2 years ago

Thank you for accepting the patch.

#18 Updated by Vit Ondruch about 2 years ago

BTW shouldn't it go into ruby-187 branch instead of ruby18?

#19 Updated by Marc-Andre Lafortune about 2 years ago

Hi,

Vit Ondruch wrote:

BTW shouldn't it go into ruby-187 branch instead of ruby18?

Nobu applied it also to -187 as r34867

Also available in: Atom PDF