Backport #4339: Segmentation fault during Marshal.load - Backport187 - Ruby Issue Tracking System

Actions

Copy link

Backport #4339

closed

Segmentation fault during Marshal.load

Added by rhythmx (Sean Bradly) almost 14 years ago. Updated almost 13 years ago.

Status:

Closed

Assignee:

shyouhei (Shyouhei Urabe)

[ruby-core:34955]

Description

=begin
Conditions that seem to have to be satisfied:

A call to Marshal.load must be interrupted by a context switch.
Another thread calls GC.start manually.
The object being marshaled needs to have a lot of child objects to be reliable.

This appears to be a thread-safety issue affecting Ruby 1.8.x only. It triggers reliably for me on:

Ubuntu -> ruby 1.8.7 (2010-06-23 patchlevel 299) [i686-linux]
Ubuntu-rvm -> ruby 1.8.7 (2010-12-23 patchlevel 330) [i686-linux]
MacRuby -> ruby 1.8.7 (2009-06-12 patchlevel 174) [universal-darwin10.0]
Mac-rvm -> ruby 1.8.7 (2010-08-16 patchlevel 302) [i686-darwin10.4.0]
=end

Files

Download all files

REPRO4.rb (906 Bytes) REPRO4.rb	Reproduces the segfault	rhythmx (Sean Bradly), 01/29/2011 09:09 AM
ruby-1.8.7-trac22417.patch (3.49 KB) ruby-1.8.7-trac22417.patch	Patch file	rhythmx (Sean Bradly), 02/01/2011 08:56 AM
ruby-1.8.7-marshal.patch (6.09 KB) ruby-1.8.7-marshal.patch	backport of marshall patch from upstream	vo.x (Vit Ondruch), 11/15/2011 11:16 PM

Actions

Copy link

Updated by rhythmx (Sean Bradly) almost 14 years ago

=begin
Including a gdb trace. This issue happens in the mark_load_arg function when st_foreach encounters a corrupt bin ptr. I have seen similar failures occur in marshal_dump_arg. Also, very rarely the st_table_entry list of one of the bins will become circularly linked, causing an infinite loop.

Program received signal SIGSEGV, Segmentation fault.
0x001c86b6 in st_foreach (table=0x3cb3c0, func=0x178f40 <mark_entry>, arg=0) at st.c:486
486 for(ptr = table->bins[i]; ptr != 0;) {
(gdb) bt
#0 0x001c86b6 in st_foreach (table=0x3cb3c0, func=0x178f40 <mark_entry>, arg=0) at st.c:486
#1 0x001786c3 in mark_tbl (tbl=0x0) at gc.c:716
#2 rb_mark_tbl (tbl=0x0) at gc.c:723
#3 0x00189c04 in mark_load_arg (ptr=0xbfffca3c) at marshal.c:841
#4 0x00178d7e in gc_mark_children (ptr=3086829960, lev=1) at gc.c:1025
#5 0x00178a71 in mark_locations_array (x=, n=) at gc.c:684
#6 0x00157b95 in thread_mark (th=0x8087300) at eval.c:10466
#7 0x00178d7e in gc_mark_children (ptr=3086830120, lev=3) at gc.c:1025
#8 0x00178d06 in gc_mark_children (ptr=, lev=) at gc.c:1006
#9 0x00178e18 in gc_mark_children (ptr=, lev=) at gc.c:1057
#10 0x001791de in garbage_collect () at gc.c:1465
#11 0x00179be7 in rb_gc () at gc.c:1530
#12 0x00179c17 in rb_gc_start () at gc.c:1547
#13 0x00159a9d in call_cfunc (func=0x179c00 <rb_gc_start>, recv=3978176, len=0, argc=0, argv=0x0) at eval.c:5781
#14 0x00164a09 in rb_call0 (klass=, recv=, id=5313, oid=5313, argc=0, argv=0x0, body=0xb7fd88bc, flags=) at eval.c:5928
#15 0x00164baa in rb_call (klass=3086846160, recv=3086846180, mid=5313, argc=0, argv=0x0, scope=0, self=3086911820) at eval.c:6176
#16 0x00161e7b in rb_eval (self=, n=) at eval.c:3506
#17 0x00163404 in rb_yield_0 (val=, self=, klass=0, flags=, avalue=0) at eval.c:5095
#18 0x0016e657 in rb_yield (val=3) at eval.c:5179
#19 0x0018e641 in int_dotimes (num=201) at numeric.c:2960
#20 0x00159a9d in call_cfunc (func=0x18e5f0 <int_dotimes>, recv=3978176, len=0, argc=0, argv=0x0) at eval.c:5781
#21 0x00164a09 in rb_call0 (klass=, recv=, id=5753, oid=5753, argc=0, argv=0x0, body=0xb7fe4c5c, flags=) at eval.c:5928
#22 0x00164baa in rb_call (klass=3086896500, recv=201, mid=5753, argc=0, argv=0x0, scope=0, self=3086911820) at eval.c:6176
#23 0x00161e7b in rb_eval (self=, n=) at eval.c:3506
#24 0x0016292e in rb_eval (self=, n=) at eval.c:3236
#25 0x00163404 in rb_yield_0 (val=, self=, klass=0, flags=, avalue=2) at eval.c:5095
#26 0x0016374a in rb_thread_yield (arg=3086829920, th=0x8087658) at eval.c:12553
#27 0x0016c2f9 in rb_thread_start_0 (fn=, arg=, th=0x8087658) at eval.c:12471
#28 0x00159ade in call_cfunc (func=0x16c420 <rb_thread_initialize>, recv=3978176, len=-2, argc=0, argv=0x0) at eval.c:5775
#29 0x00164a09 in rb_call0 (klass=, recv=, id=2961, oid=2961, argc=0, argv=0x0, body=0xb7fe59b8, flags=) at eval.c:5928
#30 0x00164baa in rb_call (klass=3086899740, recv=3086829940, mid=2961, argc=0, argv=0x0, scope=1, self=6) at eval.c:6176
#31 0x00165459 in rb_funcall2 (recv=3978176, mid=2961, argc=0, argv=0x0) at eval.c:6312
#32 0x001654f7 in rb_obj_call_init (obj=3086829940, argc=0, argv=0x0) at eval.c:7825
#33 0x00165552 in rb_thread_s_new (argc=0, argv=0x0, klass=3086899740) at eval.c:12584
#34 0x00159ab8 in call_cfunc (func=0x165510 <rb_thread_s_new>, recv=3978176, len=-1, argc=0, argv=0x0) at eval.c:5778
#35 0x00164a09 in rb_call0 (klass=, recv=, id=3361, oid=3361, argc=0, argv=0x0, body=0xb7fe59e0, flags=) at eval.c:5928
#36 0x00164baa in rb_call (klass=3086899720, recv=3086899740, mid=3361, argc=0, argv=0x0, scope=0, self=3086911820) at eval.c:6176
#37 0x00161e7b in rb_eval (self=, n=) at eval.c:3506
#38 0x0016292e in rb_eval (self=, n=) at eval.c:3236
#39 0x0015f659 in rb_eval (self=, n=) at eval.c:3501
#40 0x00170d66 in ruby_exec_internal () at eval.c:1654
#41 0x00170db2 in ruby_exec () at eval.c:1674
#42 0x00170df5 in ruby_run () at eval.c:1684
#43 0x0804869d in main (argc=2, argv=0xbfffed24, envp=0xbfffed30) at main.c:48

=end

Actions

Copy link

Updated by nobu (Nobuyoshi Nakada) almost 14 years ago

Category set to core
Status changed from Open to Assigned
Assignee set to shyouhei (Shyouhei Urabe)

=begin

=end

Actions

Copy link

Updated by rhythmx (Sean Bradly) almost 14 years ago

File ruby-1.8.7-trac22417.patch ruby-1.8.7-trac22417.patch added

=begin
Fwiw, this resolves the issue for me. The general problem is that "struct load_arg arg" is on the stack AND wrapped by an RData. When GC starts, the RData refers to arg (a stack pointer) but the current (GC) thread's stack has clobbered this address. My fix simply switches arg to be dynamically allocated, and the same for mark_dump_arg. However, it appears this will leak a small amount of memory if Marshal throws an exception. This seems ok for my application, so I'm not going to look into it any more.
=end

Actions

Copy link

Updated by nobu (Nobuyoshi Nakada) almost 14 years ago

=begin
r25230 has fixed this bug in trunk and 1.8.
=end

Actions

Copy link

#5 [ruby-core:36356]

Updated by shyouhei (Shyouhei Urabe) over 13 years ago

Assignee changed from shyouhei (Shyouhei Urabe) to nobu (Nobuyoshi Nakada)

... and r25230 has a bug. cf: #2386

Actions

Copy link

#6 [ruby-core:36357]

Updated by shyouhei (Shyouhei Urabe) over 13 years ago

Also, [ruby-dev:39723]

Actions

Copy link

#7 [ruby-core:40876]

Updated by vo.x (Vit Ondruch) about 13 years ago

Hi Shyouhei,

Could you please translate for me what is wrong with this patch? I'd like to apply this patch for Ruby in RHEL 6, since it fixes one bug we are facing when using mcollective.

Actions

Copy link

#8 [ruby-core:40882]

Updated by shyouhei (Shyouhei Urabe) about 13 years ago

Assignee changed from nobu (Nobuyoshi Nakada) to shyouhei (Shyouhei Urabe)

It's a rather simple story. Instead of applying the proposed patch Nobu wrote something different (against trunk) for his own and told me to backport that. But I found Nobu's introduced a test failure. That's all.

So we had not take a close look at the patch attatched here (I read this one then, but didn't evaluated). Sorry. I tested it today and it seems OK. If you plan to fix this issue use the one attatched here instead of applying Nobu's. I'll also apply this in the next patchlevel.

Actions

Copy link

#9 [ruby-core:40883]

Updated by nobu (Nobuyoshi Nakada) about 13 years ago

This patch seems to leak memory if an exception raises during dump/load.
You should use Data_Make_Struct() with dfree -1, I guess.

Actions

Copy link

#10 [ruby-core:40884]

Updated by kosaki (Motohiro KOSAKI) about 13 years ago

Status changed from Assigned to Feedback

Vit, please send us an updated patch.

thank you.

Actions

Copy link

#11

Updated by vo.x (Vit Ondruch) about 13 years ago

File ruby-1.8.7-marshal.patch ruby-1.8.7-marshal.patch added

Hi,

Could you please review the attached patch? It is basically backported the upstream marshal.c, i.e. it contains slightly more stuff then just r25230. "make test-all" passes as well as the original reproducer. Please apply this patch to 1.8

Actions

Copy link

#12 [ruby-core:42193]

Updated by vo.x (Vit Ondruch) almost 13 years ago

Bump.

Can somebody take look on the ruby-1.8.7-marshal.patch patch, please?

Actions

Copy link

#13 [ruby-core:42305]

Updated by meyering (Jim Meyering) almost 13 years ago

Hello,
I encountered this bug on RHEL-6's ruby-1.8.7.299 and with .352 and
reported it against RHEL-6 via http://bugzilla.redhat.com/781561
Applying the patch,
http://bugs.ruby-lang.org/attachments/2243/ruby-1.8.7-marshal.patch
solved our problem nicely. Please consider it for upstream ruby 1.8.7.

The reproducer, http://bugs.ruby-lang.org/attachments/1446/REPRO4.rb
also triggers a segmentation fault with Fedora 16's ruby:

$ ruby /t/REPRO4.rb
/t/REPRO4.rb:35: [BUG] Segmentation fault
ruby 1.8.7 (2011-12-28 patchlevel 357) [x86_64-linux]

zsh: abort (core dumped)  ruby /t/REPRO4.rb

Actions

Copy link

#14

Updated by nobu (Nobuyoshi Nakada) almost 13 years ago

Tracker changed from Backport to Bug
Project changed from Backport187 to Ruby 1.8
Category changed from core to core

Actions

Copy link

#15

Updated by nobu (Nobuyoshi Nakada) almost 13 years ago

Status changed from Feedback to Closed
% Done changed from 0 to 100

This issue was solved with changeset r34866.
Sean, thank you for reporting this issue.
Your contribution to Ruby is greatly appreciated.
May Ruby be with you.

marshal.c (mark_dump_arg): mark destination string. patch by
Vit Ondruch. [Bug #4339]
marshal.c (clear_dump_arg, clear_load_arg): clean up also data
tables as same as symbols tables.

Actions

Copy link

#16