Bug #12699

Crash in the VM - maybe garbage collector bug

Added by nicolasnoble (Nicolas Noble) about 3 years ago. Updated about 3 years ago.

Status: Third Party's Issue
Priority: Normal
Assignee: -
Target version: -
ruby -v: ruby 2.3.1p112 (2016-04-26 revision 54768) [x86_64-linux]
[ruby-core:77034]

Description

Basically, we were investigating this: https://github.com/grpc/grpc/issues/7661

Our investigation led us to realize that this assert in the protobuf code is being triggered, but only if the garbage collector has been exercised enough: https://github.com/google/protobuf/blob/master/ruby/ext/google/protobuf_c/map.c#L74

If the garbage collector is really under heavy stress, we can even produce a VM crash: http://pastebin.com/hzgHPJGq

I have included a zip file with our current reproduction case. Right now, this can crash every version of Ruby I've been able to try it with. The reproduction steps are as follows:

$ bundle install
$ bundle exec ruby repro.rb

The idea of the repro is to load a baked binary protobuf from disk and deserialize it in memory enough times to eventually cause a failure. The failure is evidently due to some corruption that happens in the Ruby VM. We have checked that the actual raw memory itself hasn't been altered, and even if it had been, the internal assert should not have been triggered in the first place.

When using a vanilla version of Ruby, the crash is not deterministic. However, compiling a custom Ruby library with the timer_thread disabled makes the crash fully deterministic. Changing the number of times we deserialize the object while the garbage collector is disabled alters how the problem behaves.
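The two-phase pattern described above (decode repeatedly with the collector off, then keep decoding with it on) can be sketched roughly as follows. This is not the code from ruby-repro.zip: Marshal stands in for the protobuf decoder so that the snippet runs on its own, and the names stress_decode, warmup and rounds are hypothetical.

```ruby
# Sketch only: Marshal stands in for the protobuf decoder, and every name
# here (stress_decode, warmup, rounds) is hypothetical, not from the repro.
def stress_decode(payload, warmup:, rounds:)
  decoded = 0

  # Phase 1: decode with the collector off so that garbage piles up.
  GC.disable
  begin
    warmup.times do
      yield payload
      decoded += 1
    end
  ensure
    GC.enable
  end

  # Phase 2: keep decoding with the collector on, forcing a full GC run
  # after each decode so that any stale reference has a chance to surface.
  rounds.times do
    yield payload
    decoded += 1
    GC.start
  end

  decoded
end

payload = Marshal.dump({ "key" => "value", "nested" => { "n" => 1 } })
count = stress_decode(payload, warmup: 1_000, rounds: 50) { |bytes| Marshal.load(bytes) }
puts "decoded #{count} times without error"
```

With a pure-Ruby decoder like this one, nothing should fail. The point is only to show where the warmup count sits, since that is the value whose change alters the behavior in the real repro.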

It would also be reasonable to suspect that the protobuf C extension is using the Ruby C API in a way that eventually corrupts the VM's memory, but we haven't found anything suspicious in the code, and what happens there is in fact a pretty standard key/value operation. I will be cross-filing a similar bug report on the google-protobuf project anyway.


Files

ruby-repro.zip (7.02 KB), nicolasnoble (Nicolas Noble), 08/24/2016 06:00 AM
Dockerfile (739 Bytes), nicolasnoble (Nicolas Noble), 08/25/2016 12:48 AM

History

Updated by nicolasnoble (Nicolas Noble) about 3 years ago

Adding a Dockerfile that makes it easy to reproduce the problem, under Valgrind. The output of this Dockerfile gives the following report:

==1== Memcheck, a memory error detector
==1== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==1== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==1== Command: ruby repro.rb
==1==
==1== Use of uninitialised value of size 8
==1== at 0x226120: rb_id_table_lookup (id_table.c:1516)
==1== by 0x26F2FB: lookup_method_table (vm_method.c:190)
==1== by 0x26F2FB: search_method (vm_method.c:701)
==1== by 0x26F2FB: method_entry_get_without_cache (vm_method.c:726)
==1== by 0x272E98: method_entry_get (vm_method.c:786)
==1== by 0x272E98: rb_callable_method_entry (vm_method.c:831)
==1== by 0x280C89: rb_call0 (vm_eval.c:343)
==1== by 0x2813F3: rb_call (vm_eval.c:637)
==1== by 0x2813F3: rb_funcall (vm_eval.c:835)
==1== by 0x2098F7: rb_obj_as_string (string.c:1254)
==1== by 0x1F5C17: ruby__sfvextra (sprintf.c:1375)
==1== by 0x1F6BA9: BSD_vfprintf (vsnprintf.c:836)
==1== by 0x1FD333: rb_enc_vsprintf (sprintf.c:1421)
==1== by 0x1FD333: rb_vsprintf (sprintf.c:1445)
==1== by 0x2FB26D: rb_raise (error.c:2060)
==1== by 0x2FC47D: rb_check_type (error.c:599)
==1== by 0x7195074: Map_index_set (in /usr/local/lib/ruby/gems/2.3.0/gems/google-protobuf-3.0.0.alpha.5.0.5.1-x86_64-linux/lib/google/2.3/protobuf_c.so)
==1== Uninitialised value was created by a client request
==1== at 0x14B118: gc_page_sweep (gc.c:3384)
==1== by 0x14B118: gc_sweep_step (gc.c:3559)
==1== by 0x150EC2: gc_sweep_rest (gc.c:3608)
==1== by 0x150EC2: gc_rest.part.63 (gc.c:6267)
==1== by 0x151FAC: gc_rest (gc.c:6161)
==1== by 0x151FAC: garbage_collect (gc.c:6154)
==1== by 0x152187: garbage_collect_with_gvl (gc.c:6394)
==1== by 0x152203: objspace_malloc_increase.isra.67 (gc.c:7558)
==1== by 0x1534BD: objspace_xmalloc (gc.c:7650)
==1== by 0x7192326: Message_alloc (in /usr/local/lib/ruby/gems/2.3.0/gems/google-protobuf-3.0.0.alpha.5.0.5.1-x86_64-linux/lib/google/2.3/protobuf_c.so)
==1== by 0x195D7F: rb_obj_alloc (object.c:1823)
==1== by 0x196070: rb_class_new_instance (object.c:1855)
==1== by 0x719668C: submsg_handler (in /usr/local/lib/ruby/gems/2.3.0/gems/google-protobuf-3.0.0.alpha.5.0.5.1-x86_64-linux/lib/google/2.3/protobuf_c.so)
==1== by 0x71B0DFF: run_decoder_vm (in /usr/local/lib/ruby/gems/2.3.0/gems/google-protobuf-3.0.0.alpha.5.0.5.1-x86_64-linux/lib/google/2.3/protobuf_c.so)
==1== by 0x719891F: Message_decode (in /usr/local/lib/ruby/gems/2.3.0/gems/google-protobuf-3.0.0.alpha.5.0.5.1-x86_64-linux/lib/google/2.3/protobuf_c.so)
==1==
==1== Invalid read of size 4
==1== at 0x226120: rb_id_table_lookup (id_table.c:1516)
==1== by 0x26F2FB: lookup_method_table (vm_method.c:190)
==1== by 0x26F2FB: search_method (vm_method.c:701)
==1== by 0x26F2FB: method_entry_get_without_cache (vm_method.c:726)
==1== by 0x272E98: method_entry_get (vm_method.c:786)
==1== by 0x272E98: rb_callable_method_entry (vm_method.c:831)
==1== by 0x280C89: rb_call0 (vm_eval.c:343)
==1== by 0x2813F3: rb_call (vm_eval.c:637)
==1== by 0x2813F3: rb_funcall (vm_eval.c:835)
==1== by 0x2098F7: rb_obj_as_string (string.c:1254)
==1== by 0x1F5C17: ruby__sfvextra (sprintf.c:1375)
==1== by 0x1F6BA9: BSD_vfprintf (vsnprintf.c:836)
==1== by 0x1FD333: rb_enc_vsprintf (sprintf.c:1421)
==1== by 0x1FD333: rb_vsprintf (sprintf.c:1445)
==1== by 0x2FB26D: rb_raise (error.c:2060)
==1== by 0x2FC47D: rb_check_type (error.c:599)
==1== by 0x7195074: Map_index_set (in /usr/local/lib/ruby/gems/2.3.0/gems/google-protobuf-3.0.0.alpha.5.0.5.1-x86_64-linux/lib/google/2.3/protobuf_c.so)
==1== Address 0x0 is not stack'd, malloc'd or (recently) free'd
==1==

Updated by duerst (Martin Dürst) about 3 years ago

  • Assignee deleted (ruby-core)

Updated by wanabe (_ wanabe) about 3 years ago

  • Status changed from Open to Third Party's Issue

This issue has been cross-filed to google-protobuf by Nicolas Noble, the original reporter.
https://github.com/google/protobuf/issues/2004

This is fixed by a google-protobuf pull request.
As far as I can see, a GC mark was missing.
https://github.com/google/protobuf/pull/2012

And "Updated packages are now available on RubyGems."
https://github.com/google/protobuf/issues/2004#issuecomment-247202147
