Bug #11799
closedObject allocation during garbage collection phase terminates the Ruby process
Description
Resurrection of Bug #11549 that was apparently inadvertently deleted.
Also note the similarity to Bug #10868.
Summary: Multi-Threaded Ruby apps are often problematic; especially so when utilizing thread pools, and scheduling work to worker threads.
- While the problem isn't easily recreated, it can be reliably recreated given sufficient time and conditions (for garbage collection in conjunction with sibling thread behavior).
- Initially, three trace files have been attached. Each shows the problem in different conditions, and using different client infrastructure.
- If this is a problem due to usage of certain gems that aren't 'well-behaved', it would be good to know which ones to avoid.
RE: attached file ruby_2.2.3_obj_alloc_gc_bug.txt
Following is the section of sap_consumer_control.rb that is presented by the Ruby interpreter as being the current execution context when the problem occurs.
408: loop do
409: @worker_threads.schedule(@work_queue.pop, &@consumer)
410: @sap_packets_consumed += 1
411: endNotes:
- @work_queue is a Ruby Queue (allocated within the main thread) into which a producer thread places work requests.
- @worker_threads is a thread pool (allocated within the main thread).
- @worker_threads schedule method simply puts a work request into the thread pool's internal work queue. One of the worker threads within the thread pool will consume/effect the work request, by executing the specified consumer Proc.
- The main program thread simply loops forever scheduling work to thread pool threads.
- It appears that an object is being allocated by virtue of the @work_queue.pop
RE: attached file ruby_bug_redis_client.txt
- The observed behavior also occurs readily when employing the Redis client gem.
- This is the use case in which the bug is most easily reproduced.
RE: attached file ruby_bug-gdb.txt
- The observed behavior has also been observed when using stretcher, faraday, net-http-persistent gems.
- Connections via stretcher to net-http-persistent are on a per thread basis (maintained within thread local storage).
- In this particular case, the Ruby process was run under gdb, and back traces for each of the process threads are provided.
Files
Updated by charlez (Charles Leu) almost 10 years ago
- File deleted (
ruby_bug-gdb.txt)
Updated by charlez (Charles Leu) almost 10 years ago
- File ruby_bug-gdb.txt ruby_bug-gdb.txt added
Supplied correct edition of file ruby_bug-gdb.txt
Updated by normalperson (Eric Wong) almost 10 years ago
charlez.leu@gmail.com wrote:
- If this is a problem due to usage of certain gems that aren't 'well-behaved', it would be good to know which ones to avoid.
Fwiw, I took a quick look at the dmark/dfree callbacks registered with
*Data_(Wrap|Make)_Struct macros in the following non-standard extension
gems but did not notice anything amiss with the dmark/dfree callbacks
that could trigger allocations:
google_hash 0.8.9 (C++)
msgpack 0.7.0 and 0.7.1
xxhash 0.3.0
yajl-ruby 1.2.1
Perhaps I missed something in those or the bug is elsewhere.
Keep in mind I don't know C++, so I'm least confident about google_hash.
Updated by ko1 (Koichi Sasada) almost 10 years ago
Thank you for your log.
This is a reason.
(gdb) thread 12
[Switching to thread 12 (Thread 0x7fffe5fc4700 (LWP 27808))]#0 0x00007ffff772463c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
(gdb) bt
#0 0x00007ffff772463c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007ffff7b0375b in native_cond_wait (limits_us=<value optimized out>) at thread_pthread.c:338
#2 gvl_acquire_common (limits_us=<value optimized out>) at thread_pthread.c:87
#3 gvl_yield (limits_us=<value optimized out>) at thread_pthread.c:155
#4 rb_thread_schedule_limits (limits_us=<value optimized out>) at thread.c:1209
#5 0x00007ffff7b03b7c in rb_threadptr_execute_interrupts (th=<value optimized out>, blocking_timing=0) at thread.c:2019
#6 0x00007ffff7ae236d in vm_call0_body (th=0x1988bb0, ci=<value optimized out>, argv=0x7fffe5fc10b0) at vm_eval.c:252
#7 0x00007ffff7ae331e in vm_call0 (recv=18167000, mid=152, n=1) at vm_eval.c:59
#8 rb_call0 (recv=18167000, mid=152, n=1) at vm_eval.c:349
#9 rb_call (recv=18167000, mid=152, n=1) at vm_eval.c:616
#10 rb_funcall (recv=18167000, mid=152, n=1) at vm_eval.c:818
#11 0x00007fffef6659ed in eqrb::operator()(unsigned long, unsigned long) const () from /usr/local/rvm/gems/ruby-2.2.3/gems/google_hash-0.8.9/lib/google_hash.so
#12 0x00007fffef6f6b6f in google::sparse_hashtable<std::pair<unsigned long const, unsigned long>, unsigned long, hashrb, google::sparse_hash_map<unsigned long, unsigned long, hashrb, eqrb, google::libc_allocator_with_realloc<std::pair<unsigned long const, unsigned long> > >::SelectKey, google::sparse_hash_map<unsigned long, unsigned long, hashrb, eqrb, google::libc_allocator_with_realloc<std::pair<unsigned long const, unsigned long> > >::SetKey, eqrb, google::libc_allocator_with_realloc<std::pair<unsigned long const, unsigned long> > >::KeyInfo::equals(unsigned long const&, unsigned long const&) const ()
from /usr/local/rvm/gems/ruby-2.2.3/gems/google_hash-0.8.9/lib/google_hash.so
#13 0x00007fffef6f5739 in google::sparse_hashtable<std::pair<unsigned long const, unsigned long>, unsigned long, hashrb, google::sparse_hash_map<unsigned long, unsigned long, hashrb, eqrb, google::libc_allocator_with_realloc<std::pair<unsigned long const, unsigned long> > >::SelectKey, google::sparse_hash_map<unsigned long, unsigned long, hashrb, eqrb, google::libc_allocator_with_realloc<std::pair<unsigned long const, unsigned long> > >::SetKey, eqrb, google::libc_allocator_with_realloc<std::pair<unsigned long const, unsigned long> > >::equals(unsigned long const&, unsigned long const&) const ()
---Type <return> to continue, or q <return> to quit---
from /usr/local/rvm/gems/ruby-2.2.3/gems/google_hash-0.8.9/lib/google_hash.so
#14 0x00007fffef6f65ab in google::sparse_hashtable<std::pair<unsigned long const, unsigned long>, unsigned long, hashrb, google::sparse_hash_map<unsigned long, unsigned long, hashrb, eqrb, google::libc_allocator_with_realloc<std::pair<unsigned long const, unsigned long> > >::SelectKey, google::sparse_hash_map<unsigned long, unsigned long, hashrb, eqrb, google::libc_allocator_with_realloc<std::pair<unsigned long const, unsigned long> > >::SetKey, eqrb, google::libc_allocator_with_realloc<std::pair<unsigned long const, unsigned long> > >::test_deleted_key(unsigned long const&) const ()
from /usr/local/rvm/gems/ruby-2.2.3/gems/google_hash-0.8.9/lib/google_hash.so
#15 0x00007fffef6f5440 in google::sparse_hashtable<std::pair<unsigned long const, unsigned long>, unsigned long, hashrb, google::sparse_hash_map<unsigned long, unsigned long, hashrb, eqrb, google::libc_allocator_with_realloc<std::pair<unsigned long const, unsigned long> > >::SelectKey, google::sparse_hash_map<unsigned long, unsigned long, hashrb, eqrb, google::libc_allocator_with_realloc<std::pair<unsigned long const, unsigned long> > >::SetKey, eqrb, google::libc_allocator_with_realloc<std::pair<unsigned long const, unsigned long> > >::test_deleted(google::sparse_hashtable_iterator<std::pair<unsigned long const, unsigned long>, unsigned long, hashrb, google::sparse_hash_map<unsigned long, unsigned long, hashrb, eqrb, google::libc_allocator_with_realloc<std::pair<unsigned long const, unsigned long> > >::SelectKey, google::sparse_hash_map<unsigned long, unsigned long, hashrb, eqrb, google::libc_allocator_with_realloc<std::pair<unsigned long const, unsigned long> > >::SetKey, eqrb, google::libc_allocator_with_realloc<std::pair<unsigned long const, unsigned long> > > const&) const ()
from /usr/local/rvm/gems/ruby-2.2.3/gems/google_hash-0.8.9/lib/google_hash.so
#16 0x00007fffef6f4a49 in google::sparse_hashtable_iterator<std::pair<unsigned long const, unsigned long>, unsigned long, hashrb, google::sparse_hash_map<unsigned long, unsigned long, hashrb, eqrb, google::libc_allocator_with_realloc<std::pair<unsigned long const, unsigned long> > >::SelectKey, google::sparse_hash_map<unsigned long, unsigned long, hashrb, eqrb, google::libc_allocator_with_realloc<std::pair<unsigned long const, unsigned long> > >::SetKey, eqrb, google::libc_allocator_with_realloc<std::pair<unsigned long const, unsigned long> > >::advance_past_deleted() ()
from /usr/local/rvm/gems/ruby-2.2.3/gems/google_hash-0.8.9/lib/google_hash.so
#17 0x00007fffef6f45e4 in google::sparse_hashtable_iterator<std::pair<unsigned long const, unsigned long>, unsigned long, hashrb, google::sparse_hash_map<unsigned long, unsigned long, hashrb, eqrb, google::libc_allocator_with_realloc<std::pair<unsigned long const, unsigned long> > >::SelectKey, google::sparse_hash_map<unsi---Type <return> to continue, or q <return> to quit---
gned long, unsigned long, hashrb, eqrb, google::libc_allocator_with_realloc<std::pair<unsigned long const, unsigned long> > >::SetKey, eqrb, google::libc_allocator_with_realloc<std::pair<unsigned long const, unsigned long> > >::operator++() () from /usr/local/rvm/gems/ruby-2.2.3/gems/google_hash-0.8.9/lib/google_hash.so
#18 0x00007fffef6f38e8 in mark_hash_map_values () from /usr/local/rvm/gems/ruby-2.2.3/gems/google_hash-0.8.9/lib/google_hash.so
#19 0x00007ffff79c91cb in gc_mark_children (objspace=0x6029f0, obj=18167000) at gc.c:4190
#20 0x00007ffff79ca5f4 in rgengc_rememberset_mark (objspace=0x6029f0, full_mark=<value optimized out>, immediate_mark=<value optimized out>,
immediate_sweep=<value optimized out>, reason=<value optimized out>) at gc.c:5376
#21 gc_marks_start (objspace=0x6029f0, full_mark=<value optimized out>, immediate_mark=<value optimized out>, immediate_sweep=<value optimized out>,
reason=<value optimized out>) at gc.c:4951
#22 gc_marks (objspace=0x6029f0, full_mark=<value optimized out>, immediate_mark=<value optimized out>, immediate_sweep=<value optimized out>,
reason=<value optimized out>) at gc.c:5206
#23 gc_start (objspace=0x6029f0, full_mark=<value optimized out>, immediate_mark=<value optimized out>, immediate_sweep=<value optimized out>,
reason=<value optimized out>) at gc.c:5957
#24 0x00007ffff79cc65c in heap_prepare (klass=6687280, flags=8231, v1=0, v2=0, v3=0) at gc.c:1564
#25 heap_get_freeobj_from_next_freepage (klass=6687280, flags=8231, v1=0, v2=0, v3=0) at gc.c:1576
#26 heap_get_freeobj (klass=6687280, flags=8231, v1=0, v2=0, v3=0) at gc.c:1600
#27 newobj_of (klass=6687280, flags=8231, v1=0, v2=0, v3=0) at gc.c:1645
#28 0x00007ffff796ae69 in ary_alloc (capa=3) at array.c:445
(1) Thread 12 runs GC
(2) mark function in /usr/local/rvm/gems/ruby-2.2.3/gems/google_hash-0.8.9/lib/google_hash.so is called. #17 (how to read...?)
(3) #13 calls Ruby code from mark function. <- BUG
(4) Ruby code switches to Thread 6.
(5) Thread 6 runs own Ruby code and cause GC
(6) get [BUG] message
This is 3rd party issue.
Updated by ko1 (Koichi Sasada) almost 10 years ago
https://github.com/rdp/google_hash/blob/master/ext/template/google_hash.cpp.erb#L170
I'm not sure the implementation details, but
for(<%= type %>_hash_map< <%= key_type %>, <%= value_type %> <%= extra_hash_params %> >::iterator it = incoming->hash_map->begin(); it != incoming->hash_map->end(); ++it) {
this iteration calls #== method.
Updated by nobu (Nobuyoshi Nakada) almost 10 years ago
- Status changed from Open to Third Party's Issue
Updated by charlez (Charles Leu) almost 10 years ago
Koichi Sasada wrote:
https://github.com/rdp/google_hash/blob/master/ext/template/google_hash.cpp.erb#L170
I'm not sure the implementation details, but
for(<%= type %>_hash_map< <%= key_type %>, <%= value_type %> <%= extra_hash_params %> >::iterator it = incoming->hash_map->begin(); it != incoming->hash_map->end(); ++it) {
this iteration calls #== method.
Thanks for the prompt attention and feedback. It's greatly appreciated.
google_hash will be removed (unfortunately it was being relied upon heavily), and I'll re-test.
If other gems (e.g. yajl-ruby) are also problematic I'll remove their usage as well.