Project

General

Profile

Bug #13772

Memory leak recycling stacks for threads in 2.4.1

Added by sam.saffron (Sam Saffron) over 1 year ago. Updated about 1 year ago.

Status:
Closed
Priority:
Normal
Target version:
-
[ruby-core:82183]

Description

Per:

https://github.com/rest-client/rest-client/issues/611

gem install rest-client

100000.times.each_slice(32) do |slice|
    slice.map { Thread.new { RestClient.get('echo.jsontest.com/key/value/one/two') } }.each(&:join)
    GC.start(full_mark: true, immediate_sweep: true)
    slots = GC.stat[:heap_live_slots]
    puts "slots #{slots} #{i+=1} threads #{Thread.list.count}"

end

Causes unbound memory growth on 2.4.1, this was not the case in earlier versions of Ruby or Ruby master.

When running through heaptrack and massif-visualizer it looks like all unreclaimed allocations come from thread_recycle_stack

Ruby heaps do not grow during tests, all the growth is around unmanaged memory.

I feel getting this sorted is urgent, we have seen similar leaks at Discourse.


Related issues

Related to Ruby trunk - Feature #12628: change block/env structsClosed
Related to Ruby trunk - Bug #13775: Ruby hangs when calling scope and belongs_to many times (with mongomapper)Closed

Associated revisions

Revision 92f7813a
Added by ko1 (Koichi Sasada) over 1 year ago

release VM stack properly.

  • cont.c: r55766 change the handling method of Fiber's VM stack. Resumed Fiber points NULL as VM stack and running Thread has responsibility to manage it (marking and releasing).

However, thread_start_func_2()@thread.c and thread_free()@vm.c
doesn't free the VM stack if corresponding root Fiber is exist.
This causes memory leak. [Bug #13772]

  • cont.c (root_fiber_alloc): fib->cont.saved_thread.ec.stack should be NULL
    because running thread has responsibility to manage this stack.

  • vm.c (rb_thread_recycle_stack_release): assert given stack is not NULL
    (callers should care it).

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@59462 b2dd03c8-39d4-4d8f-98ff-823fe69b080e

Revision 59462
Added by ko1 (Koichi Sasada) over 1 year ago

release VM stack properly.

  • cont.c: r55766 change the handling method of Fiber's VM stack. Resumed Fiber points NULL as VM stack and running Thread has responsibility to manage it (marking and releasing).

However, thread_start_func_2()@thread.c and thread_free()@vm.c
doesn't free the VM stack if corresponding root Fiber is exist.
This causes memory leak. [Bug #13772]

  • cont.c (root_fiber_alloc): fib->cont.saved_thread.ec.stack should be NULL
    because running thread has responsibility to manage this stack.

  • vm.c (rb_thread_recycle_stack_release): assert given stack is not NULL
    (callers should care it).

Revision 59462
Added by ko1 (Koichi Sasada) over 1 year ago

release VM stack properly.

  • cont.c: r55766 change the handling method of Fiber's VM stack. Resumed Fiber points NULL as VM stack and running Thread has responsibility to manage it (marking and releasing).

However, thread_start_func_2()@thread.c and thread_free()@vm.c
doesn't free the VM stack if corresponding root Fiber is exist.
This causes memory leak. [Bug #13772]

  • cont.c (root_fiber_alloc): fib->cont.saved_thread.ec.stack should be NULL
    because running thread has responsibility to manage this stack.

  • vm.c (rb_thread_recycle_stack_release): assert given stack is not NULL
    (callers should care it).

Revision 59474
Added by ko1 (Koichi Sasada) over 1 year ago

fix stack storing for root fibers.

  • cont.c (root_fiber_alloc): this function is called by fiber_current()
    and fiber_store(). fiber_current() should clear VM stack information
    in a fiber data because runnning thread knows stack information and has
    responsibility to manage it. However fiber_store() requires to remain
    VM stack information in a fiber data because the responsibility to manage
    VM stack is moved to the Fiber from the Thread (and switch to another
    fiber).

  • cont.c (root_fiber_alloc): save thread's fiber and root_fiber information.

Revision 01cfae3b
Added by nagachika (Tomoyuki Chikanaga) over 1 year ago

merge revision(s) 59462,59474: [Backport #13772]

release VM stack properly.

* cont.c: r55766 change the handling method of Fiber's VM stack.
  Resumed Fiber points NULL as VM stack and running Thread has
  responsibility to manage it (marking and releasing).

  However, thread_start_func_2()@thread.c and thread_free()@vm.c
  doesn't free the VM stack if corresponding root Fiber is exist.
  This causes memory leak. [Bug #13772]

* cont.c (root_fiber_alloc): fib->cont.saved_thread.ec.stack should be NULL
  because running thread has responsibility to manage this stack.

* vm.c (rb_thread_recycle_stack_release): assert given stack is not NULL
  (callers should care it).

fix stack storing for root fibers.

* cont.c (root_fiber_alloc): this function is called by fiber_current()
  and fiber_store(). fiber_current() should clear VM stack information
  in a fiber data because runnning thread knows stack information and has
  responsibility to manage it. However fiber_store() requires to remain
  VM stack information in a fiber data because the responsibility to manage
  VM stack is moved to the Fiber from the Thread (and switch to another
  fiber).

* cont.c (root_fiber_alloc): save thread's fiber and root_fiber information.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/branches/ruby_2_4@59516 b2dd03c8-39d4-4d8f-98ff-823fe69b080e

Revision 59516
Added by nagachika (Tomoyuki Chikanaga) over 1 year ago

merge revision(s) 59462,59474: [Backport #13772]

release VM stack properly.

* cont.c: r55766 change the handling method of Fiber's VM stack.
  Resumed Fiber points NULL as VM stack and running Thread has
  responsibility to manage it (marking and releasing).

  However, thread_start_func_2()@thread.c and thread_free()@vm.c
  doesn't free the VM stack if corresponding root Fiber is exist.
  This causes memory leak. [Bug #13772]

* cont.c (root_fiber_alloc): fib->cont.saved_thread.ec.stack should be NULL
  because running thread has responsibility to manage this stack.

* vm.c (rb_thread_recycle_stack_release): assert given stack is not NULL
  (callers should care it).

fix stack storing for root fibers.

* cont.c (root_fiber_alloc): this function is called by fiber_current()
  and fiber_store(). fiber_current() should clear VM stack information
  in a fiber data because runnning thread knows stack information and has
  responsibility to manage it. However fiber_store() requires to remain
  VM stack information in a fiber data because the responsibility to manage
  VM stack is moved to the Fiber from the Thread (and switch to another
  fiber).

* cont.c (root_fiber_alloc): save thread's fiber and root_fiber information.

History

#1 [ruby-core:82199] Updated by sam.saffron (Sam Saffron) over 1 year ago

OK I have a standalone repro here: https://gist.github.com/SamSaffron/f43996d5989cefbfbf69e9557ef13b23

Also impacts Ruby head

i = 0

def parse_param(s)
  unless block_given?
    return enum_for(__method__, s)
  end
  yield "a"
  nil
end

def parse_line(line)
  parts = parse_param(';' + line)
  parts.next
end


while i < 10_000
  Thread.new { parse_line("Content-Type:application/json; charset=ISO-8859-1") }.join
  slots = GC.stat[:heap_live_slots]
  _,rss = `ps ax -o pid,rss | grep -E "^[[:space:]]*#{$$}"`.strip.split.map(&:to_i)
  puts "slots #{slots} #{i+=1} threads #{Thread.list.count} rss #{rss}"
end

puts "done #{RUBY_VERSION}"
gets


##
#  Ruby 2.3.3
#  slots 19183, rss 36980

#  Ruby 2.4.1
#  slots 14448, rss 132572
#
#  Ruby latest
#  slots 13783 rss 257108

#3 [ruby-core:82218] Updated by sam.saffron (Sam Saffron) over 1 year ago

On Friday I downgraded one of our clusters, this is how it looks.

#4 [ruby-core:82219] Updated by hsbt (Hiroshi SHIBATA) over 1 year ago

  • Assignee set to ko1 (Koichi Sasada)
  • Status changed from Open to Assigned

#5 Updated by nobu (Nobuyoshi Nakada) over 1 year ago

#6 Updated by nobu (Nobuyoshi Nakada) over 1 year ago

  • Related to Bug #13775: Ruby hangs when calling scope and belongs_to many times (with mongomapper) added

#7 [ruby-core:82221] Updated by nagachika (Tomoyuki Chikanaga) over 1 year ago

  • Backport changed from 2.2: UNKNOWN, 2.3: UNKNOWN, 2.4: UNKNOWN to 2.2: DONTNEED, 2.3: DONTNEED, 2.4: REQUIRED

I'm going to set Backport field based on the report at https://bugs.ruby-lang.org/issues/13772#note-1

#8 Updated by ko1 (Koichi Sasada) over 1 year ago

  • Status changed from Assigned to Closed

Applied in changeset trunk|r59462.


release VM stack properly.

  • cont.c: r55766 change the handling method of Fiber's VM stack. Resumed Fiber points NULL as VM stack and running Thread has responsibility to manage it (marking and releasing).

However, thread_start_func_2()@thread.c and thread_free()@vm.c
doesn't free the VM stack if corresponding root Fiber is exist.
This causes memory leak. [Bug #13772]

  • cont.c (root_fiber_alloc): fib->cont.saved_thread.ec.stack should be NULL
    because running thread has responsibility to manage this stack.

  • vm.c (rb_thread_recycle_stack_release): assert given stack is not NULL
    (callers should care it).

#9 [ruby-core:82255] Updated by nagachika (Tomoyuki Chikanaga) over 1 year ago

  • Backport changed from 2.2: DONTNEED, 2.3: DONTNEED, 2.4: REQUIRED to 2.2: DONTNEED, 2.3: DONTNEED, 2.4: DONE

ruby_2_4 r59516 merged revision(s) 59462,59474.

#10 [ruby-core:82744] Updated by perlun (Per Lundberg) about 1 year ago

This bug unfortunately prevents my organization from using 2.4.1 at the moment, we will have to downgrade to 2.3.4. Is there any chance we could get a bug fix release (2.4.2) out including this fix, in the near future? We could always compile from source but it would be great if there would be an "official" fix release including this.

Thanks in advance.

#11 [ruby-core:82795] Updated by jrafanie (Joe Rafaniello) about 1 year ago

perlun (Per Lundberg) wrote:

This bug unfortunately prevents my organization from using 2.4.1 at the moment, we will have to downgrade to 2.3.4. Is there any chance we could get a bug fix release (2.4.2) out including this fix, in the near future? We could always compile from source but it would be great if there would be an "official" fix release including this.

Thanks in advance.

The fix for this issue was backported [1] and released in 2.4.2 [2]

[1] https://github.com/ruby/ruby/commit/01cfae3beb0b649ca2ae879825349cf0330a3549
[2] https://github.com/ruby/ruby/compare/v2_4_1...v2_4_2

#12 [ruby-core:82810] Updated by perlun (Per Lundberg) about 1 year ago

Thanks a lot, great news!

Also available in: Atom PDF