Project

General

Profile

Bug #13772

Memory leak recycling stacks for threads in 2.4.1

Added by sam.saffron (Sam Saffron) over 2 years ago. Updated over 2 years ago.

Status:
Closed
Priority:
Normal
Target version:
-
[ruby-core:82183]

Description

Per:

https://github.com/rest-client/rest-client/issues/611

gem install rest-client

100000.times.each_slice(32) do |slice|
    slice.map { Thread.new { RestClient.get('echo.jsontest.com/key/value/one/two') } }.each(&:join)
    GC.start(full_mark: true, immediate_sweep: true)
    slots = GC.stat[:heap_live_slots]
    puts "slots #{slots} #{i+=1} threads #{Thread.list.count}"

end

Causes unbound memory growth on 2.4.1, this was not the case in earlier versions of Ruby or Ruby master.

When running through heaptrack and massif-visualizer it looks like all unreclaimed allocations come from thread_recycle_stack

Ruby heaps do not grow during tests, all the growth is around unmanaged memory.

I feel getting this sorted is urgent, we have seen similar leaks at Discourse.


Related issues

Related to Ruby master - Feature #12628: change block/env structsClosedko1 (Koichi Sasada)Actions
Related to Ruby master - Bug #13775: Ruby hangs when calling scope and belongs_to many times (with mongomapper)ClosedActions

Updated by sam.saffron (Sam Saffron) over 2 years ago

OK I have a standalone repro here: https://gist.github.com/SamSaffron/f43996d5989cefbfbf69e9557ef13b23

Also impacts Ruby head

i = 0

def parse_param(s)
  unless block_given?
    return enum_for(__method__, s)
  end
  yield "a"
  nil
end

def parse_line(line)
  parts = parse_param(';' + line)
  parts.next
end


while i < 10_000
  Thread.new { parse_line("Content-Type:application/json; charset=ISO-8859-1") }.join
  slots = GC.stat[:heap_live_slots]
  _,rss = `ps ax -o pid,rss | grep -E "^[[:space:]]*#{$$}"`.strip.split.map(&:to_i)
  puts "slots #{slots} #{i+=1} threads #{Thread.list.count} rss #{rss}"
end

puts "done #{RUBY_VERSION}"
gets


##
#  Ruby 2.3.3
#  slots 19183, rss 36980

#  Ruby 2.4.1
#  slots 14448, rss 132572
#
#  Ruby latest
#  slots 13783 rss 257108

Updated by sam.saffron (Sam Saffron) over 2 years ago

On Friday I downgraded one of our clusters, this is how it looks.

Updated by hsbt (Hiroshi SHIBATA) over 2 years ago

  • Assignee set to ko1 (Koichi Sasada)
  • Status changed from Open to Assigned
#5

Updated by nobu (Nobuyoshi Nakada) over 2 years ago

#6

Updated by nobu (Nobuyoshi Nakada) over 2 years ago

  • Related to Bug #13775: Ruby hangs when calling scope and belongs_to many times (with mongomapper) added

Updated by nagachika (Tomoyuki Chikanaga) over 2 years ago

  • Backport changed from 2.2: UNKNOWN, 2.3: UNKNOWN, 2.4: UNKNOWN to 2.2: DONTNEED, 2.3: DONTNEED, 2.4: REQUIRED

I'm going to set Backport field based on the report at https://bugs.ruby-lang.org/issues/13772#note-1

#8

Updated by ko1 (Koichi Sasada) over 2 years ago

  • Status changed from Assigned to Closed

Applied in changeset trunk|r59462.


release VM stack properly.

  • cont.c: r55766 change the handling method of Fiber's VM stack. Resumed Fiber points NULL as VM stack and running Thread has responsibility to manage it (marking and releasing).

However, thread_start_func_2()@thread.c and thread_free()@vm.c
doesn't free the VM stack if corresponding root Fiber is exist.
This causes memory leak. [Bug #13772]

  • cont.c (root_fiber_alloc): fib->cont.saved_thread.ec.stack should be NULL
    because running thread has responsibility to manage this stack.

  • vm.c (rb_thread_recycle_stack_release): assert given stack is not NULL
    (callers should care it).

Updated by nagachika (Tomoyuki Chikanaga) over 2 years ago

  • Backport changed from 2.2: DONTNEED, 2.3: DONTNEED, 2.4: REQUIRED to 2.2: DONTNEED, 2.3: DONTNEED, 2.4: DONE

ruby_2_4 r59516 merged revision(s) 59462,59474.

Updated by perlun (Per Lundberg) over 2 years ago

This bug unfortunately prevents my organization from using 2.4.1 at the moment, we will have to downgrade to 2.3.4. Is there any chance we could get a bug fix release (2.4.2) out including this fix, in the near future? We could always compile from source but it would be great if there would be an "official" fix release including this.

Thanks in advance.

Updated by jrafanie (Joe Rafaniello) over 2 years ago

perlun (Per Lundberg) wrote:

This bug unfortunately prevents my organization from using 2.4.1 at the moment, we will have to downgrade to 2.3.4. Is there any chance we could get a bug fix release (2.4.2) out including this fix, in the near future? We could always compile from source but it would be great if there would be an "official" fix release including this.

Thanks in advance.

The fix for this issue was backported [1] and released in 2.4.2 [2]

[1] https://github.com/ruby/ruby/commit/01cfae3beb0b649ca2ae879825349cf0330a3549
[2] https://github.com/ruby/ruby/compare/v2_4_1...v2_4_2

Updated by perlun (Per Lundberg) over 2 years ago

Thanks a lot, great news!

Also available in: Atom PDF