https://bugs.ruby-lang.org/https://bugs.ruby-lang.org/favicon.ico?17113305112022-10-06T19:32:15ZRuby Issue Tracking SystemRuby master - Bug #19041: Weakref is still alive after major garbage collectionhttps://bugs.ruby-lang.org/issues/19041?journal_id=995032022-10-06T19:32:15Zbyroot (Jean Boussier)byroot@ruby-lang.org
<ul></ul><p>I don't think this is a bug per se. The Ruby GC is conservative. That means it scans the whole stack in search of potential references to objects, and marks them.</p>
<p>As a result, it can happen that an object ref stays in an unused saved register and prevents an object from being collected.</p> Ruby master - Bug #19041: Weakref is still alive after major garbage collectionhttps://bugs.ruby-lang.org/issues/19041?journal_id=995362022-10-10T16:08:34Zjeremyevans0 (Jeremy Evans)merch-redmine@jeremyevans.net
<ul><li><strong>Status</strong> changed from <i>Open</i> to <i>Closed</i></li></ul> Ruby master - Bug #19041: Weakref is still alive after major garbage collectionhttps://bugs.ruby-lang.org/issues/19041?journal_id=995632022-10-13T15:17:21Zparker (Parker Finch)
<ul></ul><p>Thanks <a class="user active user-mention" href="https://bugs.ruby-lang.org/users/7941">@byroot (Jean Boussier)</a>! I think this could be considered a bug in the documentation, since the <a href="https://ruby-doc.org/stdlib-3.1.2/libdoc/weakref/rdoc/WeakRef.html" class="external">docs for WeakRef</a> imply that a <code>WeakRef</code> should be collected after a garbage collection. Perhaps we could call this corner-case out?</p>
<p>I'm also curious to learn more about this case. (I'm unfamiliar with Ruby's use of registers and how that interacts with live objects and garbage collection.) It seems like calling the <code>weakref_alive?</code> method is continually forcing the object ref into a register, and sleeping after calling that method gives time for the register to clear. Is that understanding correct? (I'm surprised that calling a method on the <code>WeakRef</code> object prevents the underlying object from being collected, since shouldn't that underlying one be collected even though the <code>WeakRef</code> itself still has a reference? Does the method call put the underlying object ref in a register?)</p>
<p>Is there a more reliable/direct way to get rid of the reference than sleeping?</p>
<p>One aspect of this where I'm still confused is why the loop given to reproduce this issue completes an iteration before hanging. What is different on the first iteration that allows this to succeed?</p> Ruby master - Bug #19041: Weakref is still alive after major garbage collectionhttps://bugs.ruby-lang.org/issues/19041?journal_id=995642022-10-13T15:23:47Zchrisseaton (Chris Seaton)chris@chrisseaton.com
<ul></ul><p>The documentation could be clearer, but also note that this isn't in any way specific to Ruby - I would say that this is expected behaviour for a managed language. A weak-ref <em>may</em> be cleared if no other references exist. That should be the extent of the guarantee offered.</p> Ruby master - Bug #19041: Weakref is still alive after major garbage collectionhttps://bugs.ruby-lang.org/issues/19041?journal_id=995662022-10-13T16:38:56Ztenderlovemaking (Aaron Patterson)tenderlove@ruby-lang.org
<ul></ul><p>parker (Parker Finch) wrote in <a href="#note-3">#note-3</a>:</p>
<blockquote>
<p>Thanks <a class="user active user-mention" href="https://bugs.ruby-lang.org/users/7941">@byroot (Jean Boussier)</a>! I think this could be considered a bug in the documentation, since the <a href="https://ruby-doc.org/stdlib-3.1.2/libdoc/weakref/rdoc/WeakRef.html" class="external">docs for WeakRef</a> imply that a <code>WeakRef</code> should be collected after a garbage collection. Perhaps we could call this corner-case out?</p>
<p>I'm also curious to learn more about this case. (I'm unfamiliar with Ruby's use of registers and how that interacts with live objects and garbage collection.</p>
</blockquote>
<p>Ruby's garbage collector is conservative. Ruby objects that are allocated inside of C code must be kept alive. Let's look at a simple example:</p>
<pre><code class="c syntaxhl" data-language="c"><span class="kt">void</span> <span class="nf">neat_function</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span> <span class="p">{</span>
    <span class="n">VALUE</span> <span class="n">list</span> <span class="o">=</span> <span class="n">rb_ary_new</span><span class="p">();</span>
    <span class="n">rb_gc_start</span><span class="p">();</span>
    <span class="n">rb_ary_push</span><span class="p">(</span><span class="n">list</span><span class="p">,</span> <span class="n">Qnil</span><span class="p">);</span>
<span class="p">}</span>
</code></pre>
<p>The above C code is compiled into machine code, but the array's life span is managed by the garbage collector. How can the garbage collector ensure that the array stays alive even after the call to <code>rb_gc_start()</code>? We humans can clearly see that the array is used in the C code, but the GC cannot read the C code. In fact, there is no C code for the GC to read because it's all machine code now! So how can the GC keep the reference alive? It will scan the <em>machine registers</em> as well as the <em>stack memory</em> looking for addresses that <em>might</em> be Ruby objects. The C compiler will probably have generated machine code that puts a reference to the local variable <code>list</code> in either a register or stack memory (there are cases where this doesn't happen, and we have to deal with it manually; see <code>RB_GC_GUARD</code>).</p>
<p>The GC will look at the values stored in the machine registers, as well as any values in stack memory, then check if those values are within the bounds of Ruby's GC heap memory. If the address is inside the bounds, then the GC will consider the object to be alive. The GC cannot know if a pointer stored in a machine register will ever be used again, so it takes a <em>conservative</em> approach and keeps the reference alive.</p>
<p>This conservative approach can lead to the behavior that you are seeing with the weak reference: a value that nobody is actually using or referencing is kept alive because the GC can't know that fact for sure. Whether the reference stays alive depends on what machine code has executed, whether the value is still on the stack, whether any registers have been overwritten, etc.</p>
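<p>In Ruby terms, that means code using a <code>WeakRef</code> should probably be written so that either outcome after a GC is fine. Here is a minimal sketch of that idea. The <code>deref</code> helper is hypothetical (it is not part of the <code>weakref</code> library), and the only thing assumed from the library is that <code>__getobj__</code> raises <code>WeakRef::RefError</code> once the referent is gone:</p>
<pre><code>require "weakref"

# Hypothetical helper (not part of the weakref library): return the
# referent, or nil if it has already been collected.
def deref(ref)
  ref.__getobj__
rescue WeakRef::RefError
  nil
end

strong = Object.new
held   = WeakRef.new(strong)      # the referent is also strongly referenced
loose  = WeakRef.new(Object.new)  # only the WeakRef points at the referent

GC.start

# `held` must still resolve, because `strong` keeps its referent alive.
puts deref(held).equal?(strong)

# `loose` may go either way: a stale value in a register or on the machine
# stack can keep the referent alive, so handle both outcomes.
puts(loose.weakref_alive? ? "still alive" : "collected")
</code></pre>
<p>The second line of output can differ between runs, platforms, and Ruby builds, which is the behavior described above.</p>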
<p>I hope this helps.</p> Ruby master - Bug #19041: Weakref is still alive after major garbage collectionhttps://bugs.ruby-lang.org/issues/19041?journal_id=995772022-10-14T20:04:21Zparker (Parker Finch)
<ul></ul><p>Thanks for that explanation <a class="user active user-mention" href="https://bugs.ruby-lang.org/users/73">@tenderlovemaking (Aaron Patterson)</a>, it helps and I truly appreciate it!</p>
<p>One misunderstanding I had was that I was thinking about this in terms of the Ruby VM. But it seems like garbage collection actually occurs down at the machine level (which makes much more sense now that I think about it) and that's why we're dealing with registers. (And the stack we're talking about is the C stack and not the Ruby VM stack.)</p>
<p>The recommendation to take a look at <a href="https://github.com/ruby/ruby/blob/cbd3d655745564e3c33a29a5625ac30b4d69fb29/include/ruby/internal/memory.h#L110-L172" class="external">RB_GC_GUARD</a> was helpful as well, that's a great comment there.</p>
<p>I'm still curious <em>why</em> calling <code>#weakref_alive?</code> on the <code>WeakRef</code> seems to put the underlying <code>Object</code> (that the <code>WeakRef</code> delegates to) in a register or on the stack. But the fact that this is happening so close to the actual machine makes it seem like it would be tricky to figure out.</p>
<p>Anyway, I'll keep learning more about how memory management works, thank you for the info here! I think the docs are fine as-is, so it makes sense to me to close this one.</p>
<p>Thank you all for your time and explanations!</p> Ruby master - Bug #19041: Weakref is still alive after major garbage collectionhttps://bugs.ruby-lang.org/issues/19041?journal_id=996632022-10-17T17:31:53Ztenderlovemaking (Aaron Patterson)tenderlove@ruby-lang.org
<ul></ul><p>parker (Parker Finch) wrote in <a href="#note-6">#note-6</a>:</p>
<blockquote>
<p>I'm still curious <em>why</em> calling <code>#weakref_alive?</code> on the <code>WeakRef</code> seems to put the underlying <code>Object</code> (that the <code>WeakRef</code> delegates to) in a register or on the stack. But the fact that this is happening so close to the actual machine makes it seem like it would be tricky to figure out.</p>
</blockquote>
<p>That method may not be putting the object in a register. Something else may have put it in a register or in the stack, and it just happens that no other machine code has overwritten the register or stack memory. If you dump the heap (<code>ObjectSpace.dump_all</code>), you'll probably see one of the roots (probably VM?) pointing at the object. Unfortunately the heap dump won't tell you <em>how</em> it found the reference, just that the reference exists. You could find whether it's a register or stack memory by adding some debugging code to the GC or by tracing the machine code via lldb.</p>
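<p>For what it's worth, here is a rough sketch of how one might search a heap dump for whatever points at a given object. <code>referrers_of</code> is a hypothetical helper name, and the sketch assumes the dump is line-delimited JSON in which each entry may carry a <code>"references"</code> array of addresses (root entries have <code>"type":"ROOT"</code>):</p>
<pre><code>require "objspace"
require "json"

# Hypothetical helper: return the parsed heap-dump entries (objects or
# roots) whose "references" list contains the address of `obj`.
def referrers_of(obj)
  # ObjectSpace.dump returns a JSON description that includes the address.
  address = JSON.parse(ObjectSpace.dump(obj))["address"]

  ObjectSpace.dump_all(output: :string).each_line.filter_map do |line|
    # Cheap textual filter first, so only a few lines get parsed.
    next unless line.include?(address)

    entry = JSON.parse(line)
    entry if Array(entry["references"]).include?(address)
  end
end

target = Object.new
holder = [target] # a known strong reference, so something shows up

referrers = referrers_of(target)
referrers.each do |entry|
  # Roots have a "root" name; ordinary objects have an "address".
  puts entry.values_at("type", "root", "address").compact.join(" ")
end
</code></pre>
<p>If the only referrer is a root entry, that tells you a root is holding the object, but not whether the value was found in a register or on the stack.</p>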
<p>It might be nice if <code>ObjectSpace.dump_all</code> could indicate whether the reference came from the stack or machine registers as I've also tried to figure that out. But it is work. 😅</p> Ruby master - Bug #19041: Weakref is still alive after major garbage collectionhttps://bugs.ruby-lang.org/issues/19041?journal_id=997222022-10-18T21:33:21Zparker (Parker Finch)
<ul></ul><p>tenderlovemaking (Aaron Patterson) wrote in <a href="#note-7">#note-7</a>:</p>
<blockquote>
<p>That method may not be putting the object in a register. Something else may have put it in a register or in the stack, and it just happens that no other machine code has overwritten the register or stack memory.</p>
</blockquote>
<p>There's some evidence that the <code>weakref_alive?</code> method is putting it in a register or the stack. Running garbage collection <em>immediately</em> after calling <code>weakref_alive?</code> will fail to collect the underlying object. But if there's a <code>sleep</code> between the <code>weakref_alive?</code> and running garbage collection then the garbage collection will succeed in collecting the underlying object.</p>
<p>To test if it was the <code>weakref_alive?</code> call itself that was causing the issue I ran a few different scenarios:</p>
<pre><code># This version does not manifest the issue. (It makes it through two iterations
# and terminates.)
require "weakref"
iterations = 0
while iterations < 2
  print "\r#{iterations}"
  obj = WeakRef.new(Object.new)
  while obj.weakref_alive?
    # Sleep to give registers a chance to clear.
    sleep(0.5)
    GC.start
  end
  iterations += 1
end
</code></pre>
<pre><code># This version does manifest the issue. (It gets stuck in the inner loop and
# never terminates.)
require "weakref"
iterations = 0
while iterations < 2
  print "\r#{iterations}"
  obj = WeakRef.new(Object.new)
  while obj.weakref_alive?
    # Sleep to give registers a chance to clear.
    sleep(0.5)
    # Call the `WeakRef#weakref_alive?` method to see if that causes the issue
    # to manifest. (It does, GC does _not_ clear out the underlying Object after
    # this.)
    obj.weakref_alive?
    GC.start
  end
  iterations += 1
end
</code></pre>
<pre><code># This version does not manifest the issue. (It makes it through two iterations
# and terminates.)
require "weakref"
iterations = 0
while iterations < 2
  print "\r#{iterations}"
  obj = WeakRef.new(Object.new)
  while obj.weakref_alive?
    # Sleep to give registers a chance to clear.
    sleep(0.5)
    # Reference the WeakRef object to see if that causes the issue to
    # manifest. (It does not, GC still clears out the underlying Object here.)
    obj
    GC.start
  end
  iterations += 1
end
</code></pre>
<pre><code># This version does not manifest the issue. (It makes it through two iterations
# and terminates.)
require "weakref"
iterations = 0
while iterations < 2
  print "\r#{iterations}"
  obj = WeakRef.new(Object.new)
  while obj.weakref_alive?
    # Sleep to give registers a chance to clear.
    sleep(0.5)
    # Call another method on the WeakRef object to see if that causes the issue
    # to manifest. (It does not, GC still clears out the underlying Object
    # here.)
    obj.object_id
    GC.start
  end
  iterations += 1
end
</code></pre>
<p>Sorry for the wall of code there — the summary is that the issue only seems to manifest when the <code>weakref_alive?</code> method is called immediately before garbage collecting.</p>
<p>The fact that the behavior is predictable in those different scenarios makes me think that the <code>weakref_alive?</code> method is doing something that adds a reference to the underlying <code>Object</code> to a register or the stack. Is there another explanation for the behavior there that I'm missing?</p>
<hr>
<blockquote>
<p>If you dump the heap (<code>ObjectSpace.dump_all</code>), you'll probably see one of the roots (probably VM?) pointing at the object. Unfortunately the heap dump won't tell you <em>how</em> it found the reference, just that the reference exists. You could find whether it's a register or stack memory by adding some debugging code to the GC or by tracing the machine code via lldb.</p>
</blockquote>
<p>Thanks <a class="user active user-mention" href="https://bugs.ruby-lang.org/users/73">@tenderlovemaking (Aaron Patterson)</a>! I didn't know about <code>ObjectSpace.dump_all</code>. I'll try exploring those options to see if I can pin down how it's finding the reference to the Object. Heads up that it will likely take me a while since I'm not yet familiar with C and lldb.</p> Ruby master - Bug #19041: Weakref is still alive after major garbage collectionhttps://bugs.ruby-lang.org/issues/19041?journal_id=1019192023-02-17T14:42:18Zparker (Parker Finch)
<ul><li><strong>File</strong> <a href="/attachments/9401">manifest_weakref_issue.rb</a> <a class="icon-only icon-download" title="Download" href="/attachments/download/9401/manifest_weakref_issue.rb">manifest_weakref_issue.rb</a> added</li></ul><p>Hi <a class="user active user-mention" href="https://bugs.ruby-lang.org/users/73">@tenderlovemaking (Aaron Patterson)</a>! I'm having difficulty interpreting the results of the <code>ObjectSpace</code> dump and I'm hoping you can help.</p>
<p>I've adjusted the script to print out the address of the underlying object, and then (when the issue manifests) print all lines from <code>ObjectSpace.dump_all</code> that match that address. The code is attached, here's some example output:</p>
<pre><code>Ruby version: 3.3.0
Iteration: 0
Object address: 0x1051cd788
Inner iterations: 1
Iteration: 1
Object address: 0x105205ae8
Inner iterations: 1
Inner iterations: 2
Inner iterations: 3
{"address":"0x105205ae8", "type":"OBJECT", "shape_id":5, "slot_size":40, "class":"0x1029bfe80", "embedded":true, "ivars":0, "memsize":40, "flags":{"wb_protected":true, "old":true, "uncollectible":true, "marked":true}}
{"address":"0x10520da90", "type":"STRING", "shape_id":0, "slot_size":40, "class":"0x1029beda0", "embedded":true, "bytesize":11, "value":"0x105205ae8", "encoding":"UTF-8", "coderange":"7bit", "memsize":40, "flags":{"wb_protected":true, "old":true, "uncollectible":true, "marked":true}}
</code></pre>
<p>In that example, the underlying object was at <code>0x105205ae8</code>. But as far as I can tell, there's nothing else that points at it. (The other object there is the String used to hold that address.) I would have expected that, if nothing was referencing it, it would be collected by GC.</p>
<p>One interesting tidbit is that just calling <code>ObjectSpace.dump_all</code> prevents the issue from manifesting. Is it possible that something <em>was</em> referencing the object address, then running <code>dump_all</code> caused that reference to be removed?</p> Ruby master - Bug #19041: Weakref is still alive after major garbage collectionhttps://bugs.ruby-lang.org/issues/19041?journal_id=1020042023-02-23T16:11:30Zbyroot (Jean Boussier)byroot@ruby-lang.org
<ul><li><strong>Related to</strong> <i><a class="issue tracker-1 status-5 priority-4 priority-default closed" href="/issues/19460">Bug #19460</a>: Class not able to be garbage collected</i> added</li></ul>