Bug #20608
closedHash#find always allocates each iterated pair
Description
Hey there!
Recently I ran into this sharp edge in Hash#find
:
puts RUBY_DESCRIPTION
def allocated_now = GC.stat(:total_allocated_objects)
dummy_hash = 10_000.times.each_with_index.to_h
before_allocs = allocated_now
dummy_hash.any? { |k, v| }
puts "Allocated #{allocated_now - before_allocs} for #{File.read(__FILE__).lines[__LINE__-2]}"
before_allocs = allocated_now
dummy_hash.each { |k, v| }
puts "Allocated #{allocated_now - before_allocs} for #{File.read(__FILE__).lines[__LINE__-2]}"
before_allocs = allocated_now
dummy_hash.find { |k, v| }
puts "Allocated #{allocated_now - before_allocs} for #{File.read(__FILE__).lines[__LINE__-2]}"
before_allocs = allocated_now
dummy_hash.any? { |k| }
puts "Allocated #{allocated_now - before_allocs} for #{File.read(__FILE__).lines[__LINE__-2]}"
Result:
ruby 3.4.0preview1 (2024-05-16 master 9d69619623) [x86_64-linux]
Allocated 0 for dummy_hash.any? { |k, v| }
Allocated 0 for dummy_hash.each { |k, v| }
Allocated 10002 for dummy_hash.find { |k, v| }
Allocated 10000 for dummy_hash.any? { |k| }
That is, while Hash#any?
, Hash#each
, etc avoid doing any allocations during iteration, Hash#find
does not hit the rb_block_pair_yield_optimizable
=> each_pair_i_fast
fast path, and so is massively costly compared to the others.
This is very surprising, as I'd expect a find
to have a comparable cost to any?
(and I ended up redoing some code to avoid find).
Also while experimenting a bit, it was surprising to me that the allocation optimization only kicks in when |k, v|
are declared, and thus .any? { |k| }
is also more expensive.
Updated by mame (Yusuke Endoh) 5 months ago
As you've probably noticed from reading the source, there is no Hash#find
method. Enumerable#find
is. It cannot know that #each
yields a two-element array, so it is not easy to optimize it.
We could define Hash#find method as a faster special version of Enumerable#find. I personally don't like the idea of manually copying many Enumerable methods into Hash...
I was rather surprised that there is a Hash#any?
. Even though there is no Hash#all?
. If the consistency is an issue, a simple fix to this "bug" would be to remove Hash#any?
.
Updated by ivoanjo (Ivo Anjo) 5 months ago
I personally don't like the idea of manually copying many Enumerable methods into Hash...
It is indeed annoying. On the good side, they could be implemented with Ruby code -- e.g. implementing find
using any?
. This would even open up the way YJIT to even optimize some of these methods (?).
If the consistency is an issue, a simple fix to this "bug" would be to remove Hash#any?.
Please no 😅! Unless I'm mistaken, Hash#any?
is the only method in Hash
that allows the iteration to be stopped mid-way (without having to raise or throw). If that was removed, then it would be even harder to iterate Hash
to find something + stop when it's found without allocating extra memory.
Updated by mame (Yusuke Endoh) 5 months ago
- Assignee set to ko1 (Koichi Sasada)
- Status changed from Open to Assigned
I have prototyped a patch that delays the array allocation of multiple arguments for Enumerable#find
, #any?
etc.
https://github.com/ruby/ruby/pull/11110
$ ./miniruby bench.rb
ruby 3.4.0dev (2024-07-05T17:35:30Z lazy-alloc-of-bloc.. 264a1d8810) [x86_64-linux]
Allocated 1 for dummy_hash.any? { |k, v| }
Allocated 1 for dummy_hash.each { |k, v| }
Allocated 5 for dummy_hash.find { |k, v| }
Allocated 10000 for dummy_hash.any? { |k| }
Enumerable#find
, #all?
, etc. no longer allocate an Array as long as the predicate block returns falsy. However, there is a cost of calling #each
method, so the performance would be slightly less than direct implementation like Hash#any?.
This patch introduces a flag for ifunc. I'm not sure if this issue is worth the complexity. @ko1 (Koichi Sasada) What do you think?
Updated by mame (Yusuke Endoh) 5 months ago
- Status changed from Assigned to Closed
I merged https://github.com/ruby/ruby/pull/11110.
Updated by ivoanjo (Ivo Anjo) 5 months ago
Amazing, thank you!