Feature #20182
closedRewrite Array#each in Ruby
Description
Proposal¶
Rewrite Array#each in Ruby https://github.com/ruby/ruby/pull/6687.
class Array
def each
unless block_given?
return to_enum(:each) { self.length }
end
i = 0
while i < self.length
yield self[i]
i = i.succ
end
self
end
end
Purpose¶
Make it possible for YJIT to optimize ISEQs across Array#each
.
Background¶
Whether JIT-compiled or not, calling Ruby from C is more expensive than calling Ruby from Ruby. It also prevents YJIT from making cross-ISEQ optimizations.
This is problematic especially for loop methods written in C like Array#each
since the overhead is repeated at every iteration.
Discussions¶
There are a couple of things I'd like to discuss in this ticket:
- @Eregon (Benoit Daloze) has pointed out that there's a race condition in the above implementation.
self[i]
would yieldnil
if the element was removed by another thread or TracePoint afteri < self.length
. Is itArray#each
's responsibility to atomically operate on elements, or are users supposed to avoid mutating the array in the middle of its loop? - If
Integer#<
,Array#length
,Integer#succ
, orArray#[]
is overridden in an incompatible way, the Ruby implementation may not work correctly. May I assume it's acceptable?
Updated by ioquatix (Samuel Williams) about 1 year ago
I think this is a great idea and would like to see more of Ruby implemented in Ruby. The smaller the C implementation becomes, the better, for all the stated reasons, but also additionally, it's cleaner and easier to understand.
Regarding atomicity, I don't think it's the responsibility of the Array#each
implementation to handle this case.
Updated by Eregon (Benoit Daloze) about 1 year ago
or are users supposed to avoid mutating the array in the middle of its loop?
Not supporting to mutate the array while looping would be a pretty big incompatibility as written in https://github.com/ruby/ruby/pull/6687#discussion_r1019493851.
IIRC matz would prefer this to be avoided/undefined, but the reality is such code is used quite a bit and would break.
And in some sense even Array#map! mutates the array while looping over it.
Regarding atomicity on its own though, it seems Rubyists generally expect Array & Hash to be "thread-safe", from my experience of seeing many issues when Array or Hash are not thread-safe (and trying to report/fix most of them).
(incidentally this is also the concurrency guarantees that Python has)
For instance currently even RubyGems relies on thread-safe Hash (and probably Array) too.
A nil
here would be an out-of-thin-air value type of race, which seems quite low-level for Ruby.
IOW I think it would be unexpected by most people that Array#each can yield nil
when there never were any nil
element added to it.
As discussed on the PR I think this can be solved by using a Primitive which does both things atomically.
Or maybe by YJIT intrinsifying the whole Array#each (with side exit for the no block case).
OTOH I think moving more of the core library to Ruby is definitely good/welcome.
But it's hard for fundamental things like Array#each (unlike say Array#map which can be done more easily).
Updated by k0kubun (Takashi Kokubun) about 1 year ago
As discussed on the PR I think this can be solved by using a Primitive which does both things atomically.
After filing this ticket, we built a nice solution using Primitive https://github.com/ruby/ruby/pull/9533. It wasn't straightforward to write multiple values from a single Primitive without allocating an array, which is why I filed this ticket, but we came up with the idea of passing a pointer to local variables to do so.
So discussion (1) is less important now. I'd like to still check on discussion (2) since it would help us "move more of the core library to Ruby" going forward.
Updated by k0kubun (Takashi Kokubun) about 1 year ago
- Description updated (diff)
Updated by matz (Yukihiro Matsumoto) about 1 year ago
Accepted. For the individual concerns:
- I'd like to keep race condition safety to be about the same as the current implementation using primitives, unless it's too difficult.
- If the user intentionally redefines the fundamental method, it should be his/her own responsibility.
Matz.
Updated by k0kubun (Takashi Kokubun) 12 months ago
- Status changed from Open to Closed