Project

General

Profile

Actions

Feature #20182

closed

Rewrite Array#each in Ruby

Added by k0kubun (Takashi Kokubun) about 1 month ago. Updated about 1 month ago.

Status:
Closed
Assignee:
-
Target version:
-
[ruby-core:116190]

Description

Proposal

Rewrite Array#each in Ruby https://github.com/ruby/ruby/pull/6687.

class Array
  def each
    unless block_given?
      return to_enum(:each) { self.length }
    end
    i = 0
    while i < self.length
      yield self[i]
      i = i.succ
    end
    self
  end
end

Purpose

Make it possible for YJIT to optimize ISEQs across Array#each.

Background

Whether JIT-compiled or not, calling Ruby from C is more expensive than calling Ruby from Ruby. It also prevents YJIT from making cross-ISEQ optimizations.

This is problematic especially for loop methods written in C like Array#each since the overhead is repeated at every iteration.

Discussions

There are a couple of things I'd like to discuss in this ticket:

  1. @Eregon (Benoit Daloze) has pointed out that there's a race condition in the above implementation. self[i] would yield nil if the element was removed by another thread or TracePoint after i < self.length. Is it Array#each's responsibility to atomically operate on elements, or are users supposed to avoid mutating the array in the middle of its loop?
  2. If Integer#<, Array#length, Integer#succ, or Array#[] is overridden in an incompatible way, the Ruby implementation may not work correctly. May I assume it's acceptable?

Updated by ioquatix (Samuel Williams) about 1 month ago

I think this is a great idea and would like to see more of Ruby implemented in Ruby. The smaller the C implementation becomes, the better, for all the stated reasons, but also additionally, it's cleaner and easier to understand.

Regarding atomicity, I don't think it's the responsibility of the Array#each implementation to handle this case.

Updated by Eregon (Benoit Daloze) about 1 month ago

or are users supposed to avoid mutating the array in the middle of its loop?

Not supporting to mutate the array while looping would be a pretty big incompatibility as written in https://github.com/ruby/ruby/pull/6687#discussion_r1019493851.
IIRC matz would prefer this to be avoided/undefined, but the reality is such code is used quite a bit and would break.
And in some sense even Array#map! mutates the array while looping over it.

Regarding atomicity on its own though, it seems Rubyists generally expect Array & Hash to be "thread-safe", from my experience of seeing many issues when Array or Hash are not thread-safe (and trying to report/fix most of them).
(incidentally this is also the concurrency guarantees that Python has)
For instance currently even RubyGems relies on thread-safe Hash (and probably Array) too.

A nil here would be an out-of-thin-air value type of race, which seems quite low-level for Ruby.
IOW I think it would be unexpected by most people that Array#each can yield nil when there never were any nil element added to it.

As discussed on the PR I think this can be solved by using a Primitive which does both things atomically.
Or maybe by YJIT intrinsifying the whole Array#each (with side exit for the no block case).

OTOH I think moving more of the core library to Ruby is definitely good/welcome.
But it's hard for fundamental things like Array#each (unlike say Array#map which can be done more easily).

Updated by k0kubun (Takashi Kokubun) about 1 month ago

As discussed on the PR I think this can be solved by using a Primitive which does both things atomically.

After filing this ticket, we built a nice solution using Primitive https://github.com/ruby/ruby/pull/9533. It wasn't straightforward to write multiple values from a single Primitive without allocating an array, which is why I filed this ticket, but we came up with the idea of passing a pointer to local variables to do so.

So discussion (1) is less important now. I'd like to still check on discussion (2) since it would help us "move more of the core library to Ruby" going forward.

Actions #4

Updated by k0kubun (Takashi Kokubun) about 1 month ago

  • Description updated (diff)

Updated by matz (Yukihiro Matsumoto) about 1 month ago

Accepted. For the individual concerns:

  1. I'd like to keep race condition safety to be about the same as the current implementation using primitives, unless it's too difficult.
  2. If the user intentionally redefines the fundamental method, it should be his/her own responsibility.

Matz.

Actions #6

Updated by k0kubun (Takashi Kokubun) about 1 month ago

  • Status changed from Open to Closed
Actions

Also available in: Atom PDF

Like2
Like1Like1Like0Like0Like0Like0