Project

General

Profile

Actions

Feature #22094

closed

Speed up Array#join with a byte-copy fast path

Feature #22094: Speed up Array#join with a byte-copy fast path
1

Added by yaroslavmarkin (Yaroslav Markin) 6 days ago. Updated 1 day ago.

Status:
Closed
Assignee:
-
Target version:
-
[ruby-core:125637]

Description

Original idea

What if we treat Array#join(" "), Array#join("\n") and Array#join(";") as hot paths (as many, if not most uses of Array#join in the wild), and see if there is anything to optimize for a single-character ASCII joiner?

Detailed

rb_ary_join now works in three steps.

1. Measure (one pass)

Walk the array once to compute the total length (as before) and, at the same time,
track a single flag ascii_only—true while every element's code range is
ENC_CODERANGE_7BIT (and the separator is 7-bit). This is one extra flag read per
element.

If an element is not a String, fall back immediately to the existing
ary_join_0 / ary_join_1 append path (handles to_str, to_s, nested arrays,
etc.). Everything after this point knows all elements are Strings.

2. Decide if the result can be byte-copied

A raw copy is correct only when concatenating the bytes needs no encoding
negotiation. Two cases qualify:

  • All content is 7-bit ASCII (ascii_only). 7-bit bytes are identical under any
    ASCII-compatible encoding, so the result is simply 7-bit in element 0's encoding.
    This also covers 7-bit strings in non-UTF-8 encodings such as Shift_JIS/EUC-JP.

  • Every element shares one fast-path encoding (UTF-8 / US-ASCII / ASCII-8BIT —
    all ASCII-compatible with a 1-byte terminator) and the separator is compatible
    (same encoding, or 7-bit). This is checked in a short second pass that also merges
    the elements' code ranges. It is reached only when the array is not all-7-bit, so
    the common all-ASCII case never pays for it. This case covers all-UTF-8 text.

The merged result code range is: 7BIT if every part is 7-bit, VALID if some part
is clean multibyte, or left UNKNOWN (recomputed lazily) if any part is itself
UNKNOWN/BROKEN. Same-encoding concatenation makes this merge exact.

Anything else—mixed encodings, multibyte Shift_JIS/EUC-JP, an incompatible
separator—is not copyable and uses the existing append path.

3. Copy

Allocate the result once, set its encoding to element 0's, then memcpy each
separator and element into the buffer through a single cursor. Set the length once
and stamp the precomputed code range (or leave it UNKNOWN).

The output is byte-, encoding-, and code-range-identical to the previous
implementation.

Benchmarks

benchmark/array_join.yml and benchmark/array_join_regression.yml, run with
benchmark-driver --repeat-count 4; ratio = new i/s ÷ trunk i/s. MacBook Pro M1 (arm64), macOS.
" " and "\n" separators measured identically; one shown. YJIT on shows the same
ratios.

All-String arrays (the target)

array length \ element bytes 1 8 40
100 2.77x 2.41x 1.98x
1,000 2.91x 2.52x 2.21x
100,000 2.94x 2.57x 2.20x

Other shapes

case ratio
no separator (join) 2.48x
multi-char separator (", ") 2.63x
frozen string elements 2.58x
all-UTF-8 multibyte text 2.07x
UTF-8 multibyte separator 2.18x
7-bit Shift_JIS elements ~5x (trunk uses the slow negotiating path)
large elements, 256 B / 1 KB / 4 KB 2.32x / 1.38x / 1.12x

Fallback cases (must not regress)

case ratio
non-String elements ([1,2,3]) 1.00x
UTF-16 elements 0.99x
nested arrays 1.00x
Strings then a trailing non-String 0.98x

YJIT on (spot check, all-String, separator " ")

case ratio
n=1,000, 1 B 2.93x
n=1,000, 8 B 2.54x
n=100,000, 8 B 2.61x

  • Patch attached.
  • Assisted-by: Opus 4.8.

Files

Actions

Also available in: PDF Atom