Feature #22094
closedSpeed up Array#join with a byte-copy fast path
Description
Original idea¶
What if we treat Array#join(" "), Array#join("\n") and Array#join(";") as hot paths (as many, if not most uses of Array#join in the wild), and see if there is anything to optimize for a single-character ASCII joiner?
Detailed¶
rb_ary_join now works in three steps.
1. Measure (one pass)¶
Walk the array once to compute the total length (as before) and, at the same time,
track a single flag ascii_only—true while every element's code range is
ENC_CODERANGE_7BIT (and the separator is 7-bit). This is one extra flag read per
element.
If an element is not a String, fall back immediately to the existing
ary_join_0 / ary_join_1 append path (handles to_str, to_s, nested arrays,
etc.). Everything after this point knows all elements are Strings.
2. Decide if the result can be byte-copied¶
A raw copy is correct only when concatenating the bytes needs no encoding
negotiation. Two cases qualify:
-
All content is 7-bit ASCII (
ascii_only). 7-bit bytes are identical under any
ASCII-compatible encoding, so the result is simply 7-bit in element 0's encoding.
This also covers 7-bit strings in non-UTF-8 encodings such as Shift_JIS/EUC-JP. -
Every element shares one fast-path encoding (UTF-8 / US-ASCII / ASCII-8BIT —
all ASCII-compatible with a 1-byte terminator) and the separator is compatible
(same encoding, or 7-bit). This is checked in a short second pass that also merges
the elements' code ranges. It is reached only when the array is not all-7-bit, so
the common all-ASCII case never pays for it. This case covers all-UTF-8 text.
The merged result code range is: 7BIT if every part is 7-bit, VALID if some part
is clean multibyte, or left UNKNOWN (recomputed lazily) if any part is itself
UNKNOWN/BROKEN. Same-encoding concatenation makes this merge exact.
Anything else—mixed encodings, multibyte Shift_JIS/EUC-JP, an incompatible
separator—is not copyable and uses the existing append path.
3. Copy¶
Allocate the result once, set its encoding to element 0's, then memcpy each
separator and element into the buffer through a single cursor. Set the length once
and stamp the precomputed code range (or leave it UNKNOWN).
The output is byte-, encoding-, and code-range-identical to the previous
implementation.
Benchmarks¶
benchmark/array_join.yml and benchmark/array_join_regression.yml, run with
benchmark-driver --repeat-count 4; ratio = new i/s ÷ trunk i/s. MacBook Pro M1 (arm64), macOS.
" " and "\n" separators measured identically; one shown. YJIT on shows the same
ratios.
All-String arrays (the target)¶
| array length \ element bytes | 1 | 8 | 40 |
|---|---|---|---|
| 100 | 2.77x | 2.41x | 1.98x |
| 1,000 | 2.91x | 2.52x | 2.21x |
| 100,000 | 2.94x | 2.57x | 2.20x |
Other shapes¶
| case | ratio |
|---|---|
no separator (join) |
2.48x |
multi-char separator (", ") |
2.63x |
| frozen string elements | 2.58x |
| all-UTF-8 multibyte text | 2.07x |
| UTF-8 multibyte separator | 2.18x |
| 7-bit Shift_JIS elements | ~5x (trunk uses the slow negotiating path) |
| large elements, 256 B / 1 KB / 4 KB | 2.32x / 1.38x / 1.12x |
Fallback cases (must not regress)¶
| case | ratio |
|---|---|
non-String elements ([1,2,3]) |
1.00x |
| UTF-16 elements | 0.99x |
| nested arrays | 1.00x |
| Strings then a trailing non-String | 0.98x |
YJIT on (spot check, all-String, separator " ")¶
| case | ratio |
|---|---|
| n=1,000, 1 B | 2.93x |
| n=1,000, 8 B | 2.54x |
| n=100,000, 8 B | 2.61x |
- Patch attached.
- Assisted-by: Opus 4.8.
Files