Feature #20902

closed

Allow `IO::Buffer#copy` to release the GVL.

Added by ioquatix (Samuel Williams) 4 days ago. Updated 3 days ago.

Status:
Closed
Assignee:
-
Target version:
-
[ruby-core:119972]

Description

Related to https://bugs.ruby-lang.org/issues/20876.

Background

The execution time of `IO::Buffer#copy` is proportional to the length of the data copied, so large copies can take a long time (100ms+). The GVL is currently not released during the copy, so no other Ruby thread can run until it finishes, which can stall the interpreter.
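To see how copy time grows with length, a minimal timing sketch along these lines could be used. The buffer sizes and the `time_copy` helper are illustrative choices, not part of the issue, and the absolute numbers will vary by machine.

```ruby
# Sketch: time IO::Buffer#copy for a few buffer sizes.
# The sizes and the helper name are assumptions made for illustration.
Warning[:experimental] = false # IO::Buffer is marked experimental

MIB = 1024 * 1024

# Returns [bytes_copied, seconds] for a single copy of `size` bytes.
def time_copy(size)
  source = IO::Buffer.new(size)
  destination = IO::Buffer.new(size)

  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  copied = destination.copy(source, 0, size)
  finished = Process.clock_gettime(Process::CLOCK_MONOTONIC)

  [copied, finished - started]
end

[1, 5, 10, 20].each do |mib|
  copied, seconds = time_copy(mib * MIB)
  puts format("%2d MiB: %d bytes in %.3f ms", mib, copied, seconds * 1000)
end
```

While the copy runs on one thread, any other Ruby thread waiting for the GVL is blocked for roughly the printed duration.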

Proposal

Pull Request: https://github.com/ruby/ruby/pull/12021

If the size of the data to be copied is larger than a threshold (a heuristic), we will perform the `memmove` using `rb_nogvl`, so that the GVL is released for the duration of the copy.

The initial threshold is set to 1MiB. This won't be perfect for every system, but should be good enough to avoid stalls of a millisecond or more.
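The decision described above can be modelled in a few lines of Ruby. This is only an illustration of the heuristic: the actual check lives in C inside the pull request, and the constant name, the method name, and the treatment of a copy of exactly 1MiB are assumptions here.

```ruby
# Hypothetical model of the size heuristic; the real check is written in C.
COPY_NOGVL_THRESHOLD = 1024 * 1024 # 1MiB, the initial value from the proposal

# Returns :nogvl when the copy is large enough to justify releasing the GVL,
# otherwise :gvl (a plain memmove while holding the lock).
# Whether the boundary itself counts as "large" is an assumption.
def copy_strategy(length)
  length > COPY_NOGVL_THRESHOLD ? :nogvl : :gvl
end

[4096, 1024 * 1024, 20 * 1024 * 1024].each do |length|
  puts "#{length} bytes -> #{copy_strategy(length)}"
end
```

Small copies stay on the fast path, presumably because releasing and reacquiring the GVL has a cost of its own that only pays off for larger transfers.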

Results

I measured the difference:

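| GVL | Threads | Buffer Size (MiB) | Total Duration | Throughput (MB/s) |
|-----|---------|-------------------|----------------|-------------------|
| Yes | 1 | 1 | 0.12ms | 8393.09 |
| Yes | 1 | 5 | 0.51ms | 9857.7 |
| Yes | 1 | 10 | 1.12ms | 8937.54 |
| Yes | 1 | 20 | 2.22ms | 9015.95 |
| Yes | 2 | 1 | 0.24ms | 8307.07 |
| Yes | 2 | 5 | 1.13ms | 8819.58 |
| Yes | 2 | 10 | 1.49ms | 13385.35 |
| Yes | 2 | 20 | 5.63ms | 7110.8 |
| Yes | 4 | 1 | 0.92ms | 4360.18 |
| Yes | 4 | 5 | 2.08ms | 9606.58 |
| Yes | 4 | 10 | 4.51ms | 8863.13 |
| Yes | 4 | 20 | 9.3ms | 8601.41 |
| Yes | 8 | 1 | 1.22ms | 6574.93 |
| Yes | 8 | 5 | 3.56ms | 11239.27 |
| Yes | 8 | 10 | 7.31ms | 10943.68 |
| Yes | 8 | 20 | 15.57ms | 10274.99 |
| Yes | 16 | 1 | 1.95ms | 8220.16 |
| Yes | 16 | 5 | 5.51ms | 14518.05 |
| Yes | 16 | 10 | 13.77ms | 11618.96 |
| Yes | 16 | 20 | 27.21ms | 11759.43 |
| Yes | 32 | 1 | 3.24ms | 9891.05 |
| Yes | 32 | 5 | 11.42ms | 14007.41 |
| Yes | 32 | 10 | 21.64ms | 14786.48 |
| Yes | 32 | 20 | 45.52ms | 14060.25 |
| No | 1 | 1 | 0.13ms | 7582.85 |
| No | 1 | 5 | 0.44ms | 11248.55 |
| No | 1 | 10 | 1.11ms | 9029.91 |
| No | 1 | 20 | 2.43ms | 8228.42 |
| No | 2 | 1 | 0.18ms | 11245.61 |
| No | 2 | 5 | 0.96ms | 10396.76 |
| No | 2 | 10 | 1.9ms | 10501.59 |
| No | 2 | 20 | 3.16ms | 12656.77 |
| No | 4 | 1 | 0.69ms | 5827.76 |
| No | 4 | 5 | 1.15ms | 17440.54 |
| No | 4 | 10 | 2.31ms | 17307.79 |
| No | 4 | 20 | 4.11ms | 19483.68 |
| No | 8 | 1 | 0.67ms | 11954.1 |
| No | 8 | 5 | 1.3ms | 30713.68 |
| No | 8 | 10 | 2.05ms | 38990.98 |
| No | 8 | 20 | 4.15ms | 38552.37 |
| No | 16 | 1 | 0.96ms | 16698.03 |
| No | 16 | 5 | 1.46ms | 54782.47 |
| No | 16 | 10 | 2.74ms | 58295.64 |
| No | 16 | 20 | 4.89ms | 65482.43 |
| No | 32 | 1 | 1.82ms | 17554.27 |
| No | 32 | 5 | 2.68ms | 59673.59 |
| No | 32 | 10 | 3.87ms | 82733.34 |
| No | 32 | 20 | 6.93ms | 92297.47 |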

In the base case (a single thread), the performance is about the same, but in the best case (32 threads each copying a 20MiB buffer), the throughput is significantly better: roughly 14,060MB/s with the GVL held vs 92,297MB/s with it released, about 6.5 times higher.
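The throughput figures appear to be the aggregate across all threads, that is, threads × buffer size ÷ total duration. The sketch below recomputes the two best-case rows under that assumption; the `throughput` helper is hypothetical, and small differences from the reported values are expected because the durations shown are rounded.

```ruby
# Recompute aggregate throughput from a table row, assuming it is
# threads * buffer size / total duration (not confirmed by the issue).
def throughput(threads, buffer_size, duration_ms)
  (threads * buffer_size) / (duration_ms / 1000.0)
end

with_gvl    = throughput(32, 20, 45.52) # "Yes" row: GVL held
without_gvl = throughput(32, 20, 6.93)  # "No" row: GVL released

puts format("GVL held:     %.0f", with_gvl)
puts format("GVL released: %.0f", without_gvl)
puts format("speedup:      %.1fx", without_gvl / with_gvl)
```

With the GVL held, adding threads mostly adds duration, so the aggregate figure stays nearly flat; with it released, the copies overlap and the aggregate scales with the thread count.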
