Bug #20695: Elevated GC allocations in parse.y parser

Added by alanwu (Alan Wu) 3 months ago. Updated 3 months ago.

Status: Closed
Assignee: -
Target version: -
ruby -v: ruby 3.4.0dev (2024-08-23T14:49:27Z master 3f6be01bfc) [arm64-darwin23]
[ruby-core:118939]

Description

Testing on the lobsters benchmark from yjit-bench, the parser on the latest master allocates more objects than at 98eeadc932 ("Development of 3.4.0 started."). The following patch prints the allocation count after code loading:

diff --git a/benchmarks/lobsters/benchmark.rb b/benchmarks/lobsters/benchmark.rb
index 240c50c..6cdd0ac 100644
--- a/benchmarks/lobsters/benchmark.rb
+++ b/benchmarks/lobsters/benchmark.rb
@@ -7,6 +7,8 @@ Dir.chdir __dir__
 use_gemfile
 
 require_relative 'config/environment'
+printf "allocated_after_load=%d\n", GC.stat(:total_allocated_objects)
+exit
 require_relative "route_generator"
 
 # For an in-mem DB, we need to load all data on every boot

$ ruby benchmarks/lobsters/benchmark.rb
ruby 3.4.0dev (2023-12-25T09:13:40Z master 98eeadc932) [arm64-darwin23]
<snip>
allocated_after_load=1747084
$ chruby master
$ ruby benchmarks/lobsters/benchmark.rb
ruby 3.4.0dev (2024-08-23T14:49:27Z master 3f6be01bfc) [arm64-darwin23]
<snip>
allocated_after_load=2163031
$ ruby --parser=prism benchmarks/lobsters/benchmark.rb
ruby 3.4.0dev (2024-08-23T14:49:27Z master 3f6be01bfc) +PRISM [arm64-darwin23]
<snip>
allocated_after_load=932571
$ math 2163031 / 1747084
1.238081
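
As a complementary check (not in the original report), the parser's allocations can also be measured in isolation by compiling a synthetic source full of number literals and diffing GC.stat(:total_allocated_objects) around the compile. This is only a sketch; it assumes the parser selected at startup (e.g. via --parser=prism) is also the one used by RubyVM::InstructionSequence.compile:

# Rough sketch: count objects allocated while compiling a source with many
# number literals. Assumes the startup parser selection (--parser=...) also
# applies to RubyVM::InstructionSequence.compile.
src = "a = 1; b = 2.5; c = 0b101\n" * 10_000
before = GC.stat(:total_allocated_objects)
RubyVM::InstructionSequence.compile(src)
after = GC.stat(:total_allocated_objects)
printf "objects allocated during compile: %d\n", after - before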

Profiling shows the extra GC allocations coming from set_number_literal(); set_yylval_node() expands to a call to rb_enc_str_new(), so this issue seems related to #20659. A surprising side effect of these extra allocations is that they also impact the speed of the benchmark iterations, not just the loading speed, even though the body of the benchmark does no Ruby parsing after the warmup iterations. The speed impact seems to be due to poorer data locality, as perf(1) shows elevated levels of CYCLE_ACTIVITY.STALLS_L2_MISS and related events. Using Prism speeds up benchmark iterations, probably because it allocates far fewer objects during code loading, leaving the heap in a better layout in terms of locality. With Prism, fewer cycles stall on data cache misses.

$ ruby run_benchmarks.rb --chruby 'master;mstr+p::master --parser=prism' lobsters
<snip>
master: ruby 3.4.0dev (2024-08-23T14:49:27Z master 3f6be01bfc) [arm64-darwin23]
mstr+p: ruby 3.4.0dev (2024-08-23T14:49:27Z master 3f6be01bfc) +PRISM [arm64-darwin23]

--------  -----------  ----------  -----------  ----------  --------------  -------------
bench     master (ms)  stddev (%)  mstr+p (ms)  stddev (%)  mstr+p 1st itr  master/mstr+p
lobsters  567.9        0.7         557.0        0.8         1.00            1.02         
--------  -----------  ----------  -----------  ----------  --------------  -------------
Legend:
- mstr+p 1st itr: ratio of master/mstr+p time for the first benchmarking iteration.
- master/mstr+p: ratio of master/mstr+p time. Higher is better for mstr+p. Above 1 represents a speedup.

In hindsight, it's clear that the parser's allocation pattern can have a significant impact on heap layout, since a typical app boot with Kernel#require alternates between running app code and running the parser. Parser allocations essentially end up as gaps between app objects.
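
A rough way to look at the resulting layout (again, a sketch rather than anything from the report) is to check how densely the heap is packed right after loading; heap_live_slots and heap_available_slots are existing GC.stat keys on recent CRuby, though occupancy is only a coarse proxy for locality:

# Rough sketch: report heap occupancy right after code loading.
stat  = GC.stat
live  = stat[:heap_live_slots]
total = stat[:heap_available_slots]
printf "heap occupancy after load: %.1f%% (%d / %d slots)\n",
       100.0 * live / total, live, total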
