Feature #18494

[RFC] ENV["RUBY_GC_..."]= changes GC parameters dynamically

Added by normalperson (Eric Wong) 4 months ago. Updated 4 months ago.

Status:
Open
Priority:
Normal
Assignee:
-
Target version:
-
[ruby-core:107147]

Description

This is intended to give Ruby application developers a way to
improve the out-of-the-box experience for end users running
tools written in Ruby. In most cases, end users are not and
cannot be expected to know how to tune the GC better than the
developers who wrote the Ruby code.

This has no extra API footprint, and will silently be a no-op
for other Ruby implementations.

One potential incompatibility is users doing something like:

ENV["RUBY_GC_..."] = "1m"
system(...)

However, the different behavior would be largely innocuous aside from
different performance characteristics in the parent process. Using:

system({ "RUBY_GC_..." => "1m" }, ...)

...would restore the previous behavior (and is generally the
preferred usage, anyways, to avoid thread-safety issues).
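As a minimal sketch of the per-child form above: the environment hash is applied only to the spawned process, so the parent's ENV is left alone. The helper name `child_sees` is hypothetical, and Open3 is used instead of system() only so the child's view can be captured and printed.

```ruby
require "open3"
require "rbconfig"

# Hypothetical helper: run a child Ruby with one extra environment variable
# and return the value the child sees for it. The parent's ENV is not modified.
def child_sees(name, value)
  out, status = Open3.capture2({ name => value }, RbConfig.ruby,
                               "-e", "print ENV[ARGV[0]]", name)
  raise "child failed: #{status}" unless status.success?
  out
end

before = ENV["RUBY_GC_MALLOC_LIMIT"]
seen   = child_sees("RUBY_GC_MALLOC_LIMIT", "1m")
after  = ENV["RUBY_GC_MALLOC_LIMIT"]

puts "child saw #{seen.inspect}; parent ENV unchanged: #{before == after}"
```

Because nothing is assigned through ENV[]= in the parent, this form would not trigger the dynamic update proposed here.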

RFC since I've only tested this with RUBY_GC_MALLOC_LIMIT and
RUBY_GC_MALLOC_LIMIT_MAX so far. I've yet to check Ractor
interactions since I haven't followed Ruby in several years.

I made this change to reduce memory use in a single-threaded
pipeline+process manager designed for audio playback; but it
probably makes sense for many long-running daemons that want
to clamp memory use after all code is loaded.

Note: I can't create Redmine tickets due to MFA: [ruby-core:105878].
I completely disagree with MFA for Open Source contributions as it's a
needless barrier to participation. Open Source worked fine for decades
without MFA. I show you my code and even explain my changes; but nobody
here knows me and nobody ever will. I don't want nor need anyone to
trust me when they can read my code and even ask me to clarify things
if needed.

 hash.c               | 5 +++++
 test/ruby/test_gc.rb | 4 ++++
 2 files changed, 9 insertions(+)
 
 diff --git a/hash.c b/hash.c
 index f032ef642a..d7cc797ef5 100644
 --- a/hash.c
 +++ b/hash.c
 @@ -4911,6 +4911,7 @@ static VALUE env_aset(VALUE nm, VALUE val);
  static void
  reset_by_modified_env(const char *nam)
  {
 +    static char gc_var_pfx[] = "RUBY_GC_";
      /*
       * ENV['TZ'] = nil has a special meaning.
       * TZ is no longer considered up-to-date and ruby call tzset() as needed.
 @@ -4919,6 +4920,10 @@ reset_by_modified_env(const char *nam)
       */
      if (ENVMATCH(nam, TZ_ENV)) {
          ruby_reset_timezone();
 +    } else if (ENVNMATCH(nam, gc_var_pfx, sizeof(gc_var_pfx) - 1)) {
 +        ENV_LOCK();
 +        ruby_gc_set_params();
 +        ENV_UNLOCK();
      }
  }
 
 diff --git a/test/ruby/test_gc.rb b/test/ruby/test_gc.rb
 index 788f2974b5..5fd5924fb3 100644
 --- a/test/ruby/test_gc.rb
 +++ b/test/ruby/test_gc.rb
 @@ -334,6 +334,10 @@ def test_gc_parameter
        assert_in_out_err([env, "-w", "-e", "exit"], "", [], /RUBY_GC_OLDMALLOC_LIMIT_MAX=16000000/, "")
        assert_in_out_err([env, "-w", "-e", "exit"], "", [], /RUBY_GC_OLDMALLOC_LIMIT_GROWTH_FACTOR=2.0/, "")
      end
 +
 +    assert_in_out_err(["-w", "-e", <<-'end'], "", [], /RUBY_GC_MALLOC_LIMIT=1024/, "")
 +      ENV['RUBY_GC_MALLOC_LIMIT'] = '1k'
 +    end
    end
  
    def test_profiler_enabled
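
For readers who do not follow the C, the dispatch added to reset_by_modified_env() can be modelled roughly in Ruby as below. This is only an illustration of the matching logic in the hash.c hunk, not code from the patch: the method name and the returned symbols are hypothetical, and the comparison is assumed to be case-sensitive.

```ruby
GC_VAR_PREFIX = "RUBY_GC_"

# Hypothetical model: which side effect a change to the named variable
# would trigger under the patched reset_by_modified_env().
def action_for_modified_env(name)
  if name == "TZ"
    :reset_timezone        # existing behaviour: ruby_reset_timezone()
  elsif name.start_with?(GC_VAR_PREFIX)
    :reload_gc_params      # new behaviour: ruby_gc_set_params() under ENV_LOCK
  else
    :none
  end
end

%w[TZ RUBY_GC_MALLOC_LIMIT PATH].each do |name|
  puts "#{name}: #{action_for_modified_env(name)}"
end
```

Any variable whose name starts with the prefix re-runs the whole parameter setup, which is why the patch needs only one extra branch.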

Updated by ko1 (Koichi Sasada) 4 months ago

  • Description updated (diff)

Updated by ko1 (Koichi Sasada) 4 months ago

Some RUBY_GC_... vars will not take effect correctly because they are only used during setup.
We need to document which vars can be modified dynamically.

Updated by byroot (Jean Boussier) 4 months ago

> Some RUBY_GC_... vars will not take effect correctly because they are only used during setup.

Should we allow changing them at runtime through some ::GC API? I could see automatic/dynamic GC tuning gems being implemented on top of such an API, e.g. monitoring how often and how long the GC runs and tweaking some values in response.
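
To illustrate the monitoring half of that idea with APIs that already exist, here is a rough sketch using GC.stat and GC::Profiler. The helper name `measure_gc` is hypothetical, and the sketch only observes the GC; it does not tune anything.

```ruby
# Hypothetical helper: report how many GC runs happened while the block ran
# and how much time GC::Profiler attributed to them.
def measure_gc
  GC::Profiler.enable
  GC::Profiler.clear
  runs_before = GC.stat(:count)
  yield
  { runs: GC.stat(:count) - runs_before, seconds: GC::Profiler.total_time }
ensure
  GC::Profiler.disable
end

report = measure_gc do
  200_000.times.map { |i| "item #{i}" } # allocate some garbage
  GC.start                              # force at least one collection
end

puts "GC ran #{report[:runs]} time(s), #{report[:seconds]}s total"
```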

Updated by normalperson (Eric Wong) 4 months ago

https://bugs.ruby-lang.org/issues/18494

Thanks both for the comments.

"ko1 (Koichi Sasada)" wrote:

> Some RUBY_GC_... vars will not take effect correctly because they are only used during setup.
> We need to document which vars can be modified dynamically.

Updated patch series attached; patch 3/3 adds a partial guard for
RUBY_GC_INIT_HEAP_SLOTS so that dynamic updates do not call
gc_set_initial_pages().

Patch 1/3 also fixes an existing bug in how RUBY_GC_MALLOC_LIMIT
is handled; that bug is independent of this new feature.

All the other parameters seem fine, however I'm not sure about
interactions w.r.t. Ractors and double-precision floats.

"byroot (Jean Boussier)" wrote:

> Should we allow changing them at runtime through some ::GC
> API? I could see automatic/dynamic GC tuning gems being
> implemented on top of such an API, e.g. monitoring how often and
> how long the GC runs and tweaking some values in response.

I'm rather strongly against this. I envision use would be:

  1. load a bunch of code, read configs, etc...
  2. set GC params
  3. run main loop until exit
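
The three steps above could be sketched as follows. The method name, the config contents, and the chosen limits are placeholders; on an interpreter without this patch the ENV assignments only change the environment (and are inherited by child processes) and do not retune the GC.

```ruby
require "json"

# Hypothetical boot sequence following the three steps above.
def boot_then_clamp_gc(config_json)
  # 1. load a bunch of code, read configs, etc.
  config = JSON.parse(config_json)

  # 2. set GC params once, after everything is loaded
  ENV["RUBY_GC_MALLOC_LIMIT"]     = "1m"
  ENV["RUBY_GC_MALLOC_LIMIT_MAX"] = "2m"

  # 3. run the main loop (bounded here so the sketch terminates)
  processed = 0
  config.fetch("iterations").times { processed += 1 }
  processed
end

puts "processed #{boot_then_clamp_gc('{"iterations": 3}')} items"
```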

The main loop in reasonably-written applications should be
highly predictable in terms of memory use and have minimal
outliers and spikes. IOW, a web server should be serving
reasonably-sized HTML pages/chunks that clients can render w/o
falling over; and reading/sending large files in chunks rather
than slurping hundreds of MB into RAM.

For rare memory spikes inside the main loop, I'd much rather
trigger GC ASAP than be saddled with a larger heap for the
remainder of process lifetime (because larger heaps increase
potential for fragmentation).

I see the presence of tuning knobs to be an admission of the
shortcomings of the GC and something that could/should
eventually be eliminated via GC improvements. (Fwiw, I've
mostly given up on Ruby and its GC; but I made this patch since
it's too time-consuming to rewrite some Ruby in a scripting
language with auto ref-counting and faster startup time).

Anyways, I've updated the commit message in my original patch
(now patch 2/3) to further explain my decision. Here's the
relevant snippet:

This has no extra API footprint, and will silently be a no-op for other
Ruby implementations. I tried to make this change as non-intrusive as
possible to minimize the growth in executable and icache size. It is
not optimized for repeated changes and (IMHO) should not be. IMHO,
tuning knobs should be last resorts and used sparingly to minimize the
learning curve and cognitive overhead required to run applications.

Thus there is no "GC.foo=" API to reduce documentation and support
overhead, since GC parameters are implementation-dependent and may
change over time. Developers can make ENV changes freely without
worrying about forward, backward, or cross-engine incompatibilities.
These parameters are already documented in the manpage and other
sources, so there's less cognitive overhead required to learn them.

Increasing the Ruby API footprint would also hurt startup time and
increase memory usage.

Available as a self-hosted pull request (generated via "git request-pull"):

The following changes since commit eb98275c967d8938526966fe53e52f5a10249492:

  • 2022-01-18 [ci skip] (2022-01-18 05:39:51 +0900)

are available in the Git repository at:

https://yhbt.net/ruby.git gc-param-dyn-v2

for you to fetch changes up to ffb336a8ddb0e21d8f3bb4ce1801e90ea78af42d:

ruby_gc_set_params: do not set initial pages on dynamic update (2022-01-17 22:59:58 +0000)


Eric Wong (3):
      ruby_gc_set_params: update malloc_limit when env is set
      ENV["RUBY_GC_..."]= changes GC parameters dynamically
      ruby_gc_set_params: do not set initial pages on dynamic update

 gc.c                 | 11 +++++++----
 hash.c               |  5 +++++
 internal/gc.h        |  2 +-
 ruby.c               |  2 +-
 test/ruby/test_gc.rb |  4 ++++
 5 files changed, 18 insertions(+), 6 deletions(-)

Thanks for reading this far.