Feature #21042


Add and expose Thread#memory_allocations memory allocation counters

Added by stanhu (Stan Hu) 2 months ago. Updated 12 days ago.

Status:
Open
Target version:
-
[ruby-core:120710]

Description

For the last 5 years, we've been patching our Ruby interpreter with https://github.com/ruby/ruby/pull/3978 in order to track memory allocations over time. This has been running in production at GitLab for a long time.

I'd like to request approval for this patch to land upstream since we're getting tired of maintaining this patch, and this data seems like it would be generally useful. If this can be done via a C extension, let me know, and I can look at that.

Copying from that pull request:

Design

This is designed to measure memory allocations in multi-threaded environments (concurrent request processing), with accurate information about the memory allocated within a given execution context.

The idea is to provide the cheapest possible counters, without the overhead of invoking callbacks, and to expose this information on a per-thread basis.

Implementation

This adds Thread.current.memory_allocations, which provides information about:

  • total_allocated_objects
  • total_malloc_bytes
  • total_mallocs

This is based on the expectation that an allocation for a given thread always happens
with rb_current_thread() correctly indicating the thread performing the allocation.
The totals are measured as monotonically increasing counters.

Since objects are removed asynchronously, at unpredictable moments,
the same cannot be said for deallocations; thus only allocations are tracked.
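For orientation, CRuby already exposes a process-wide version of the first counter through GC.stat(:total_allocated_objects); the proposal scopes this kind of counter to a single thread. A minimal sketch of how the existing process-wide counter behaves (the per-thread API itself only exists with the patch applied):

```ruby
# Process-wide allocation counting with the existing GC.stat API.
# The proposed Thread.current.memory_allocations would expose the
# same kind of monotonically increasing counters, but scoped to
# the current thread.
before = GC.stat(:total_allocated_objects)

1_000.times { Object.new } # allocate some objects

after = GC.stat(:total_allocated_objects)
delta = after - before
# delta is at least 1_000; being process-wide, it also includes
# allocations made concurrently by other threads.
```

Because the counters only ever grow (deallocations are not tracked), sampling them at two points in time and subtracting is the intended usage pattern.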

Updated by ko1 (Koichi Sasada) 13 days ago

Matz said he is positive on this proposal.

I have some considerations.

  • The current proposal doesn't add much overhead. However, I'm afraid that people will want to introduce many measurements per thread or per fiber, and that could become an overhead (speed and memory).
  • API
    • It should be similar to GC.stat.
    • Maybe it should be accessible from other threads?

Updated by matz (Yukihiro Matsumoto) 12 days ago

I am positive, but I also feel this API should be implementation dependent. Not all implementations could easily provide this information.

Matz.

Updated by Eregon (Benoit Daloze) 12 days ago

On JVM there is ThreadMXBean#getThreadAllocatedBytes.
There is no way on JVM to know the number of objects allocated, only bytes.

So this means TruffleRuby and JRuby could implement Thread.current.total_allocated_bytes, but not the other counters proposed in the description.

Maybe total_malloc_bytes should be renamed to total_allocated_bytes then, if it also counts the bytes for Ruby object allocations.
If that's not the case, maybe CRuby should consider exposing total_allocated_bytes (counting native/malloc plus Ruby object allocations, in bytes).

BTW https://github.com/ruby/ruby/pull/3978#issuecomment-764731240 mentions

This needs to be enabled with Thread.trace_memory_allocations=true

I think this is good to have. The JVM has the same concept with setThreadAllocatedMemoryEnabled.
Enabling it automatically on the first call doesn't seem so nice, because it means no allocations are tracked before that.
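CRuby already has an opt-in precedent for this enable-first pattern: ObjectSpace.trace_object_allocations must be active before allocations can be attributed. A sketch using that existing API (this is allocation *tracing*, not the per-thread counters proposed here, but the enablement semantics are analogous):

```ruby
require "objspace"

# Tracing is off by default; an object allocated before enabling
# carries no allocation metadata, mirroring the concern above.
untracked = Object.new

ObjectSpace.trace_object_allocations do
  tracked = Object.new
  ObjectSpace.allocation_sourcefile(tracked) # file where Object.new ran
  ObjectSpace.allocation_sourceline(tracked) # line of the allocation
end

ObjectSpace.allocation_sourcefile(untracked) # => nil (allocated too early)
```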

Also @ioquatix (Samuel Williams) mentioned in https://github.com/ruby/ruby/pull/3978#issuecomment-2705075374 it would be nice to be per Fiber instead of per Thread.
On JRuby and TruffleRuby it would already be per Fiber, since each Fiber uses a java.lang.Thread (or VirtualThread < java.lang.Thread).

Updated by byroot (Jean Boussier) 12 days ago

It should be similar to GC.stat.
Maybe it should be accessible from other threads?

What about something like GC.thread_stats(thread) # => {}? This way it's understood that, like GC.stat, the actual statistics are implementation-dependent and may or may not be present.

Now of course if it is decided that it isn't per thread but per fiber, it would need another name.
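To make the proposed shape concrete, here is a hypothetical pure-Ruby mock of the GC.thread_stats interface sketched above — the method name, keys, and zeroed values are all assumptions drawn from this thread, not an existing API:

```ruby
# Hypothetical mock of the proposed GC.thread_stats(thread) API.
# Like GC.stat, it returns a Hash whose keys are implementation-
# dependent; an implementation that cannot provide a given counter
# (e.g. JRuby and malloc counts) would simply omit that key.
module MockGC
  def self.thread_stats(thread = Thread.current)
    {
      total_allocated_objects: 0, # per-thread object allocations
      total_malloc_bytes: 0,      # per-thread malloc'd bytes (CRuby-specific?)
      total_mallocs: 0            # per-thread malloc calls (CRuby-specific?)
    }
  end
end

stats = MockGC.thread_stats(Thread.current)
stats.key?(:total_allocated_objects) # => true
```

Keeping the keys optional, as GC.stat does, is what lets TruffleRuby/JRuby expose only a bytes counter while CRuby exposes all three.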

However, I'm afraid that people will want to introduce many measurements per thread or per fiber, and that could become an overhead (speed and memory).

Indeed, if you wanted it at all 3 levels (Ractor/Thread/Fiber), that means 6 extra additions on every object allocation, which might actually be significant in such a hot path.

Yet another option could be to have a way to turn this off. Presumably, with branch prediction, checking a boolean first should add very little overhead.
