Feature #18559

Allocation tracing: Objects created by the parser are attributed to Kernel.require

Added by byroot (Jean Boussier) 5 months ago.

Status: Open
Priority: Normal
Assignee: -
Target version: -
[ruby-core:107389]

Description

Marking this as a feature, because I think it should be improved but can hardly be considered a bug.

Repro

Consider the following script:

# /tmp/allocation-source.rb
require 'objspace'
require 'tmpdir'

source = File.join(Dir.tmpdir, "foo.rb")
File.write(source, <<~RUBY)
  # frozen_string_literal: true
  class Foo
    def plop
      "fizz"
    end
  end
RUBY

ObjectSpace.trace_object_allocations_start

GC.start
gen = GC.count
require(source)
ObjectSpace.dump_all(output: $stdout, since: gen)

Expected behavior

I'd expect the ObjectSpace.dump_all output to attribute all new objects, including T_IMEMO etc., to foo.rb.

Actual behavior

Apart from the Foo class itself, they are attributed to the source file that called Kernel.require (output obtained with --disable-gems):

{"address":"0x11acaec78", "type":"CLASS", "class":"0x11acaebb0", "superclass":"0x10fa4a848", "name":"Foo", "references":["0x10fa4a848", "0x11acaea98", "0x11acaf790"], "file":"/var/folders/vy/srfpq1vn6hv5r6bzkvcw13y80000gn/T/foo.rb", "line":2, "generation":1, "memsize":544, "flags":{"wb_protected":true}}
{"address":"0x11acaeca0", "type":"IMEMO", "class":"0x8", "imemo_type":"cref", "references":["0x10fa4a848"], "file":"/tmp/allocation-source.rb", "line":19, "method":"require", "generation":1, "memsize":40, "flags":{"wb_protected":true}}
{"address":"0x11acaecc8", "type":"STRING", "class":"0x10fa42418", "frozen":true, "embedded":true, "fstring":true, "bytesize":4, "value":"fizz", "encoding":"UTF-8", "file":"/tmp/allocation-source.rb", "line":19, "method":"require", "generation":1, "memsize":40, "flags":{"wb_protected":true}}
{"address":"0x11acaecf0", "type":"ARRAY", "class":"0x10fa28f68", "frozen":true, "length":2, "embedded":true, "references":["0x11acaff88", "0x11acaf240"], "file":"/tmp/allocation-source.rb", "line":19, "method":"require", "generation":1, "memsize":40, "flags":{"wb_protected":true}}
{"address":"0x11acaed18", "type":"IMEMO", "imemo_type":"iseq", "references":["0x11acaecc8", "0x11acaf600", "0x11acaf600", "0x11acaecf0"], "file":"/tmp/allocation-source.rb", "line":19, "method":"require", "generation":1, "memsize":416, "flags":{"wb_protected":true}}
{"address":"0x11acaf1a0", "type":"ARRAY", "class":"0x10fa28f68", "frozen":true, "length":2, "embedded":true, "references":["0x11acaff88", "0x11acaf240"], "file":"/tmp/allocation-source.rb", "line":19, "method":"require", "generation":1, "memsize":40, "flags":{"wb_protected":true}}
{"address":"0x11acaf1c8", "type":"IMEMO", "imemo_type":"iseq", "references":["0x11acaed18", "0x11acaf1f0", "0x11acaf1f0", "0x11acaf1a0", "0x11acaf290"], "file":"/tmp/allocation-source.rb", "line":19, "method":"require", "generation":1, "memsize":456, "flags":{"wb_protected":true}}
{"address":"0x11acaf1f0", "type":"STRING", "class":"0x10fa42418", "frozen":true, "embedded":true, "fstring":true, "bytesize":11, "value":"<class:Foo>", "file":"/tmp/allocation-source.rb", "line":19, "method":"require", "generation":1, "memsize":40, "flags":{"wb_protected":true}}
{"address":"0x11acaf218", "type":"ARRAY", "class":"0x10fa28f68", "frozen":true, "length":2, "embedded":true, "references":["0x11acaff88", "0x11acaf240"], "file":"/tmp/allocation-source.rb", "line":19, "method":"require", "generation":1, "memsize":40, "flags":{"wb_protected":true}}
{"address":"0x11acaf240", "type":"STRING", "class":"0x10fa42418", "frozen":true, "fstring":true, "bytesize":63, "value":"/private/var/folders/vy/srfpq1vn6hv5r6bzkvcw13y80000gn/T/foo.rb", "encoding":"UTF-8", "file":"/tmp/allocation-source.rb", "line":19, "method":"require", "generation":1, "memsize":104, "flags":{"wb_protected":true}}
....
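
To make the misattribution easier to see on a larger dump, the output can be grouped by its "file" field. This is only a minimal sketch: allocations_by_file is a hypothetical helper name, and it assumes the dump was written as one JSON object per line, as in the output above.

```ruby
require 'json'

# Hypothetical helper (not part of objspace): groups ObjectSpace.dump_all
# output, one JSON object per line, by the "file" field and sums the
# object count and memsize for each file.
def allocations_by_file(lines)
  totals = Hash.new { |hash, file| hash[file] = { count: 0, memsize: 0 } }
  lines.each do |line|
    line = line.strip
    next if line.empty?

    object = JSON.parse(line)
    # Entries without allocation info have no "file" key.
    file = object.fetch("file", "(unknown)")
    totals[file][:count] += 1
    totals[file][:memsize] += object.fetch("memsize", 0)
  end
  totals
end
```

For instance, allocations_by_file(File.foreach("dump.json")) on a saved dump of the repro would show most of the new objects under /tmp/allocation-source.rb rather than foo.rb.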

Why is it a problem?

This behavior makes it impossible to properly analyze which parts of an application use the most memory. For instance, when using heap-profiler on an app that uses Bootsnap, all objects created as a result of loading source files are attributed to bootsnap:

retained memory by gem
-----------------------------------
 351.64 MB  bootsnap-1.10.2

If this behaved as I expect, heap-profiler would be able to report how much each gem contributes to the app's RAM usage.
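
To illustrate why the "file" attribution matters for a per-gem report, here is a rough sketch of how a tool could map each object's file to a gem. It is not heap-profiler's actual implementation: gem_for_path and memsize_by_gem are hypothetical names, and the path pattern assumes the usual ".../gems/<name>-<version>/..." install layout.

```ruby
# Hypothetical pattern: matches a ".../gems/<name>-<version>/..." segment.
GEM_PATH_PATTERN = %r{/gems/([^/]+-\d[^/]*)/}

# Guesses the gem ("name-version") that owns a source path, or "other"
# when the path does not look like it lives inside an installed gem.
def gem_for_path(path)
  match = GEM_PATH_PATTERN.match(path.to_s)
  match ? match[1] : "other"
end

# Sums memsize per gem for parsed dump entries (hashes with string keys),
# largest first.
def memsize_by_gem(objects)
  totals = Hash.new(0)
  objects.each do |object|
    totals[gem_for_path(object["file"])] += object.fetch("memsize", 0)
  end
  totals.sort_by { |_gem, bytes| -bytes }.to_h
end
```

With the current behavior, every object created while loading a file lands under whichever gem called require (bootsnap in the report above), so a grouping like this cannot do better than the tracer's "file" field allows.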

Possible solution

I think ObjectSpace should have an API to override get_trace_arg() / EC->trace_arg, in the context of allocation tracing, so that Kernel.require and RubyVM::InstructionSequence.load_from_binary could set it to the source file they're loading.

Additional use cases?

A very similar issue exists with objects created by static data parsers such as YAML, JSON, etc. All the objects they create as part of parsing are attributed to the parser itself.

So it would be very useful if there were a Ruby API that let us do something like this:

module YAMLAllocationTracing
  def load_file(path, ...)
    ObjectSpace.set_allocation_source(file: path, line: 1, class_path: :YAML, method_id: :load_file) do
      super
    end
  end
end
YAML.singleton_class.prepend(YAMLAllocationTracing)
