Feature #18559
Updated by byroot (Jean Boussier) over 2 years ago
Marking this as a feature, because I think it should be improved but can hardly be considered a bug.
### Repro
Consider the following script:
```ruby
# /tmp/allocation-source.rb
require 'objspace'
require 'tmpdir'
source = File.join(Dir.tmpdir, "foo.rb")
File.write(source, <<~RUBY)
  # frozen_string_literal: true
  class Foo
    def plop
      "fizz"
    end
  end
RUBY
ObjectSpace.trace_object_allocations_start
GC.start
gen = GC.count
require(source)
ObjectSpace.dump_all(output: $stdout, since: gen)
```
### Expected behavior
I'd expect the `ObjectSpace.dump_all` output to attribute all new objects, including `T_IMEMO` objects etc., to `foo.rb`.
### Actual behavior
They are instead attributed to the source file that called `Kernel.require` (the output below was produced with `--disable-gems`):
```
{"address":"0x11acaec78", "type":"CLASS", "class":"0x11acaebb0", "superclass":"0x10fa4a848", "name":"Foo", "references":["0x10fa4a848", "0x11acaea98", "0x11acaf790"], "file":"/var/folders/vy/srfpq1vn6hv5r6bzkvcw13y80000gn/T/foo.rb", "line":2, "generation":1, "memsize":544, "flags":{"wb_protected":true}}
{"address":"0x11acaeca0", "type":"IMEMO", "class":"0x8", "imemo_type":"cref", "references":["0x10fa4a848"], "file":"/tmp/allocation-source.rb", "line":19, "method":"require", "generation":1, "memsize":40, "flags":{"wb_protected":true}}
{"address":"0x11acaecc8", "type":"STRING", "class":"0x10fa42418", "frozen":true, "embedded":true, "fstring":true, "bytesize":4, "value":"fizz", "encoding":"UTF-8", "file":"/tmp/allocation-source.rb", "line":19, "method":"require", "generation":1, "memsize":40, "flags":{"wb_protected":true}}
{"address":"0x11acaecf0", "type":"ARRAY", "class":"0x10fa28f68", "frozen":true, "length":2, "embedded":true, "references":["0x11acaff88", "0x11acaf240"], "file":"/tmp/allocation-source.rb", "line":19, "method":"require", "generation":1, "memsize":40, "flags":{"wb_protected":true}}
{"address":"0x11acaed18", "type":"IMEMO", "imemo_type":"iseq", "references":["0x11acaecc8", "0x11acaf600", "0x11acaf600", "0x11acaecf0"], "file":"/tmp/allocation-source.rb", "line":19, "method":"require", "generation":1, "memsize":416, "flags":{"wb_protected":true}}
{"address":"0x11acaf1a0", "type":"ARRAY", "class":"0x10fa28f68", "frozen":true, "length":2, "embedded":true, "references":["0x11acaff88", "0x11acaf240"], "file":"/tmp/allocation-source.rb", "line":19, "method":"require", "generation":1, "memsize":40, "flags":{"wb_protected":true}}
{"address":"0x11acaf1c8", "type":"IMEMO", "imemo_type":"iseq", "references":["0x11acaed18", "0x11acaf1f0", "0x11acaf1f0", "0x11acaf1a0", "0x11acaf290"], "file":"/tmp/allocation-source.rb", "line":19, "method":"require", "generation":1, "memsize":456, "flags":{"wb_protected":true}}
{"address":"0x11acaf1f0", "type":"STRING", "class":"0x10fa42418", "frozen":true, "embedded":true, "fstring":true, "bytesize":11, "value":"<class:Foo>", "file":"/tmp/allocation-source.rb", "line":19, "method":"require", "generation":1, "memsize":40, "flags":{"wb_protected":true}}
{"address":"0x11acaf218", "type":"ARRAY", "class":"0x10fa28f68", "frozen":true, "length":2, "embedded":true, "references":["0x11acaff88", "0x11acaf240"], "file":"/tmp/allocation-source.rb", "line":19, "method":"require", "generation":1, "memsize":40, "flags":{"wb_protected":true}}
{"address":"0x11acaf240", "type":"STRING", "class":"0x10fa42418", "frozen":true, "fstring":true, "bytesize":63, "value":"/private/var/folders/vy/srfpq1vn6hv5r6bzkvcw13y80000gn/T/foo.rb", "encoding":"UTF-8", "file":"/tmp/allocation-source.rb", "line":19, "method":"require", "generation":1, "memsize":104, "flags":{"wb_protected":true}}
....
```
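To see the misattribution at a glance, the `dump_all` output (one JSON object per line) can be tallied by its `file` field. The sketch below uses a hypothetical helper, `bytes_by_file`, that sums `memsize` per attributed source file. Entries without a `file` key (for example `ROOT` entries, or objects allocated before tracing started) are grouped under `"(unknown)"`.

```ruby
require 'json'

# Hypothetical helper: sum the "memsize" of every dumped object, grouped by
# the "file" that ObjectSpace attributed its allocation to.
def bytes_by_file(dump_lines)
  totals = Hash.new(0)
  dump_lines.each do |line|
    next if line.strip.empty?
    entry = JSON.parse(line)
    totals[entry.fetch("file", "(unknown)")] += entry.fetch("memsize", 0)
  end
  # Largest contributors first.
  totals.sort_by { |_file, bytes| -bytes }.to_h
end
```

For example, after `ruby --disable-gems /tmp/allocation-source.rb > heap.json`, calling `bytes_by_file(File.foreach("heap.json"))` would show most of the bytes charged to `/tmp/allocation-source.rb` rather than to `foo.rb`.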
### Why is it a problem?
This behavior makes it impossible to properly analyze which parts of an application use the most memory. For instance, when using `heap-profiler` on an app that uses Bootsnap, all objects created as a result of loading source files are attributed to Bootsnap, since Bootsnap's compile cache is what ends up calling `RubyVM::InstructionSequence.load_from_binary`:
```
retained memory by gem
-----------------------------------
351.64 MB bootsnap-1.10.2
```
If this behaved as I expect, `heap-profiler` would be able to report how much each gem contributes to the app's RAM usage.
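A per-gem report can only be as accurate as the per-file attribution underneath it. As a rough sketch (not heap-profiler's actual implementation), assuming installed gems live under a `.../gems/<name>-<version>/` directory, per-file byte totals could be rolled up like this. If every file's bytes are charged to Bootsnap's own source, no amount of post-processing here can recover the right gem.

```ruby
# Hypothetical sketch: map each attributed file to the gem directory it lives
# in, assuming the ".../gems/<name>-<version>/..." layout, and sum per gem.
GEM_PATH_PATTERN = %r{/gems/([^/]+-\d[^/]*)/}

def bytes_by_gem(bytes_by_file)
  totals = Hash.new(0)
  bytes_by_file.each do |file, bytes|
    match = GEM_PATH_PATTERN.match(file.to_s)
    # Files outside any gem directory (application code, stdlib) go to "(other)".
    totals[match ? match[1] : "(other)"] += bytes
  end
  totals
end
```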
### Possible solution
I think `ObjectSpace` should have an API to override `get_trace_arg() / EC->trace_arg`, in the context of allocation tracing, so that `Kernel.require` and `RubyVM::InstructionSequence.load_from_binary` could set it to the source file they're loading.
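Whatever form the override takes, it has to nest (a file being required can itself `require` another) and must be undone when loading finishes or fails. The sketch below is a pure-Ruby model of those semantics only. `AllocationSourceModel` is a hypothetical name, and nothing here hooks into the VM's real `trace_arg`.

```ruby
# Hypothetical model of the proposed override: a stack of allocation sources.
# Thread#[] storage is fiber-local, so each fiber gets its own stack.
module AllocationSourceModel
  Source = Struct.new(:file, :line)

  def self.stack
    Thread.current[:allocation_source_stack] ||= []
  end

  # The innermost (most recently pushed) source wins; nil when none is set.
  def self.current
    stack.last
  end

  # Push a source for the duration of the block; `ensure` restores the
  # previous one even if the block raises (e.g. a SyntaxError in the file).
  def self.with(file:, line: 1)
    stack.push(Source.new(file, line))
    yield
  ensure
    stack.pop
  end
end
```

With this shape, `Kernel.require` would wrap the compile-and-run step in something like `with(file: path)`, so objects allocated while compiling a nested `require` are charged to the inner file and attribution falls back to the outer file once it returns.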
### Additional use cases?
A very similar issue arises with static data parsers such as `YAML`, `JSON`, etc.: all the objects they create while parsing are attributed to the parser rather than to the file being parsed.
So it would be very useful if there were a Ruby API allowing something like this:
```ruby
require 'yaml'

module YAMLAllocationTracing
  def load_file(path, ...)
    # Proposed API: ObjectSpace.set_allocation_source does not exist yet.
    ObjectSpace.set_allocation_source(file: path, line: 1, class_path: :YAML, method_id: :load_file) do
      super
    end
  end
end

YAML.singleton_class.prepend(YAMLAllocationTracing)
```
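Until an API like this exists, the closest plain-Ruby approximation is to measure rather than attribute: count how many objects are allocated while each file is parsed. In the sketch below, `AllocationCounter` is a hypothetical name, and JSON is used instead of YAML only to keep it dependency-free. `GC.stat(:total_allocated_objects)` is a process-wide counter, so allocations from other threads and from the helper's own `File.read` are included in the count.

```ruby
require 'json'
require 'tmpdir'

# Hypothetical workaround: it cannot change the allocation site reported by
# ObjectSpace, but it records how many objects were allocated per input file.
module AllocationCounter
  COUNTS = Hash.new(0)

  def self.count(label)
    before = GC.stat(:total_allocated_objects)
    result = yield
    COUNTS[label] += GC.stat(:total_allocated_objects) - before
    result
  end
end

path = File.join(Dir.tmpdir, "allocation-counter-demo.json")
File.write(path, JSON.generate(Array.new(100) { |i| "item-#{i}" }))
data = AllocationCounter.count(path) { JSON.parse(File.read(path)) }
File.delete(path)
```

The same `count` block could wrap `super` inside a prepended module like `YAMLAllocationTracing` above. That gives per-file allocation counts, but not the per-object attribution that `dump_all` and heap-profiler need.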