Bug #18886
openStruct aref and aset don't trigger any tracepoints.
Description
Given the following program, thing.name
and thing.shape
don't trigger c_call
trace points (or any trace points actually).
pp RUBY_VERSION
trace_point = TracePoint.new(:line, :call, :c_call, :a_call) do |trace|
puts trace.event
if trace.event == :call
# Ruby doesn't always mark call-sites in sub-expressions, so we use this approach to compute a call site and mark it:
if location = caller_locations(2, 1).first and path = location.path
puts "> #{path}:#{location.lineno}:#{trace.event}"
end
end
if path = trace.path
puts "= #{path}:#{trace.lineno}:#{trace.event}"
end
end
trace_point.enable
# This will trigger call trace points
class Thing
def name
:cat
end
def shape
:square
end
end
thing = Thing.new
# Thing = Struct.new(:name, :shape)
# thing = Thing.new(:cat, :rectangle)
[
name: thing.name,
shape: thing.shape,
]
Current HEAD¶
= ../test.rb:30:line:
= ../test.rb:30:c_call:new
= ../test.rb:30:c_call:inherited
= ../test.rb:30:c_call:singleton_method_added
= ../test.rb:30:c_call:singleton_method_added
= ../test.rb:30:c_call:singleton_method_added
= ../test.rb:30:c_call:singleton_method_added
= ../test.rb:30:c_call:singleton_method_added
= ../test.rb:30:c_call:method_added
= ../test.rb:30:c_call:method_added
= ../test.rb:30:c_call:method_added
= ../test.rb:30:c_call:method_added
= ../test.rb:30:c_call:const_added
= ../test.rb:31:line:
= ../test.rb:31:c_call:new
= ../test.rb:31:c_call:initialize
= ../test.rb:34:line:
Proposed PR¶
= ../test.rb:30:line:
= ../test.rb:30:c_call:new
= ../test.rb:30:c_call:inherited
= ../test.rb:30:c_call:singleton_method_added
= ../test.rb:30:c_call:singleton_method_added
= ../test.rb:30:c_call:singleton_method_added
= ../test.rb:30:c_call:singleton_method_added
= ../test.rb:30:c_call:singleton_method_added
= ../test.rb:30:c_call:method_added
= ../test.rb:30:c_call:method_added
= ../test.rb:30:c_call:method_added
= ../test.rb:30:c_call:method_added
= ../test.rb:30:c_call:const_added
= ../test.rb:31:line:
= ../test.rb:31:c_call:new
= ../test.rb:31:c_call:initialize
= ../test.rb:34:line:
= ../test.rb:34:c_call:name
= ../test.rb:35:c_call:shape
The reason is the internal implementation of struct doesn't have trace point instrumentation in vm_call_opt_struct_aset
or vm_call_opt_struct_aref
.
Proposed fix: https://github.com/ruby/ruby/pull/6071 but this would need a review, maybe @jeremyevans0 (Jeremy Evans) and @ko1 (Koichi Sasada) can help.
Updated by ioquatix (Samuel Williams) over 1 year ago
- Description updated (diff)
Updated by ioquatix (Samuel Williams) over 1 year ago
Here is my performance comparison:
Firstly, with no changes (should be identical, shows some variance).
> make benchmark ITEM=vm_struct COMPARE_RUBY=(which ruby)
/home/samuel/.rubies/ruby-head/bin/ruby --disable=gems -rrubygems -I../benchmark/lib ../benchmark/benchmark-driver/exe/benchmark-driver \
--executables="compare-ruby::/home/samuel/.rubies/ruby-head/bin/ruby -I.ext/common --disable-gem" \
--executables="built-ruby::./miniruby -I../lib -I. -I.ext/common ../tool/runruby.rb --extout=.ext -- --disable-gems --disable-gem" \
--output=markdown --output-compare -v $(find ../benchmark -maxdepth 1 -name 'vm_struct' -o -name '*vm_struct*.yml' -o -name '*vm_struct*.rb' | sort)
compare-ruby: ruby 3.2.0dev (2022-06-29T00:53:14Z master 3051d4bc2a) [x86_64-linux]
built-ruby: ruby 3.2.0dev (2022-06-29T02:51:15Z struct-tracepoints 12e97af1d2) [x86_64-linux]
# Iteration per second (i/s)
| |compare-ruby|built-ruby|
|:----------------------|-----------:|---------:|
|vm_struct_big_aref_hi | 149.515M| 147.470M|
| | 1.01x| -|
|vm_struct_big_aref_lo | 148.916M| 147.628M|
| | 1.01x| -|
|vm_struct_big_aset | 7.006| 7.080|
| | -| 1.01x|
|vm_struct_big_href_hi | 27.652M| 27.382M|
| | 1.01x| -|
|vm_struct_big_href_lo | 27.547M| 27.725M|
| | -| 1.01x|
|vm_struct_big_hset | 3.049| 3.094|
| | -| 1.01x|
|vm_struct_small_aref | 144.031M| 136.167M|
| | 1.06x| -|
|vm_struct_small_aset | 7.032| 7.063|
| | -| 1.00x|
|vm_struct_small_href | 30.694M| 30.432M|
| | 1.01x| -|
|vm_struct_small_hset | 28.590M| 28.083M|
| | 1.02x| -|
Now with my PR:
> make benchmark ITEM=vm_struct COMPARE_RUBY=(which ruby)
/home/samuel/.rubies/ruby-head/bin/ruby --disable=gems -rrubygems -I../benchmark/lib ../benchmark/benchmark-driver/exe/benchmark-driver \
--executables="compare-ruby::/home/samuel/.rubies/ruby-head/bin/ruby -I.ext/common --disable-gem" \
--executables="built-ruby::./miniruby -I../lib -I. -I.ext/common ../tool/runruby.rb --extout=.ext -- --disable-gems --disable-gem" \
--output=markdown --output-compare -v $(find ../benchmark -maxdepth 1 -name 'vm_struct' -o -name '*vm_struct*.yml' -o -name '*vm_struct*.rb' | sort)
compare-ruby: ruby 3.2.0dev (2022-06-29T00:53:14Z master 3051d4bc2a) [x86_64-linux]
built-ruby: ruby 3.2.0dev (2022-06-29T02:51:15Z struct-tracepoints 12e97af1d2) [x86_64-linux]
# Iteration per second (i/s)
| |compare-ruby|built-ruby|
|:----------------------|-----------:|---------:|
|vm_struct_big_aref_hi | 147.665M| 115.510M|
| | 1.28x| -|
|vm_struct_big_aref_lo | 146.666M| 116.298M|
| | 1.26x| -|
|vm_struct_big_aset | 7.098| 6.801|
| | 1.04x| -|
|vm_struct_big_href_hi | 27.608M| 27.957M|
| | -| 1.01x|
|vm_struct_big_href_lo | 27.521M| 27.937M|
| | -| 1.02x|
|vm_struct_big_hset | 3.092| 3.113|
| | -| 1.01x|
|vm_struct_small_aref | 147.494M| 113.729M|
| | 1.30x| -|
|vm_struct_small_aset | 6.898| 6.840|
| | 1.01x| -|
|vm_struct_small_href | 31.004M| 31.547M|
| | -| 1.02x|
|vm_struct_small_hset | 28.433M| 28.527M|
| | -| 1.00x|
With changes suggested by @ko1 (Koichi Sasada):
> make benchmark ITEM=vm_struct COMPARE_RUBY=(which ruby)
/home/samuel/.rubies/ruby-head/bin/ruby --disable=gems -rrubygems -I../benchmark/lib ../benchmark/benchmark-driver/exe/benchmark-driver \
--executables="compare-ruby::/home/samuel/.rubies/ruby-head/bin/ruby -I.ext/common --disable-gem" \
--executables="built-ruby::./miniruby -I../lib -I. -I.ext/common ../tool/runruby.rb --extout=.ext -- --disable-gems --disable-gem" \
--output=markdown --output-compare -v $(find ../benchmark -maxdepth 1 -name 'vm_struct' -o -name '*vm_struct*.yml' -o -name '*vm_struct*.rb' | sort)
compare-ruby: ruby 3.2.0dev (2022-06-29T00:53:14Z master 3051d4bc2a) [x86_64-linux]
built-ruby: ruby 3.2.0dev (2022-06-29T02:51:15Z struct-tracepoints 12e97af1d2) [x86_64-linux]
# Iteration per second (i/s)
| |compare-ruby|built-ruby|
|:----------------------|-----------:|---------:|
|vm_struct_big_aref_hi | 146.836M| 115.141M|
| | 1.28x| -|
|vm_struct_big_aref_lo | 147.929M| 115.918M|
| | 1.28x| -|
|vm_struct_big_aset | 7.079| 6.802|
| | 1.04x| -|
|vm_struct_big_href_hi | 27.495M| 27.933M|
| | -| 1.02x|
|vm_struct_big_href_lo | 27.396M| 27.584M|
| | -| 1.01x|
|vm_struct_big_hset | 3.128| 3.095|
| | 1.01x| -|
|vm_struct_small_aref | 142.123M| 113.700M|
| | 1.25x| -|
|vm_struct_small_aset | 6.995| 6.755|
| | 1.04x| -|
|vm_struct_small_href | 30.561M| 31.424M|
| | -| 1.03x|
|vm_struct_small_hset | 28.353M| 28.650M|
| | -| 1.01x|
It seems like 25-30% performance impact, but this is still a very fast operation. Is there any way we can make this faster? Could we have tracing and non-tracing op-codes which call different functions? e.g. vm_call_opt_struct_aref
and vm_call_opt_struct_aref_trace
for example.
Updated by ioquatix (Samuel Williams) over 1 year ago
With some minor changes the best performance I could get is around 20% cost:
| |compare-ruby|built-ruby|
|:----------------------|-----------:|---------:|
|vm_struct_big_aref_hi | 149.051M| 121.849M|
| | 1.22x| -|
|vm_struct_big_aref_lo | 147.353M| 121.167M|
| | 1.22x| -|
|vm_struct_big_aset | 7.107| 6.469|
| | 1.10x| -|
|vm_struct_big_href_hi | 27.081M| 28.269M|
| | -| 1.04x|
|vm_struct_big_href_lo | 27.642M| 27.630M|
| | 1.00x| -|
|vm_struct_big_hset | 3.106| 3.150|
| | -| 1.01x|
|vm_struct_small_aref | 139.761M| 113.250M|
| | 1.23x| -|
|vm_struct_small_aset | 7.086| 6.775|
| | 1.05x| -|
|vm_struct_small_href | 30.977M| 31.479M|
| | -| 1.02x|
|vm_struct_small_hset | 28.565M| 28.554M|
| | 1.00x| -|
Updated by Eregon (Benoit Daloze) over 1 year ago
- Related to Bug #16383: TracePoint does not report calls to attribute reader methods added
Updated by ioquatix (Samuel Williams) about 1 year ago
Rather than using branch for tracepoints, using a nop/jmp
instruction which is set or cleared might be a lower impact option.
Updated by jeremyevans0 (Jeremy Evans) about 1 month ago
I submitted a pull request to fix this: https://github.com/ruby/ruby/pull/8782
Updated by mame (Yusuke Endoh) 4 days ago
@jeremyevans0 (Jeremy Evans) Could you measure the overhead?
Updated by jeremyevans0 (Jeremy Evans) 3 days ago
mame (Yusuke Endoh) wrote in #note-7:
@jeremyevans0 (Jeremy Evans) Could you measure the overhead?
Using the benchmark in the pull request, the pull request seems reliably ~5% faster in my testing for the member reader method, but mixed for other cases (sometimes up to 3% faster/slower depending on run). I don't have any explanation as to why it would be faster, so I'm going to assume that the difference is still within the margin of error.
I don't think this will affect performance unless TracePoint is used, because the same method caching is done in both cases.