Project

General

Profile

Actions

Bug #18886

closed

Struct aref and aset don't trigger any tracepoints.

Added by ioquatix (Samuel Williams) over 1 year ago. Updated 3 months ago.

Status:
Closed
Target version:
-
[ruby-core:109083]

Description

Given the following program, thing.name and thing.shape don't trigger c_call trace points (or any trace points actually).

pp RUBY_VERSION

trace_point = TracePoint.new(:line, :call, :c_call, :a_call) do |trace|
  puts trace.event

  if trace.event == :call
    # Ruby doesn't always mark call-sites in sub-expressions, so we use this approach to compute a call site and mark it:
    if location = caller_locations(2, 1).first and path = location.path
      puts "> #{path}:#{location.lineno}:#{trace.event}"
    end
  end
  
  if path = trace.path
    puts "= #{path}:#{trace.lineno}:#{trace.event}"
  end
end

trace_point.enable

# This will trigger call trace points
class Thing
  def name
    :cat
  end
  
  def shape
    :square
  end
end
thing = Thing.new

# Thing = Struct.new(:name, :shape)
# thing = Thing.new(:cat, :rectangle)

[
  name: thing.name,
  shape: thing.shape,
]

Current HEAD

= ../test.rb:30:line:
= ../test.rb:30:c_call:new
= ../test.rb:30:c_call:inherited
= ../test.rb:30:c_call:singleton_method_added
= ../test.rb:30:c_call:singleton_method_added
= ../test.rb:30:c_call:singleton_method_added
= ../test.rb:30:c_call:singleton_method_added
= ../test.rb:30:c_call:singleton_method_added
= ../test.rb:30:c_call:method_added
= ../test.rb:30:c_call:method_added
= ../test.rb:30:c_call:method_added
= ../test.rb:30:c_call:method_added
= ../test.rb:30:c_call:const_added
= ../test.rb:31:line:
= ../test.rb:31:c_call:new
= ../test.rb:31:c_call:initialize
= ../test.rb:34:line:

Proposed PR

= ../test.rb:30:line:
= ../test.rb:30:c_call:new
= ../test.rb:30:c_call:inherited
= ../test.rb:30:c_call:singleton_method_added
= ../test.rb:30:c_call:singleton_method_added
= ../test.rb:30:c_call:singleton_method_added
= ../test.rb:30:c_call:singleton_method_added
= ../test.rb:30:c_call:singleton_method_added
= ../test.rb:30:c_call:method_added
= ../test.rb:30:c_call:method_added
= ../test.rb:30:c_call:method_added
= ../test.rb:30:c_call:method_added
= ../test.rb:30:c_call:const_added
= ../test.rb:31:line:
= ../test.rb:31:c_call:new
= ../test.rb:31:c_call:initialize
= ../test.rb:34:line:
= ../test.rb:34:c_call:name
= ../test.rb:35:c_call:shape

The reason is the internal implementation of struct doesn't have trace point instrumentation in vm_call_opt_struct_aset or vm_call_opt_struct_aref.

Proposed fix: https://github.com/ruby/ruby/pull/6071 but this would need a review, maybe @jeremyevans0 (Jeremy Evans) and @ko1 (Koichi Sasada) can help.


Related issues 1 (0 open1 closed)

Related to Ruby master - Bug #16383: TracePoint does not report calls to attribute reader methodsClosedActions
Actions #1

Updated by ioquatix (Samuel Williams) over 1 year ago

  • Description updated (diff)

Updated by ioquatix (Samuel Williams) over 1 year ago

Here is my performance comparison:

Firstly, with no changes (should be identical, shows some variance).

> make benchmark ITEM=vm_struct COMPARE_RUBY=(which ruby)
/home/samuel/.rubies/ruby-head/bin/ruby --disable=gems -rrubygems -I../benchmark/lib ../benchmark/benchmark-driver/exe/benchmark-driver \
            --executables="compare-ruby::/home/samuel/.rubies/ruby-head/bin/ruby -I.ext/common --disable-gem" \
            --executables="built-ruby::./miniruby -I../lib -I. -I.ext/common  ../tool/runruby.rb --extout=.ext  -- --disable-gems --disable-gem" \
            --output=markdown --output-compare -v $(find ../benchmark -maxdepth 1 -name 'vm_struct' -o -name '*vm_struct*.yml' -o -name '*vm_struct*.rb' | sort) 
compare-ruby: ruby 3.2.0dev (2022-06-29T00:53:14Z master 3051d4bc2a) [x86_64-linux]
built-ruby: ruby 3.2.0dev (2022-06-29T02:51:15Z struct-tracepoints 12e97af1d2) [x86_64-linux]
# Iteration per second (i/s)

|                       |compare-ruby|built-ruby|
|:----------------------|-----------:|---------:|
|vm_struct_big_aref_hi  |    149.515M|  147.470M|
|                       |       1.01x|         -|
|vm_struct_big_aref_lo  |    148.916M|  147.628M|
|                       |       1.01x|         -|
|vm_struct_big_aset     |       7.006|     7.080|
|                       |           -|     1.01x|
|vm_struct_big_href_hi  |     27.652M|   27.382M|
|                       |       1.01x|         -|
|vm_struct_big_href_lo  |     27.547M|   27.725M|
|                       |           -|     1.01x|
|vm_struct_big_hset     |       3.049|     3.094|
|                       |           -|     1.01x|
|vm_struct_small_aref   |    144.031M|  136.167M|
|                       |       1.06x|         -|
|vm_struct_small_aset   |       7.032|     7.063|
|                       |           -|     1.00x|
|vm_struct_small_href   |     30.694M|   30.432M|
|                       |       1.01x|         -|
|vm_struct_small_hset   |     28.590M|   28.083M|
|                       |       1.02x|         -|

Now with my PR:

> make benchmark ITEM=vm_struct COMPARE_RUBY=(which ruby)
/home/samuel/.rubies/ruby-head/bin/ruby --disable=gems -rrubygems -I../benchmark/lib ../benchmark/benchmark-driver/exe/benchmark-driver \
            --executables="compare-ruby::/home/samuel/.rubies/ruby-head/bin/ruby -I.ext/common --disable-gem" \
            --executables="built-ruby::./miniruby -I../lib -I. -I.ext/common  ../tool/runruby.rb --extout=.ext  -- --disable-gems --disable-gem" \
            --output=markdown --output-compare -v $(find ../benchmark -maxdepth 1 -name 'vm_struct' -o -name '*vm_struct*.yml' -o -name '*vm_struct*.rb' | sort) 
compare-ruby: ruby 3.2.0dev (2022-06-29T00:53:14Z master 3051d4bc2a) [x86_64-linux]
built-ruby: ruby 3.2.0dev (2022-06-29T02:51:15Z struct-tracepoints 12e97af1d2) [x86_64-linux]
# Iteration per second (i/s)

|                       |compare-ruby|built-ruby|
|:----------------------|-----------:|---------:|
|vm_struct_big_aref_hi  |    147.665M|  115.510M|
|                       |       1.28x|         -|
|vm_struct_big_aref_lo  |    146.666M|  116.298M|
|                       |       1.26x|         -|
|vm_struct_big_aset     |       7.098|     6.801|
|                       |       1.04x|         -|
|vm_struct_big_href_hi  |     27.608M|   27.957M|
|                       |           -|     1.01x|
|vm_struct_big_href_lo  |     27.521M|   27.937M|
|                       |           -|     1.02x|
|vm_struct_big_hset     |       3.092|     3.113|
|                       |           -|     1.01x|
|vm_struct_small_aref   |    147.494M|  113.729M|
|                       |       1.30x|         -|
|vm_struct_small_aset   |       6.898|     6.840|
|                       |       1.01x|         -|
|vm_struct_small_href   |     31.004M|   31.547M|
|                       |           -|     1.02x|
|vm_struct_small_hset   |     28.433M|   28.527M|
|                       |           -|     1.00x|

With changes suggested by @ko1 (Koichi Sasada):

> make benchmark ITEM=vm_struct COMPARE_RUBY=(which ruby)
/home/samuel/.rubies/ruby-head/bin/ruby --disable=gems -rrubygems -I../benchmark/lib ../benchmark/benchmark-driver/exe/benchmark-driver \
            --executables="compare-ruby::/home/samuel/.rubies/ruby-head/bin/ruby -I.ext/common --disable-gem" \
            --executables="built-ruby::./miniruby -I../lib -I. -I.ext/common  ../tool/runruby.rb --extout=.ext  -- --disable-gems --disable-gem" \
            --output=markdown --output-compare -v $(find ../benchmark -maxdepth 1 -name 'vm_struct' -o -name '*vm_struct*.yml' -o -name '*vm_struct*.rb' | sort) 
compare-ruby: ruby 3.2.0dev (2022-06-29T00:53:14Z master 3051d4bc2a) [x86_64-linux]
built-ruby: ruby 3.2.0dev (2022-06-29T02:51:15Z struct-tracepoints 12e97af1d2) [x86_64-linux]
# Iteration per second (i/s)

|                       |compare-ruby|built-ruby|
|:----------------------|-----------:|---------:|
|vm_struct_big_aref_hi  |    146.836M|  115.141M|
|                       |       1.28x|         -|
|vm_struct_big_aref_lo  |    147.929M|  115.918M|
|                       |       1.28x|         -|
|vm_struct_big_aset     |       7.079|     6.802|
|                       |       1.04x|         -|
|vm_struct_big_href_hi  |     27.495M|   27.933M|
|                       |           -|     1.02x|
|vm_struct_big_href_lo  |     27.396M|   27.584M|
|                       |           -|     1.01x|
|vm_struct_big_hset     |       3.128|     3.095|
|                       |       1.01x|         -|
|vm_struct_small_aref   |    142.123M|  113.700M|
|                       |       1.25x|         -|
|vm_struct_small_aset   |       6.995|     6.755|
|                       |       1.04x|         -|
|vm_struct_small_href   |     30.561M|   31.424M|
|                       |           -|     1.03x|
|vm_struct_small_hset   |     28.353M|   28.650M|
|                       |           -|     1.01x|

It seems like 25-30% performance impact, but this is still a very fast operation. Is there any way we can make this faster? Could we have tracing and non-tracing op-codes which call different functions? e.g. vm_call_opt_struct_aref and vm_call_opt_struct_aref_trace for example.

Updated by ioquatix (Samuel Williams) over 1 year ago

With some minor changes the best performance I could get is around 20% cost:

|                       |compare-ruby|built-ruby|
|:----------------------|-----------:|---------:|
|vm_struct_big_aref_hi  |    149.051M|  121.849M|
|                       |       1.22x|         -|
|vm_struct_big_aref_lo  |    147.353M|  121.167M|
|                       |       1.22x|         -|
|vm_struct_big_aset     |       7.107|     6.469|
|                       |       1.10x|         -|
|vm_struct_big_href_hi  |     27.081M|   28.269M|
|                       |           -|     1.04x|
|vm_struct_big_href_lo  |     27.642M|   27.630M|
|                       |       1.00x|         -|
|vm_struct_big_hset     |       3.106|     3.150|
|                       |           -|     1.01x|
|vm_struct_small_aref   |    139.761M|  113.250M|
|                       |       1.23x|         -|
|vm_struct_small_aset   |       7.086|     6.775|
|                       |       1.05x|         -|
|vm_struct_small_href   |     30.977M|   31.479M|
|                       |           -|     1.02x|
|vm_struct_small_hset   |     28.565M|   28.554M|
|                       |       1.00x|         -|
Actions #4

Updated by Eregon (Benoit Daloze) over 1 year ago

  • Related to Bug #16383: TracePoint does not report calls to attribute reader methods added

Updated by ioquatix (Samuel Williams) over 1 year ago

Rather than using branch for tracepoints, using a nop/jmp instruction which is set or cleared might be a lower impact option.

e.g. https://nullprogram.com/blog/2016/03/31/

Updated by jeremyevans0 (Jeremy Evans) 5 months ago

I submitted a pull request to fix this: https://github.com/ruby/ruby/pull/8782

Updated by mame (Yusuke Endoh) 4 months ago

@jeremyevans0 (Jeremy Evans) Could you measure the overhead?

Updated by jeremyevans0 (Jeremy Evans) 4 months ago

mame (Yusuke Endoh) wrote in #note-7:

@jeremyevans0 (Jeremy Evans) Could you measure the overhead?

Using the benchmark in the pull request, the pull request seems reliably ~5% faster in my testing for the member reader method, but mixed for other cases (sometimes up to 3% faster/slower depending on run). I don't have any explanation as to why it would be faster, so I'm going to assume that the difference is still within the margin of error.

I don't think this will affect performance unless TracePoint is used, because the same method caching is done in both cases.

Updated by ko1 (Koichi Sasada) 3 months ago

Please merge it.

I don't think this will affect performance unless TracePoint is used, because the same method caching is done in both cases.

Showing benchmarking result will help IMO to confirm there is no penalty.

Actions #10

Updated by jeremyevans (Jeremy Evans) 3 months ago

  • Status changed from Open to Closed

Applied in changeset git|3081c83169c55ef7eead6222e49248e09232c22c.


Support tracing of struct member accessor methods

This follows the same approach used for attr_reader/attr_writer in
2d98593bf54a37397c6e4886ccc7e3654c2eaf85, skipping the checking for
tracing after the first call using the call cache, and clearing the
call cache when tracing is turned on/off.

Fixes [Bug #18886]

Actions

Also available in: Atom PDF

Like1
Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0