Project

General

Profile

Actions

Bug #18886

open

Struct aref and aset don't trigger any tracepoints.

Added by ioquatix (Samuel Williams) over 1 year ago. Updated 3 days ago.

Status:
Open
Priority:
Normal
Target version:
-
[ruby-core:109083]

Description

Given the following program, thing.name and thing.shape don't trigger c_call trace points (or any trace points actually).

pp RUBY_VERSION

trace_point = TracePoint.new(:line, :call, :c_call, :a_call) do |trace|
  puts trace.event

  if trace.event == :call
    # Ruby doesn't always mark call-sites in sub-expressions, so we use this approach to compute a call site and mark it:
    if location = caller_locations(2, 1).first and path = location.path
      puts "> #{path}:#{location.lineno}:#{trace.event}"
    end
  end
  
  if path = trace.path
    puts "= #{path}:#{trace.lineno}:#{trace.event}"
  end
end

trace_point.enable

# This will trigger call trace points
class Thing
  def name
    :cat
  end
  
  def shape
    :square
  end
end
thing = Thing.new

# Thing = Struct.new(:name, :shape)
# thing = Thing.new(:cat, :rectangle)

[
  name: thing.name,
  shape: thing.shape,
]

Current HEAD

= ../test.rb:30:line:
= ../test.rb:30:c_call:new
= ../test.rb:30:c_call:inherited
= ../test.rb:30:c_call:singleton_method_added
= ../test.rb:30:c_call:singleton_method_added
= ../test.rb:30:c_call:singleton_method_added
= ../test.rb:30:c_call:singleton_method_added
= ../test.rb:30:c_call:singleton_method_added
= ../test.rb:30:c_call:method_added
= ../test.rb:30:c_call:method_added
= ../test.rb:30:c_call:method_added
= ../test.rb:30:c_call:method_added
= ../test.rb:30:c_call:const_added
= ../test.rb:31:line:
= ../test.rb:31:c_call:new
= ../test.rb:31:c_call:initialize
= ../test.rb:34:line:

Proposed PR

= ../test.rb:30:line:
= ../test.rb:30:c_call:new
= ../test.rb:30:c_call:inherited
= ../test.rb:30:c_call:singleton_method_added
= ../test.rb:30:c_call:singleton_method_added
= ../test.rb:30:c_call:singleton_method_added
= ../test.rb:30:c_call:singleton_method_added
= ../test.rb:30:c_call:singleton_method_added
= ../test.rb:30:c_call:method_added
= ../test.rb:30:c_call:method_added
= ../test.rb:30:c_call:method_added
= ../test.rb:30:c_call:method_added
= ../test.rb:30:c_call:const_added
= ../test.rb:31:line:
= ../test.rb:31:c_call:new
= ../test.rb:31:c_call:initialize
= ../test.rb:34:line:
= ../test.rb:34:c_call:name
= ../test.rb:35:c_call:shape

The reason is the internal implementation of struct doesn't have trace point instrumentation in vm_call_opt_struct_aset or vm_call_opt_struct_aref.

Proposed fix: https://github.com/ruby/ruby/pull/6071 but this would need a review, maybe @jeremyevans0 (Jeremy Evans) and @ko1 (Koichi Sasada) can help.


Related issues 1 (0 open1 closed)

Related to Ruby master - Bug #16383: TracePoint does not report calls to attribute reader methodsClosedActions
Actions #1

Updated by ioquatix (Samuel Williams) over 1 year ago

  • Description updated (diff)

Updated by ioquatix (Samuel Williams) over 1 year ago

Here is my performance comparison:

Firstly, with no changes (should be identical, shows some variance).

> make benchmark ITEM=vm_struct COMPARE_RUBY=(which ruby)
/home/samuel/.rubies/ruby-head/bin/ruby --disable=gems -rrubygems -I../benchmark/lib ../benchmark/benchmark-driver/exe/benchmark-driver \
            --executables="compare-ruby::/home/samuel/.rubies/ruby-head/bin/ruby -I.ext/common --disable-gem" \
            --executables="built-ruby::./miniruby -I../lib -I. -I.ext/common  ../tool/runruby.rb --extout=.ext  -- --disable-gems --disable-gem" \
            --output=markdown --output-compare -v $(find ../benchmark -maxdepth 1 -name 'vm_struct' -o -name '*vm_struct*.yml' -o -name '*vm_struct*.rb' | sort) 
compare-ruby: ruby 3.2.0dev (2022-06-29T00:53:14Z master 3051d4bc2a) [x86_64-linux]
built-ruby: ruby 3.2.0dev (2022-06-29T02:51:15Z struct-tracepoints 12e97af1d2) [x86_64-linux]
# Iteration per second (i/s)

|                       |compare-ruby|built-ruby|
|:----------------------|-----------:|---------:|
|vm_struct_big_aref_hi  |    149.515M|  147.470M|
|                       |       1.01x|         -|
|vm_struct_big_aref_lo  |    148.916M|  147.628M|
|                       |       1.01x|         -|
|vm_struct_big_aset     |       7.006|     7.080|
|                       |           -|     1.01x|
|vm_struct_big_href_hi  |     27.652M|   27.382M|
|                       |       1.01x|         -|
|vm_struct_big_href_lo  |     27.547M|   27.725M|
|                       |           -|     1.01x|
|vm_struct_big_hset     |       3.049|     3.094|
|                       |           -|     1.01x|
|vm_struct_small_aref   |    144.031M|  136.167M|
|                       |       1.06x|         -|
|vm_struct_small_aset   |       7.032|     7.063|
|                       |           -|     1.00x|
|vm_struct_small_href   |     30.694M|   30.432M|
|                       |       1.01x|         -|
|vm_struct_small_hset   |     28.590M|   28.083M|
|                       |       1.02x|         -|

Now with my PR:

> make benchmark ITEM=vm_struct COMPARE_RUBY=(which ruby)
/home/samuel/.rubies/ruby-head/bin/ruby --disable=gems -rrubygems -I../benchmark/lib ../benchmark/benchmark-driver/exe/benchmark-driver \
            --executables="compare-ruby::/home/samuel/.rubies/ruby-head/bin/ruby -I.ext/common --disable-gem" \
            --executables="built-ruby::./miniruby -I../lib -I. -I.ext/common  ../tool/runruby.rb --extout=.ext  -- --disable-gems --disable-gem" \
            --output=markdown --output-compare -v $(find ../benchmark -maxdepth 1 -name 'vm_struct' -o -name '*vm_struct*.yml' -o -name '*vm_struct*.rb' | sort) 
compare-ruby: ruby 3.2.0dev (2022-06-29T00:53:14Z master 3051d4bc2a) [x86_64-linux]
built-ruby: ruby 3.2.0dev (2022-06-29T02:51:15Z struct-tracepoints 12e97af1d2) [x86_64-linux]
# Iteration per second (i/s)

|                       |compare-ruby|built-ruby|
|:----------------------|-----------:|---------:|
|vm_struct_big_aref_hi  |    147.665M|  115.510M|
|                       |       1.28x|         -|
|vm_struct_big_aref_lo  |    146.666M|  116.298M|
|                       |       1.26x|         -|
|vm_struct_big_aset     |       7.098|     6.801|
|                       |       1.04x|         -|
|vm_struct_big_href_hi  |     27.608M|   27.957M|
|                       |           -|     1.01x|
|vm_struct_big_href_lo  |     27.521M|   27.937M|
|                       |           -|     1.02x|
|vm_struct_big_hset     |       3.092|     3.113|
|                       |           -|     1.01x|
|vm_struct_small_aref   |    147.494M|  113.729M|
|                       |       1.30x|         -|
|vm_struct_small_aset   |       6.898|     6.840|
|                       |       1.01x|         -|
|vm_struct_small_href   |     31.004M|   31.547M|
|                       |           -|     1.02x|
|vm_struct_small_hset   |     28.433M|   28.527M|
|                       |           -|     1.00x|

With changes suggested by @ko1 (Koichi Sasada):

> make benchmark ITEM=vm_struct COMPARE_RUBY=(which ruby)
/home/samuel/.rubies/ruby-head/bin/ruby --disable=gems -rrubygems -I../benchmark/lib ../benchmark/benchmark-driver/exe/benchmark-driver \
            --executables="compare-ruby::/home/samuel/.rubies/ruby-head/bin/ruby -I.ext/common --disable-gem" \
            --executables="built-ruby::./miniruby -I../lib -I. -I.ext/common  ../tool/runruby.rb --extout=.ext  -- --disable-gems --disable-gem" \
            --output=markdown --output-compare -v $(find ../benchmark -maxdepth 1 -name 'vm_struct' -o -name '*vm_struct*.yml' -o -name '*vm_struct*.rb' | sort) 
compare-ruby: ruby 3.2.0dev (2022-06-29T00:53:14Z master 3051d4bc2a) [x86_64-linux]
built-ruby: ruby 3.2.0dev (2022-06-29T02:51:15Z struct-tracepoints 12e97af1d2) [x86_64-linux]
# Iteration per second (i/s)

|                       |compare-ruby|built-ruby|
|:----------------------|-----------:|---------:|
|vm_struct_big_aref_hi  |    146.836M|  115.141M|
|                       |       1.28x|         -|
|vm_struct_big_aref_lo  |    147.929M|  115.918M|
|                       |       1.28x|         -|
|vm_struct_big_aset     |       7.079|     6.802|
|                       |       1.04x|         -|
|vm_struct_big_href_hi  |     27.495M|   27.933M|
|                       |           -|     1.02x|
|vm_struct_big_href_lo  |     27.396M|   27.584M|
|                       |           -|     1.01x|
|vm_struct_big_hset     |       3.128|     3.095|
|                       |       1.01x|         -|
|vm_struct_small_aref   |    142.123M|  113.700M|
|                       |       1.25x|         -|
|vm_struct_small_aset   |       6.995|     6.755|
|                       |       1.04x|         -|
|vm_struct_small_href   |     30.561M|   31.424M|
|                       |           -|     1.03x|
|vm_struct_small_hset   |     28.353M|   28.650M|
|                       |           -|     1.01x|

It seems like 25-30% performance impact, but this is still a very fast operation. Is there any way we can make this faster? Could we have tracing and non-tracing op-codes which call different functions? e.g. vm_call_opt_struct_aref and vm_call_opt_struct_aref_trace for example.

Updated by ioquatix (Samuel Williams) over 1 year ago

With some minor changes the best performance I could get is around 20% cost:

|                       |compare-ruby|built-ruby|
|:----------------------|-----------:|---------:|
|vm_struct_big_aref_hi  |    149.051M|  121.849M|
|                       |       1.22x|         -|
|vm_struct_big_aref_lo  |    147.353M|  121.167M|
|                       |       1.22x|         -|
|vm_struct_big_aset     |       7.107|     6.469|
|                       |       1.10x|         -|
|vm_struct_big_href_hi  |     27.081M|   28.269M|
|                       |           -|     1.04x|
|vm_struct_big_href_lo  |     27.642M|   27.630M|
|                       |       1.00x|         -|
|vm_struct_big_hset     |       3.106|     3.150|
|                       |           -|     1.01x|
|vm_struct_small_aref   |    139.761M|  113.250M|
|                       |       1.23x|         -|
|vm_struct_small_aset   |       7.086|     6.775|
|                       |       1.05x|         -|
|vm_struct_small_href   |     30.977M|   31.479M|
|                       |           -|     1.02x|
|vm_struct_small_hset   |     28.565M|   28.554M|
|                       |       1.00x|         -|
Actions #4

Updated by Eregon (Benoit Daloze) over 1 year ago

  • Related to Bug #16383: TracePoint does not report calls to attribute reader methods added

Updated by ioquatix (Samuel Williams) about 1 year ago

Rather than using branch for tracepoints, using a nop/jmp instruction which is set or cleared might be a lower impact option.

e.g. https://nullprogram.com/blog/2016/03/31/

Updated by mame (Yusuke Endoh) 4 days ago

@jeremyevans0 (Jeremy Evans) Could you measure the overhead?

Updated by jeremyevans0 (Jeremy Evans) 3 days ago

mame (Yusuke Endoh) wrote in #note-7:

@jeremyevans0 (Jeremy Evans) Could you measure the overhead?

Using the benchmark in the pull request, the pull request seems reliably ~5% faster in my testing for the member reader method, but mixed for other cases (sometimes up to 3% faster/slower depending on run). I don't have any explanation as to why it would be faster, so I'm going to assume that the difference is still within the margin of error.

I don't think this will affect performance unless TracePoint is used, because the same method caching is done in both cases.

Actions

Also available in: Atom PDF

Like1
Like0Like0Like0Like0Like0Like0Like0Like0