Bug #18886: Struct aref and aset don't trigger any tracepoints. - Ruby - Ruby Issue Tracking System

Actions

Copy link

Bug #18886

closed

Struct aref and aset don't trigger any tracepoints.

Added by ioquatix (Samuel Williams) about 3 years ago. Updated over 1 year ago.

Status:

Closed

Assignee:

ko1 (Koichi Sasada)

Target version:

ruby -v:

Backport:

2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN

[ruby-core:109083]

Description

Given the following program, thing.name and thing.shape don't trigger c_call trace points (or any trace points actually).

pp RUBY_VERSION

trace_point = TracePoint.new(:line, :call, :c_call, :a_call) do |trace|
  puts trace.event

  if trace.event == :call
    # Ruby doesn't always mark call-sites in sub-expressions, so we use this approach to compute a call site and mark it:
    if location = caller_locations(2, 1).first and path = location.path
      puts "> #{path}:#{location.lineno}:#{trace.event}"
    end
  end
  
  if path = trace.path
    puts "= #{path}:#{trace.lineno}:#{trace.event}"
  end
end

trace_point.enable

# This will trigger call trace points
class Thing
  def name
    :cat
  end
  
  def shape
    :square
  end
end
thing = Thing.new

# Thing = Struct.new(:name, :shape)
# thing = Thing.new(:cat, :rectangle)

[
  name: thing.name,
  shape: thing.shape,
]

Current HEAD¶

= ../test.rb:30:line:
= ../test.rb:30:c_call:new
= ../test.rb:30:c_call:inherited
= ../test.rb:30:c_call:singleton_method_added
= ../test.rb:30:c_call:singleton_method_added
= ../test.rb:30:c_call:singleton_method_added
= ../test.rb:30:c_call:singleton_method_added
= ../test.rb:30:c_call:singleton_method_added
= ../test.rb:30:c_call:method_added
= ../test.rb:30:c_call:method_added
= ../test.rb:30:c_call:method_added
= ../test.rb:30:c_call:method_added
= ../test.rb:30:c_call:const_added
= ../test.rb:31:line:
= ../test.rb:31:c_call:new
= ../test.rb:31:c_call:initialize
= ../test.rb:34:line:

Proposed PR¶

= ../test.rb:30:line:
= ../test.rb:30:c_call:new
= ../test.rb:30:c_call:inherited
= ../test.rb:30:c_call:singleton_method_added
= ../test.rb:30:c_call:singleton_method_added
= ../test.rb:30:c_call:singleton_method_added
= ../test.rb:30:c_call:singleton_method_added
= ../test.rb:30:c_call:singleton_method_added
= ../test.rb:30:c_call:method_added
= ../test.rb:30:c_call:method_added
= ../test.rb:30:c_call:method_added
= ../test.rb:30:c_call:method_added
= ../test.rb:30:c_call:const_added
= ../test.rb:31:line:
= ../test.rb:31:c_call:new
= ../test.rb:31:c_call:initialize
= ../test.rb:34:line:
= ../test.rb:34:c_call:name
= ../test.rb:35:c_call:shape

The reason is the internal implementation of struct doesn't have trace point instrumentation in vm_call_opt_struct_aset or vm_call_opt_struct_aref.

Proposed fix: https://github.com/ruby/ruby/pull/6071 but this would need a review, maybe @jeremyevans0 (Jeremy Evans) and @ko1 (Koichi Sasada) can help.

Related issues 1 (0 open — 1 closed)

Actions

Copy link

Updated by ioquatix (Samuel Williams) about 3 years ago

Description updated (diff)

Actions

Copy link

#2 [ruby-core:109084]

Updated by ioquatix (Samuel Williams) about 3 years ago

Here is my performance comparison:

Firstly, with no changes (should be identical, shows some variance).

> make benchmark ITEM=vm_struct COMPARE_RUBY=(which ruby)
/home/samuel/.rubies/ruby-head/bin/ruby --disable=gems -rrubygems -I../benchmark/lib ../benchmark/benchmark-driver/exe/benchmark-driver \
            --executables="compare-ruby::/home/samuel/.rubies/ruby-head/bin/ruby -I.ext/common --disable-gem" \
            --executables="built-ruby::./miniruby -I../lib -I. -I.ext/common  ../tool/runruby.rb --extout=.ext  -- --disable-gems --disable-gem" \
            --output=markdown --output-compare -v $(find ../benchmark -maxdepth 1 -name 'vm_struct' -o -name '*vm_struct*.yml' -o -name '*vm_struct*.rb' | sort) 
compare-ruby: ruby 3.2.0dev (2022-06-29T00:53:14Z master 3051d4bc2a) [x86_64-linux]
built-ruby: ruby 3.2.0dev (2022-06-29T02:51:15Z struct-tracepoints 12e97af1d2) [x86_64-linux]
# Iteration per second (i/s)

|                       |compare-ruby|built-ruby|
|:----------------------|-----------:|---------:|
|vm_struct_big_aref_hi  |    149.515M|  147.470M|
|                       |       1.01x|         -|
|vm_struct_big_aref_lo  |    148.916M|  147.628M|
|                       |       1.01x|         -|
|vm_struct_big_aset     |       7.006|     7.080|
|                       |           -|     1.01x|
|vm_struct_big_href_hi  |     27.652M|   27.382M|
|                       |       1.01x|         -|
|vm_struct_big_href_lo  |     27.547M|   27.725M|
|                       |           -|     1.01x|
|vm_struct_big_hset     |       3.049|     3.094|
|                       |           -|     1.01x|
|vm_struct_small_aref   |    144.031M|  136.167M|
|                       |       1.06x|         -|
|vm_struct_small_aset   |       7.032|     7.063|
|                       |           -|     1.00x|
|vm_struct_small_href   |     30.694M|   30.432M|
|                       |       1.01x|         -|
|vm_struct_small_hset   |     28.590M|   28.083M|
|                       |       1.02x|         -|

Now with my PR:

> make benchmark ITEM=vm_struct COMPARE_RUBY=(which ruby)
/home/samuel/.rubies/ruby-head/bin/ruby --disable=gems -rrubygems -I../benchmark/lib ../benchmark/benchmark-driver/exe/benchmark-driver \
            --executables="compare-ruby::/home/samuel/.rubies/ruby-head/bin/ruby -I.ext/common --disable-gem" \
            --executables="built-ruby::./miniruby -I../lib -I. -I.ext/common  ../tool/runruby.rb --extout=.ext  -- --disable-gems --disable-gem" \
            --output=markdown --output-compare -v $(find ../benchmark -maxdepth 1 -name 'vm_struct' -o -name '*vm_struct*.yml' -o -name '*vm_struct*.rb' | sort) 
compare-ruby: ruby 3.2.0dev (2022-06-29T00:53:14Z master 3051d4bc2a) [x86_64-linux]
built-ruby: ruby 3.2.0dev (2022-06-29T02:51:15Z struct-tracepoints 12e97af1d2) [x86_64-linux]
# Iteration per second (i/s)

|                       |compare-ruby|built-ruby|
|:----------------------|-----------:|---------:|
|vm_struct_big_aref_hi  |    147.665M|  115.510M|
|                       |       1.28x|         -|
|vm_struct_big_aref_lo  |    146.666M|  116.298M|
|                       |       1.26x|         -|
|vm_struct_big_aset     |       7.098|     6.801|
|                       |       1.04x|         -|
|vm_struct_big_href_hi  |     27.608M|   27.957M|
|                       |           -|     1.01x|
|vm_struct_big_href_lo  |     27.521M|   27.937M|
|                       |           -|     1.02x|
|vm_struct_big_hset     |       3.092|     3.113|
|                       |           -|     1.01x|
|vm_struct_small_aref   |    147.494M|  113.729M|
|                       |       1.30x|         -|
|vm_struct_small_aset   |       6.898|     6.840|
|                       |       1.01x|         -|
|vm_struct_small_href   |     31.004M|   31.547M|
|                       |           -|     1.02x|
|vm_struct_small_hset   |     28.433M|   28.527M|
|                       |           -|     1.00x|

With changes suggested by @ko1 (Koichi Sasada):

> make benchmark ITEM=vm_struct COMPARE_RUBY=(which ruby)
/home/samuel/.rubies/ruby-head/bin/ruby --disable=gems -rrubygems -I../benchmark/lib ../benchmark/benchmark-driver/exe/benchmark-driver \
            --executables="compare-ruby::/home/samuel/.rubies/ruby-head/bin/ruby -I.ext/common --disable-gem" \
            --executables="built-ruby::./miniruby -I../lib -I. -I.ext/common  ../tool/runruby.rb --extout=.ext  -- --disable-gems --disable-gem" \
            --output=markdown --output-compare -v $(find ../benchmark -maxdepth 1 -name 'vm_struct' -o -name '*vm_struct*.yml' -o -name '*vm_struct*.rb' | sort) 
compare-ruby: ruby 3.2.0dev (2022-06-29T00:53:14Z master 3051d4bc2a) [x86_64-linux]
built-ruby: ruby 3.2.0dev (2022-06-29T02:51:15Z struct-tracepoints 12e97af1d2) [x86_64-linux]
# Iteration per second (i/s)

|                       |compare-ruby|built-ruby|
|:----------------------|-----------:|---------:|
|vm_struct_big_aref_hi  |    146.836M|  115.141M|
|                       |       1.28x|         -|
|vm_struct_big_aref_lo  |    147.929M|  115.918M|
|                       |       1.28x|         -|
|vm_struct_big_aset     |       7.079|     6.802|
|                       |       1.04x|         -|
|vm_struct_big_href_hi  |     27.495M|   27.933M|
|                       |           -|     1.02x|
|vm_struct_big_href_lo  |     27.396M|   27.584M|
|                       |           -|     1.01x|
|vm_struct_big_hset     |       3.128|     3.095|
|                       |       1.01x|         -|
|vm_struct_small_aref   |    142.123M|  113.700M|
|                       |       1.25x|         -|
|vm_struct_small_aset   |       6.995|     6.755|
|                       |       1.04x|         -|
|vm_struct_small_href   |     30.561M|   31.424M|
|                       |           -|     1.03x|
|vm_struct_small_hset   |     28.353M|   28.650M|
|                       |           -|     1.01x|

It seems like 25-30% performance impact, but this is still a very fast operation. Is there any way we can make this faster? Could we have tracing and non-tracing op-codes which call different functions? e.g. vm_call_opt_struct_aref and vm_call_opt_struct_aref_trace for example.

Actions

Copy link

#3 [ruby-core:109088]

Updated by ioquatix (Samuel Williams) about 3 years ago

With some minor changes the best performance I could get is around 20% cost:

|                       |compare-ruby|built-ruby|
|:----------------------|-----------:|---------:|
|vm_struct_big_aref_hi  |    149.051M|  121.849M|
|                       |       1.22x|         -|
|vm_struct_big_aref_lo  |    147.353M|  121.167M|
|                       |       1.22x|         -|
|vm_struct_big_aset     |       7.107|     6.469|
|                       |       1.10x|         -|
|vm_struct_big_href_hi  |     27.081M|   28.269M|
|                       |           -|     1.04x|
|vm_struct_big_href_lo  |     27.642M|   27.630M|
|                       |       1.00x|         -|
|vm_struct_big_hset     |       3.106|     3.150|
|                       |           -|     1.01x|
|vm_struct_small_aref   |    139.761M|  113.250M|
|                       |       1.23x|         -|
|vm_struct_small_aset   |       7.086|     6.775|
|                       |       1.05x|         -|
|vm_struct_small_href   |     30.977M|   31.479M|
|                       |           -|     1.02x|
|vm_struct_small_hset   |     28.565M|   28.554M|
|                       |       1.00x|         -|

Actions

Copy link

Updated by Eregon (Benoit Daloze) about 3 years ago

Related to Bug #16383: TracePoint does not report calls to attribute reader methods added

Actions

Copy link

#5 [ruby-core:109983]

Updated by ioquatix (Samuel Williams) almost 3 years ago

Rather than using branch for tracepoints, using a nop/jmp instruction which is set or cleared might be a lower impact option.

e.g. https://nullprogram.com/blog/2016/03/31/

Actions

Copy link

#6 [ruby-core:115184]

Updated by jeremyevans0 (Jeremy Evans) almost 2 years ago

I submitted a pull request to fix this: https://github.com/ruby/ruby/pull/8782

Actions

Copy link

#7 [ruby-core:115494]

Updated by mame (Yusuke Endoh) over 1 year ago

@jeremyevans0 (Jeremy Evans) Could you measure the overhead?

Actions

Copy link

#8 [ruby-core:115519]

Updated by jeremyevans0 (Jeremy Evans) over 1 year ago

mame (Yusuke Endoh) wrote in #note-7:

@jeremyevans0 (Jeremy Evans) Could you measure the overhead?

Using the benchmark in the pull request, the pull request seems reliably ~5% faster in my testing for the member reader method, but mixed for other cases (sometimes up to 3% faster/slower depending on run). I don't have any explanation as to why it would be faster, so I'm going to assume that the difference is still within the margin of error.

I don't think this will affect performance unless TracePoint is used, because the same method caching is done in both cases.

Actions

Copy link

#9 [ruby-core:115625]

Updated by ko1 (Koichi Sasada) over 1 year ago

Please merge it.

I don't think this will affect performance unless TracePoint is used, because the same method caching is done in both cases.

Showing benchmarking result will help IMO to confirm there is no penalty.

Actions

Copy link

#10

Updated by jeremyevans (Jeremy Evans) over 1 year ago

Status changed from Open to Closed

Applied in changeset git|3081c83169c55ef7eead6222e49248e09232c22c.

Support tracing of struct member accessor methods

This follows the same approach used for attr_reader/attr_writer in
2d98593bf54a37397c6e4886ccc7e3654c2eaf85, skipping the checking for
tracing after the first call using the call cache, and clearing the
call cache when tracing is turned on/off.

Fixes [Bug #18886]

Actions

Copy link

Also available in: Atom PDF

Like1

Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0

Project

General

Profile

Ruby

Tags

Custom queries

Bug #18886

Struct aref and aset don't trigger any tracepoints.

Current HEAD¶

Proposed PR¶

Updated by ioquatix (Samuel Williams) about 3 years ago

Updated by ioquatix (Samuel Williams) about 3 years ago

Updated by ioquatix (Samuel Williams) about 3 years ago

Updated by Eregon (Benoit Daloze) about 3 years ago

Updated by ioquatix (Samuel Williams) almost 3 years ago

Updated by jeremyevans0 (Jeremy Evans) almost 2 years ago

Updated by mame (Yusuke Endoh) over 1 year ago

Updated by jeremyevans0 (Jeremy Evans) over 1 year ago

Updated by ko1 (Koichi Sasada) over 1 year ago

Updated by jeremyevans (Jeremy Evans) over 1 year ago