Project

General

Profile

Actions

Bug #10511

closed

fix for #9940 causes dramatic performance regression

Added by os97673 (Oleg Sukhodolsky) over 9 years ago. Updated about 9 years ago.

Status:
Closed
Target version:
ruby -v:
ruby 2.1.3p242 (2014-09-19 revision 47630) [x86_64-darwin14.0]
[ruby-core:66282]

Description

Originally reported against debase gem (https://github.com/denofevil/debase/issues/16)
After fix for #9940 running anything under debugger became about 4 times slower.
I've verified that we I revert the fix performance is restored.
To reproduce you could get https://github.com/railstutorial/sample_app_rails_4 and run rspec rake task under debugger (I've used RubyMine for this because debase is used as
debugger's backend there)
Performance with 2.1.2 it takes about 20 seconds on my laptop, with 2.1.3 - almost 80 seconds :(
The same problem is observable with 2.1.4 and 2.1.5 too.

Updated by ko1 (Koichi Sasada) over 9 years ago

Thank you for your report.

Could you make small benchmark program to debug it?

Updated by denofevil (Dennis Ushakov) over 9 years ago

I was not able to achieve 4x on a simple benchmark, but the difference is quite big:

require 'benchmark'

N = 10000
RECURSION = 1000

def recurse(i)
	recurse(i - 1) if i > 0
end

TracePoint.new(:call, :return, :c_call, :c_return, :line){|tp|
	tp.lineno
}.enable

puts Benchmark.measure { N.times { recurse(RECURSION) } }

Results are:

2.1.2 - 14.000000   0.070000  14.070000 ( 14.640882)
2.1.5 - 16.980000   0.030000  17.010000 ( 17.294063)

Results without tracepoint:

2.1.2 - 0.810000   0.000000   0.810000 (  0.890800)
2.1.5 - 0.830000   0.000000   0.830000 (  0.855288)

(almost no difference between versions)

Updated by os97673 (Oleg Sukhodolsky) over 9 years ago

I've tried the simple benchmark with 2.1.3 I've built myself and results are:
9.420000 0.010000 9.430000 ( 9.449823)
Than I've removed for #9940, rebuilt 2.1.3 and results are
8.200000 0.030000 8.230000 ( 8.265794)

Updated by ko1 (Koichi Sasada) over 9 years ago

ruby 2.2.0dev (2014-11-17 trunk 48465) [i386-mswin32_110]
 21.466000   0.000000  21.466000 ( 21.572857)

ruby 2.0.0p402 (2014-02-11 revision 44911) [i386-mswin32_110]
 13.634000   0.000000  13.634000 ( 13.737744)

I try on my laptop. hmm...

Updated by ko1 (Koichi Sasada) over 9 years ago

  • Assignee set to ko1 (Koichi Sasada)
  • Target version set to 2.2.0

Updated by ko1 (Koichi Sasada) over 9 years ago

  • Status changed from Open to Closed
  • % Done changed from 0 to 100

Applied in changeset r48596.


  • vm_core.h: add rb_thread_t::local_storage_recursive_hash
    to speed up Thread#[:recursive_key] access.
    [Bug #10511]
  • thread.c (threadptr_local_aref): add fast path for
    :recursive_data.
  • thread.c (threadptr_recursive_hash, threadptr_recursive_hash_set):
    add special accessor for recursive hash.
  • cont.c: store/restore local_storage_recursive_hash.
  • vm.c: init and mark local_storage_recursive_hash.
  • vm_trace.c (rb_threadptr_exec_event_hooks_orig): clear and restore
    local_storage_recursive_hash directly.

Updated by ko1 (Koichi Sasada) over 9 years ago

  • Status changed from Closed to Feedback

could you try r48596?

Updated by os97673 (Oleg Sukhodolsky) over 9 years ago

Koichi Sasada wrote:

could you try r48596?

Here are results I've got.

$ ruby -v
ruby 2.2.0dev (2014-11-28 trunk 48629) [x86_64-darwin14]
$ ruby /tmp/bm.rb
6.660000 0.080000 6.740000 ( 7.153794)
$ ruby /tmp/bm.rb
6.710000 0.080000 6.790000 ( 7.119988)
$ rvm ruby-2.1.5 do ruby /tmp/bm.rb
10.230000 0.110000 10.340000 ( 10.906019)
$ rvm ruby-2.1.3 do ruby /tmp/bm.rb
10.130000 0.100000 10.230000 ( 10.585034)
$ rvm ruby-2.1.2 do ruby /tmp/bm.rb
8.210000 0.080000 8.290000 ( 8.507865)

Unfortunately I was unable to try the rails app I've used because sqlite can not be built with this version :(
Will ask others to try their tests.

Updated by ko1 (Koichi Sasada) over 9 years ago

Thank you for try. Can I close it?
Should we wait for your rails test?

Updated by os97673 (Oleg Sukhodolsky) over 9 years ago

Koichi Sasada wrote:

Thank you for try. Can I close it?
Should we wait for your rails test?

I think the ticket can be closed. Rails app still shows some performance degradation, but it is unclear if it is ruby or ruby-debug-ide.
I will investigate this and (in case it is ruby) file a new ticket linked to this one.

Updated by os97673 (Oleg Sukhodolsky) over 9 years ago

I think I've found one more source of performance degradation:
in debase we keep contexts for threads in hash and on every event get them from there. And it looks like "hash[Thread.current]" became significantly slower in TracePoint's hendler's context.
Here is a simple test which shows the problem:

require 'benchmark'

N = 10000
RECURSION = 1000

def recurse(i)
    recurse(i - 1) if i > 0
end

theHash = Hash.new

TracePoint.new(:call, :return, :c_call, :c_return, :line){|tp|
    tp.lineno
    theHash[Thread.current]
}.enable

puts Benchmark.measure { N.times { recurse(RECURSION) } }

in my case it shows:

$ rvm ruby-2.1.2 do ruby /tmp/bm.rb
 44.080000   0.330000  44.410000 ( 45.207792)
$ rvm ruby-head do ruby /tmp/bm.rb
 72.170000   2.040000  74.210000 ( 76.259147)

Note: I've tried to replace "theHash[Thread.current]" with either "theHash[1]" or "Thread.current.hash" but such tests do not show any performance degradation, it looks like it is a combination of hash access and thread causes the problem.

Updated by normalperson (Eric Wong) over 9 years ago

wrote:

I think I've found one more source of performance degradation:

I cannot reproduce the regression you see on Linux systems.
In fact, I see improvements:

x86_64-linux (Xeon E3-1230 v3):

2.1.5: 44.440000 0.000000 44.440000 ( 44.482789)
r48706: 32.410000 0.000000 32.410000 ( 32.437961)

i686-linux (Xen VM):
2.1.5: 80.300000 0.030000 80.330000 ( 80.337652)
r48706: 58.050000 0.000000 58.050000 ( 58.822390)

Updated by os97673 (Oleg Sukhodolsky) over 9 years ago

Eric Wong wrote:

wrote:

I think I've found one more source of performance degradation:

I cannot reproduce the regression you see on Linux systems.
In fact, I see improvements:

x86_64-linux (Xeon E3-1230 v3):

2.1.5: 44.440000 0.000000 44.440000 ( 44.482789)
r48706: 32.410000 0.000000 32.410000 ( 32.437961)

i686-linux (Xen VM):
2.1.5: 80.300000 0.030000 80.330000 ( 80.337652)
r48706: 58.050000 0.000000 58.050000 ( 58.822390)

you need to compare performance of r48706 and 2.1.2 since the problem was introduced in 2.1.3.
r48706 is definitely better than 2.1.5, but 2.1.2 is still much faster.

Updated by normalperson (Eric Wong) over 9 years ago

Oops, OK. I was suspicious of [Feature #9325] hurting performance
(for cases where a hash function needs tuning).

Updated by os97673 (Oleg Sukhodolsky) over 9 years ago

Eric Wong wrote:

Oops, OK. I was suspicious of [Feature #9325] hurting performance
(for cases where a hash function needs tuning).

it looks like you mistyped the feature number :( #9325 is "Add make uninstall target" and I doubt it could cause any runtime problems ;)

Updated by normalperson (Eric Wong) over 9 years ago

Oops, [Feature #9425] (power-of-two hash sizes :)

Updated by os97673 (Oleg Sukhodolsky) over 9 years ago

Eric Wong wrote:

Oops, [Feature #9425] (power-of-two hash sizes :)

If the problem is in implementation of hash function then I'd expect that simple Thread.current.hash instead of theHash[Thread.current] would show a performance degradation, but (as far as I can see) it is not.
If the cause of the problem is in Hash implementation than I'd expect that theHash[1] would show performance degradation too, but it is not.
IMHO it is something more subtle and it looks like it is somehow relates to #9940.

Updated by ko1 (Koichi Sasada) over 9 years ago

I found strange behaviour.

Using Hash.new, it is slow as you say:

Hash.new
ruby 2.2.0dev (2014-12-02 trunk 48684) [i386-mswin32_110]
  5.944000   0.000000   5.944000 (  5.979824)
ruby 2.0.0p402 (2014-02-11 revision 44911) [i386-mswin32_110]
  2.137000   0.000000   2.137000 (  2.152773)

but using Hash literal ({}), it is not slow:

{}
ruby 2.2.0dev (2014-12-02 trunk 48684) [i386-mswin32_110]
  1.357000   0.000000   1.357000 (  1.352916)
ruby 2.0.0p402 (2014-02-11 revision 44911) [i386-mswin32_110]
  1.607000   0.000000   1.607000 (  1.606704)

What difference?

Updated by ko1 (Koichi Sasada) over 9 years ago

I understood.

First hash (Hash.new) contains empty st table.
Second hash ({}) does not have st table (NULL).

So the performance is different.

Making an entry for second hash like that:

theHash = {}
theHash[Thread.current] = nil

also shows slow down:

ruby 2.2.0dev (2014-12-02 trunk 48684) [i386-mswin32_110]
  6.084000   0.000000   6.084000 (  6.107212)

Updated by os97673 (Oleg Sukhodolsky) over 9 years ago

Perhaps there is some VM optimization? As for the test: I've just mimic the code we have in debase native code, I doubt we can use {} there :(

Updated by ko1 (Koichi Sasada) over 9 years ago

Ok, I understand the reason.

For each hash access, new Hash object are created each time.
You can check this behavior by that:

p GC.stat[:total_allocated_objects]
puts Benchmark.measure { N.times { recurse(RECURSION) } }
p GC.stat[:total_allocated_objects]

result:

53968
  5.132000   0.000000   5.132000 (  5.128507)
6062275

This is because clear recursive hash every time. Let me think about it. Maybe recycling such recursive hash can solve.

Updated by os97673 (Oleg Sukhodolsky) over 9 years ago

Koichi Sasada wrote:

Ok, I understand the reason.

For each hash access, new Hash object are created each time.

do you mean that it is your code who creates new hash every time?

Updated by ko1 (Koichi Sasada) over 9 years ago

  • Status changed from Feedback to Closed

Applied in changeset r48765.


  • vm_core.h: introduce new field
    rb_thread_t::local_storage_recursive_hash_for_trace to store
    recursive hash to avoid creating new recursive (nested) hashes
    for each trace events.
    [Bug #10511]
  • vm_trace.c (rb_threadptr_exec_event_hooks_orig): use it.
  • cont.c: catch up this fix.
  • vm.c (rb_thread_mark): ditto.

Updated by ko1 (Koichi Sasada) over 9 years ago

  • Status changed from Closed to Feedback

could you try r48765?

Updated by os97673 (Oleg Sukhodolsky) over 9 years ago

Koichi Sasada wrote:

could you try r48765?

I've tried and it works ok for me (both simple benchmark and rails app + RubyMine debugger)
Asked our users to test too. Will let you know if they will find any problems.
BTW thank you for your work.

Updated by os97673 (Oleg Sukhodolsky) over 9 years ago

Oleg Sukhodolsky wrote:

Koichi Sasada wrote:

could you try r48765?

I've tried and it works ok for me (both simple benchmark and rails app + RubyMine debugger)
Asked our users to test too. Will let you know if they will find any problems.

users have confirmed that the problem is fixed.

Updated by ko1 (Koichi Sasada) over 9 years ago

  • Status changed from Feedback to Closed

Thank you for confirmation!

Updated by nagachika (Tomoyuki Chikanaga) about 9 years ago

  • Backport changed from 2.0.0: UNKNOWN, 2.1: UNKNOWN to 2.0.0: DONTNEED, 2.1: REQUIRED

see #10597

Actions

Also available in: Atom PDF

Like0
Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0