Feature #17884

Updated by ko1 (Koichi Sasada) almost 3 years ago

(MRI internals) 

Profiling tools need to record code locations, mainly pairs of a file path and a line number ("file:line").
To record such a pair on a 64-bit CPU, 8B (VALUE) + 4B (int) = 12B is needed. In general, however, the number of distinct "file:line" pairs in an interpreter process does not exceed the 32-bit boundary (4G pairs). Since `st_data_t` is 8B (or 4B on a 32-bit CPU), we cannot store "file:line" information directly as an `st_table` key/value.
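
The size argument can be checked with `Array#pack`. This sketch uses a made-up pointer value for the file-path `VALUE`, packs a "file:line" pair as a 64-bit word plus a 32-bit line number, and compares it with a single 32-bit index. `pack` inserts no alignment padding, so the byte counts correspond to the 12B and 4B figures in the text.

```ruby
# Made-up pointer value standing in for the VALUE that holds the file path.
path_value = 0x0000_7f3a_1c2d_4e50
line       = 42

# "file:line" pair: 64-bit VALUE + 32-bit int ("Q<" / "L<" are little-endian).
pair_bytes = [path_value, line].pack("Q<L<")

# A single 32-bit unsigned index, which fits in st_data_t on 32- and 64-bit CPUs.
index_bytes = [123_456].pack("L<")

puts pair_bytes.bytesize  # => 12
puts index_bytes.bytesize # => 4
```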

Also, getting a line number from a PC (program counter) is not simple: we currently use a succinct bitvector, which is fast enough and compact in general, but a lookup still needs some calculation.
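
As a rough illustration of why a PC-to-line lookup "needs some calculation", here is a simplified rank-based table written in Ruby for readability. It is not MRI's actual layout; it assumes one bit per pc_index, set where a new line-info entry starts, plus a per-block count of set bits so that `rank` only has to scan within one block.

```ruby
# Simplified model (assumption, not MRI's layout): one bit per pc_index.
class LineTable
  BLOCK = 64 # bits per rank-directory block

  # bits:  Array of 0/1, one per pc_index; bits[0] must be 1.
  # lines: lines[k] is the line number of the k-th set bit.
  def initialize(bits, lines)
    @bits = bits
    @lines = lines
    # @block_rank[b] = number of set bits before block b.
    @block_rank = [0]
    bits.each_slice(BLOCK) { |blk| @block_rank << (@block_rank.last + blk.sum) }
  end

  # Number of set bits in positions 0..pc (inclusive).
  def rank(pc)
    b = pc / BLOCK
    @block_rank[b] + @bits[(b * BLOCK)..pc].sum
  end

  # Line of the last line-info entry starting at or before pc.
  def line_of(pc)
    @lines[rank(pc) - 1]
  end
end

t = LineTable.new([1, 0, 0, 1, 0, 1], [10, 11, 13])
p t.line_of(2) # => 10
p t.line_of(4) # => 11
```

A production version would pack the bits into machine words and count them with a popcount instruction rather than summing an Array.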

To solve both the size problem and the time problem, we introduce a new concept, "locindex".

A "locindex" is an `unsigned int` value, 4B in most environments. It represents a pair of an "iseq" and a "PC" (more precisely, a "pc_index", given by `PC - iseq->body->iseq_encoded`).

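In C, subtracting two `VALUE *` pointers yields a count of `VALUE` elements, not bytes, so `pc_index` is the slot offset of the PC inside `iseq_encoded`. The arithmetic below uses made-up addresses and assumes an 8-byte `VALUE` (a 64-bit build); `pc_index` is a hypothetical helper for illustration.

```ruby
VALUE_SIZE = 8 # sizeof(VALUE) on a 64-bit build (assumption)

# Equivalent of C's `PC - iseq->body->iseq_encoded`, given raw byte addresses.
def pc_index(pc_addr, iseq_encoded_addr)
  (pc_addr - iseq_encoded_addr) / VALUE_SIZE
end

iseq_encoded = 0x5600_0000_1000    # made-up start address
pc = iseq_encoded + 5 * VALUE_SIZE # PC pointing at the 6th slot
p pc_index(pc, iseq_encoded)       # => 5
```
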
We can get "locindexL" from "iseqA" and "pcB". Once "iseqA" is recorded this way, it is never freed during the process.
Conversely, from "locindexL" we can get back "iseqA" and "pcB", and from this iseq/pc pair we can calculate the "file:line" information. "file:line" is only needed when viewing the profiling results, so in most cases the performance of getting "file:line" from an "iseq/pc" pair is not important.
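
In practice this means the hot path (recording an event) can use the integer locindex directly as a hash key, while the expensive resolution to "file:line" runs once per distinct locindex at report time. A minimal sketch; the `SLOW_RESOLVE` table and `resolver` lambda are hypothetical stand-ins for the locindex -> iseq/pc -> "file:line" path.

```ruby
# Hypothetical stand-in for the slow path locindex -> (iseq, pc) -> "file:line".
SLOW_RESOLVE = { 5 => "a.rb:3", 13 => "b.rb:7" }.freeze
resolve_calls = 0
resolver = ->(idx) { resolve_calls += 1; SLOW_RESOLVE.fetch(idx) }

# Hot path: each event only bumps a counter keyed by an Integer locindex.
counts = Hash.new(0)
[5, 13, 5, 5, 13].each { |idx| counts[idx] += 1 }

# Cold path (report time): resolve each distinct locindex exactly once.
report = counts.to_h { |idx, n| [resolver.call(idx), n] }
p report        # a.rb:3 counted 3 times, b.rb:7 twice
p resolve_calls # => 2
```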

 "locindex" is calculated by the following pseudo code: 

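```ruby
# Sketch in Ruby; ISeq models only the fields used here
# (in MRI, the iseq is a C struct).
ISeq = Struct.new(:iseq_size, :locindex_start)

$last_locindex = 1        # start at 1: locindex_start == 0 means "not recorded"
$global_recorded_ary = [] # recorded iseqs; they are never freed

def locindex(iseq, pc) # pc is pc_index (nth instruction)
  if iseq.locindex_start == 0 # not recorded yet
    iseq.locindex_start = $last_locindex
    $last_locindex += iseq.iseq_size
    $global_recorded_ary.push(iseq)
  end
  iseq.locindex_start + pc
end

def resolve(locindex)
  $global_recorded_ary.each { |iseq|
    start = iseq.locindex_start
    if (start...(start + iseq.iseq_size)).cover?(locindex) # locindex is in iseq?
      return [iseq, locindex - start]
    end
  }
  nil # locindex was never issued
end
```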
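
Because each newly recorded iseq gets a `locindex_start` larger than every earlier one, the recorded array is already sorted by start, so `resolve` could use a binary search instead of a linear scan. A sketch of that variant, wrapped in a class; `ISeq` and `LocIndexTable` are hypothetical names, not MRI's.

```ruby
# Hypothetical model of an iseq: only the fields the scheme needs.
ISeq = Struct.new(:label, :iseq_size, :locindex_start)

class LocIndexTable
  def initialize
    @last_locindex = 1 # locindex_start == 0 means "not recorded yet"
    @recorded = []     # recorded iseqs, in increasing locindex_start order
  end

  def locindex(iseq, pc)
    if iseq.locindex_start.zero?
      iseq.locindex_start = @last_locindex
      @last_locindex += iseq.iseq_size
      @recorded << iseq
    end
    iseq.locindex_start + pc
  end

  # Returns [iseq, pc], or nil if idx was never issued.
  def resolve(idx)
    # Index of the first iseq whose range starts after idx (nil if none).
    after = @recorded.bsearch_index { |iseq| iseq.locindex_start > idx }
    i = (after || @recorded.size) - 1
    return nil if i.negative?

    iseq = @recorded[i]
    pc = idx - iseq.locindex_start
    pc < iseq.iseq_size ? [iseq, pc] : nil
  end
end

t = LocIndexTable.new
a = ISeq.new("a", 10, 0)
b = ISeq.new("b", 3, 0)
p t.locindex(a, 4) # => 5
p t.locindex(b, 2) # => 13
p t.resolve(13)[1] # => 2
```

The linear scan costs O(n) per lookup and the binary search O(log n); since resolution only happens when viewing results, either may be acceptable in practice.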

 ---- 

`ObjectSpace.trace_object_allocations` is one such profiling tool, and "locindex" can be used to implement it.
I implemented this and measured the performance.
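
For reference, this is what the existing tracing API records per object: inside `ObjectSpace.trace_object_allocations`, `ObjectSpace.allocation_sourcefile` and `ObjectSpace.allocation_sourceline` (from the `objspace` library) return the "file:line" where a traced object was allocated.

```ruby
require 'objspace'

obj = nil
file = line = nil
expected_line = nil

ObjectSpace.trace_object_allocations do
  expected_line = __LINE__ + 1
  obj = Object.new # allocation site is recorded while tracing is on
  file = ObjectSpace.allocation_sourcefile(obj)
  line = ObjectSpace.allocation_sourceline(obj)
end

puts "#{file}:#{line}"
```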

 The benchmark program: 

 ```ruby 
 # This file should be placed in Ruby's source directory

 require 'objspace/trace' 
 require 'rdoc/rdoc' 
 require 'tmpdir' 

 srcdir = File.expand_path(__dir__) 
 STDERR.puts srcdir 

 Dir.mktmpdir('rdocbench-'){|d| 
   dir = File.join(d, 'rdocbench') 
   args = %W(--root #{srcdir} --page-dir #{srcdir}/doc --encoding=UTF-8 --no-force-update --all --ri --debug --quiet #{srcdir}) 
   args << '--op' << dir 

   r = RDoc::RDoc.new 
   r.document args 
 } 
 ``` 
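
A much smaller way to compare configurations inside one process is to time the same workload with tracing off and on, using `ObjectSpace.trace_object_allocations_start`, `trace_object_allocations_stop` and `trace_object_allocations_clear`. The workload here is an arbitrary allocation-heavy loop chosen for illustration, so its ratio is not expected to match the RDoc numbers.

```ruby
require 'objspace'

# Arbitrary allocation-heavy workload (illustrative only).
def workload
  Array.new(200_000) { |i| "str#{i}" }.size
end

# Wall-clock time of the block, in seconds.
def elapsed
  t0 = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - t0
end

plain = elapsed { workload }

ObjectSpace.trace_object_allocations_start
traced = elapsed { workload }
ObjectSpace.trace_object_allocations_stop
ObjectSpace.trace_object_allocations_clear # drop the recorded info

printf("plain: %.3fs  traced: %.3fs  ratio: %.2f\n", plain, traced, traced / plain)
```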

 Results: 

 ``` 
 # without 'objspace/trace' 
 real      0m19.764s 
 user      0m19.200s 
 sys       0m0.561s 

 # with 'objspace/trace' 
 real      0m42.638s 
 user      0m41.695s 
 sys       0m0.920s 

 # with 'objspace/trace' and locindex 
 real      0m36.875s 
 user      0m35.956s 
 sys       0m0.890s 

 # with 'objspace/trace' light mode (two measurements)
 real      0m27.743s 0m25.850s 
 user      0m26.921s 0m25.085s 
 sys       0m0.820s 0m0.762s 
 ``` 
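
Expressed as slowdown relative to the untraced run (real time), the results work out as follows; the two light-mode columns are computed separately because two values are listed for that configuration.

```ruby
baseline = 19.764 # real seconds without 'objspace/trace'

real = {
  "objspace/trace"              => 42.638,
  "objspace/trace + locindex"   => 36.875,
  "light mode (first column)"   => 27.743,
  "light mode (second column)"  => 25.850,
}

slowdown = real.transform_values { |t| (t / baseline).round(2) }
slowdown.each { |name, x| puts "#{name}: #{x}x" }
# objspace/trace: 2.16x
# objspace/trace + locindex: 1.87x
# light mode (first column): 1.4x
# light mode (second column): 1.31x
```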

Light mode records only the "locindex".
I believe that, in most cases, the "file:line" pair is enough for performance tuning.

Implementation: https://github.com/ruby/ruby/pull/4524/
The "light mode" seems more practical.
