Project

General

Profile

Feature #12142

Updated by hsbt (Hiroshi SHIBATA) almost 4 years ago

~~~  
   Hello, the following patch contains a new implementation of hash  
  tables (major files st.c and include/ruby/st.h).  

    Modern processors have several levels of cache.      Usually,the CPU  
  reads one or a few lines of the cache from memory (or another level of  
  cache).      So CPU is much faster at reading data stored close to each  
  other.      The current implementation of Ruby hash tables does not fit  
  well to modern processor cache organization, which requires better  
  data locality for faster program speed.  

  The new hash table implementation achieves a better data locality  
  mainly by  

    o switching to open addressing hash tables for access by keys.  
      Removing hash collision lists lets us avoid *pointer chasing*, a  
      common problem that produces bad data locality.      I see a tendency  
      to move from chaining hash tables to open addressing hash tables  
      due to their better fit to modern CPU memory organizations.  
      CPython recently made such switch  
      (https://hg.python.org/cpython/file/ff1938d12240/Objects/dictobject.c).  
      PHP did this a bit earlier  
      https://nikic.github.io/2014/12/22/PHPs-new-hashtable-implementation.html.  
      GCC has widely-used such hash tables  
      (https://gcc.gnu.org/svn/gcc/trunk/libiberty/hashtab.c) internally  
      for more than 15 years.  

    o removing doubly linked lists and putting the elements into an array  
      for accessing to elements by their inclusion order.      That also  
      removes pointer chaising on the doubly linked lists used for  
      traversing elements by their inclusion order.  

  A more detailed description of the proposed implementation can be  
  found in the top comment of the file st.c.  

  The new implementation was benchmarked on 21 MRI hash table benchmarks  
  for two most widely used targets x86-64 (Intel 4.2GHz i7-4790K) and ARM  
  (Exynos 5410 - 1.6GHz Cortex-A15):  

  make benchmark-each ITEM=bm_hash OPTS='-r 3 -v' COMPARE_RUBY='<trunk ruby>'  

  Here the results for x86-64:  

  hash_aref_dsym           1.094  
  hash_aref_dsym_long              1.383  
  hash_aref_fix            1.048  
  hash_aref_flo            1.860  
  hash_aref_miss           1.107  
  hash_aref_str            1.107  
  hash_aref_sym            1.191  
  hash_aref_sym_long               1.113  
  hash_flatten             1.258  
  hash_ident_flo           1.627  
  hash_ident_num           1.045  
  hash_ident_obj           1.143  
  hash_ident_str           1.127  
  hash_ident_sym           1.152  
  hash_keys                2.714  
  hash_shift               2.209  
  hash_shift_u16           1.442  
  hash_shift_u24           1.413  
  hash_shift_u32           1.396  
  hash_to_proc             2.831  
  hash_values              2.701  

  The average performance improvement is more 50%.      ARM results are  
  analogous -- no any benchmark performance degradation and about the  
  same average improvement.  

  The patch can be seen as  

  https://github.com/vnmakarov/ruby/compare/trunk...hash_tables_with_open_addressing.patch  

  or in a less convenient way as pull request changes  

  https://github.com/ruby/ruby/pull/1264/files  


  This is my first patch for MRI and may be my proposal and  
  implementation have pitfalls.      But I am keen to learn and work on  
  inclusion of this code into MRI.  

  ~~~ Los mejores brókers de forex ecn. https://es.forex-is.com

Back