Project

General

Profile

Actions

Feature #14989

closed

Add Hash support for transient heap

Added by tacinight (Yimin Zhao) over 6 years ago. Updated about 6 years ago.

Status:
Closed
Target version:
-
[ruby-core:88470]

Description

Hi, there
I am a GSoC student following @ko1 (Koichi Sasada) to do some work for Ruby, here I want to make a brief introduction for one of my work in this period.

Transient Heap

The first discussion of transient heap was opened by @ko1 (Koichi Sasada) at Introduce 2nd GC heap named Transient heap. In his prototype, he completed the implement for arrays, and the future work should also support the type String, Hash as well as Extend(shrink) policy for transient heap.

My work here is adding support for Hash so that transient heap can allocate space for Hash. I encounted some difficults that had to be handled.

  1. Hash type objects use st_table to store elements, the correlation between those two is weak. For transient heap, it has to know the ruby type of the objects.
  2. Transient heap is based on the hypothesis that most of objects die at a young age. To understand the issue with first need to survey the distribution of hash size.

The distribution of hash size

Implement

The capacity of st_table is always a power of 2, we recorded the exponent into a global array before gc or functions explicitly frees st_table. Look at the patch for more details.

Results

We measured 6 applications in order to find a real distribution of the hash size. Respectively, those applications are:

  1. make check command under ruby source code.
  2. make gcbench_rdoc command under ruby source code.
  3. gem uninstall --all to uninstall over 300 gems
  4. bundle install commmand under project discourse
  5. discourse manually explore the web pages and simulate the different events for about 20 minutes (development environment)
  6. rails_ruby_bench project rails_ruby_bench with parameters : 8 workers, 1500 iterations
  7. rails_ruby_bench 2 8 workers, 12000 iterations
  8. rails_ruby_bench 3 20 workders, 20000 iterations

The results show that objects under size 8(area of exponent 2 to 3) account for more than 80% of the total data in most situations. The raw data can be found at st_table size statistics.

Add Hash support

According to the previous survey result, we use a array-like fixed-size table to store the elements. We name it as LinearTable.
In this way, we have two benefits:

  1. We don't have to create st_table for small hash(but it will when necessary)
  2. the code is written into hash.c (while st_table code is in st.c). its easy to bind small hash with transient heap, and by default, when using st_table, it uses the malloced space.

So the basic data structure look like this.

#define LINEAR_TABLE_MAX_SIZE 8

typedef struct li_table_entry {
    VALUE hash;
    VALUE key;
    VALUE record;
} li_table_entry;

typedef struct LinearTable {
    const struct st_hash_type *type;
    li_table_entry entries[LINEAR_TABLE_MAX_SIZE];
} li_table;

struct RHash {
    struct RBasic basic;
    union {
	struct st_table *ntbl;      /* possibly 0 */
	struct LinearTable *ltbl;
    } as;
    int iter_lev;
    const VALUE ifnone;
};

The layout of li_table_entry

Considering the size of 8 hashes/keys/records is just the size of a cache line, I also tried different data layouts.

#define LINEAR_TABLE_MAX_SIZE 8

// the original layout
typedef struct li_table_entry {
    VALUE hash;
    VALUE key;
    VALUE record;
} li_table_entry;

typedef struct LinearTable {
    const struct st_hash_type *type;
    li_table_entry entries[LINEAR_TABLE_MAX_SIZE];
} li_table;
#define LINEAR_TABLE_MAX_SIZE 8

// the original layout
typedef struct li_table_entry {
    VALUE hash;
    VALUE key;
    VALUE record;
} li_table_entry;

typedef struct LinearTable {
    const struct st_hash_type *type;
    li_table_entry entries[LINEAR_TABLE_MAX_SIZE];
} li_table;
#define LINEAR_TABLE_MAX_SIZE 8

// The Variant 1
typedef struct li_table_entry {
    VALUE key;
    VALUE record;
} li_table_entry;

typedef struct LinearTable {
    const struct st_hash_type *type;
    VALUE hashes[LINEAR_TABLE_MAX_SIZE];
    li_table_entry pairs[LINEAR_TABLE_MAX_SIZE];
} li_table;
#define LINEAR_TABLE_MAX_SIZE 8

// The Variant 2
typedef struct li_table_entry {
    VALUE hash;
    VALUE key;
} li_table_entry;

typedef struct LinearTable {
    const struct st_hash_type *type;
    li_table_entry entries[LINEAR_TABLE_MAX_SIZE];
    VALUE records[LINEAR_TABLE_MAX_SIZE];
} li_table;
#define LINEAR_TABLE_MAX_SIZE 8

// The Variant 3
typedef struct LinearTable {
    const struct st_hash_type *type;
    VALUE hashes[LINEAR_TABLE_MAX_SIZE];
    VALUE keys[LINEAR_TABLE_MAX_SIZE];
    VALUE records[LINEAR_TABLE_MAX_SIZE];
} li_table;

The microbench results show the original edition and variant 1 have better performance than the other two. The patches can be found in the repository as linear_table_v1.patch, linear_table_v2.patch, linear_table_v3.patch, linear_table_v4.patch;

Integrate data into flag bits

Considering the maximum size of entries is 8. We can store the num_entries and num_bound into RBasic(hash)->flag;
Related patch is here which based on linear_table_v1.patch

Benchmark results

I execuated comprehensive benchmark test based on official benchmark tool - benchmark-driver.

  1. linear_table_benchmark, includes the make benchmark results of trunk vs. v1(the original version) vs. v2(the variant 1).
  2. transient hash benchmark, includes the make benchmark results of trunk vs. transient_heap(based on v1) vs. integrated flag
  3. linear_table_variant_benchmark, includes the micro-benchmark results of trunk vs. v1 vs. v2 vs. v3 vs. v4. The used benchrmark scripts are located at microbench directory.
  4. linear_table real app bench, includes the test results of the six applications noted above.
  5. transient heap microbench, include the micro-benchrmark resutls of trunk vs. v1 vs. transient-hash vs integrated-flag. This time I also add memory usage comparison.
  6. linear_table_vary_size_microbench, we once worried the size of linear table will impact on the benchmark results because of the linear search. So we test the linear table with different size and the results show the table size make no relevant difference.

Future work

  1. In transient hash benchmark, we saw a preformance degradation compare to the preformance improvement in linear_table_benchmark in same items. It needs to be analyzed and to be solved.

My GitHub Repository

I will update my status at a GitHub repository: https://github.com/Tacinight/ruby-gsoc-2018.
More explanatory images and other of my work also can be found at this repository.
Welcome to review the patches and leave your valuable comments.

Actions

Also available in: Atom PDF

Like0
Like0Like0Like0Like0Like0Like0Like0Like0