Feature #12628


change block/env structs

Added by ko1 (Koichi Sasada) almost 6 years ago. Updated almost 5 years ago.


I will change the block/env structures for performance.

I'm not sure who is interested in this area, but it will be a big change.


Now, MRI has several problems.

(1) We need to clear rb_control_frame_t::block_iseq on every frame setup. It costs space (a VALUE for each frame) and initialization time.
(2) There are several ways to pass a block: by ISeq (iter{...}), Proc (iter(&pr)), and Symbol (iter(&:sym)). However, they are not optimized (for Symbol blocks, there is only ad-hoc check code).
(3) Env (and Proc, Binding) objects are not WB-protected ([Bug #10212]).


To solve them, I wrote a big patch.

Introduce Block Handler (BH)

For issues (1) and (2), I introduced the concept of a "Block Handler" (BH).

Current implementation

Now, rb_block_t pointers are passed to represent given blocks.

rb_block_t has the following types.
(1) A part of current control frame (with block_iseq = iseq) (iter{...})
(2) proc body (iter(&pr))
(3) A part of current control frame (with block_iseq = :sym) (iter(&:sym))
(Internally, there is also (4) ifunc, for blocks implemented in C.)

They are placed on the frame of the called method (as a local variable (ep[0])).

To mark the Proc during GC for (2), we prepare rb_block_t::proc (== rb_control_frame_t::block_iseq).

Using BH

To remove rb_block_t::proc (== rb_control_frame_t::block_iseq),
we introduce BH, which puts a Proc or Symbol directly in place of the given block (it is stored as a special local variable).

Proc and Symbol are normal objects, so we can store them without any concern.
We need to think about the iseq and ifunc types ((1) and (4)).

To make this clear, I introduced struct rb_captured_block to represent a set of self, local variables (ep) and iseq (or ifunc). (Currently rb_block_t represents the same set.)

Passed blocks with iseq (iter{...}) are represented by a pointer to rb_captured_block.
Such pointers are not managed VALUEs, so we add a tag to them.

  • ptr | 0x01 -> pointer to captured_block contains iseq
  • ptr | 0x03 -> pointer to captured_block contains ifunc (for internal)

Tagged pointers are recognized as Fixnum by GC (the lowest bit is set), so GC does not try to mark them.

(Note that the current implementation uses this tagged pointer to represent the "local frame" (no previous Env) flag.
Instead of the tag, we introduce VM_ENV_FLAG_LOCAL as a frame flag for this purpose.
See the next chapter about "ENV_FLAG"s.)

We can recognize the type of a passed BH with the following rules:

(0) BH == VM_BLOCK_HANDLER_NONE (== 0) -> no block given
(1) (BH & 0x03) == 0x01 -> pointer to captured_block contains iseq
(2) (BH & 0x03) == 0x03 -> pointer to captured_block contains ifunc
(3) SYMBOL_P(BH) -> Symbol
(4) Otherwise -> Proc

This is what vm_block_handler_type(VALUE block_handler) does.
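The classification can be sketched in standalone C as follows. This is only an illustration, not MRI's code: `bh_t`, `struct captured_block`, `bh_type()` and the `is_symbol` callback (a stand-in for SYMBOL_P()) are hypothetical names, and the sketch uses the tags from the bullet list above (0x01 for iseq, 0x03 for ifunc).

```c
#include <assert.h>
#include <stdint.h>

/* All names here are hypothetical stand-ins, not MRI's definitions. */
typedef uintptr_t bh_t;                  /* stand-in for VALUE */

enum bh_type { BH_NONE, BH_ISEQ, BH_IFUNC, BH_SYMBOL, BH_PROC };

struct captured_block { int dummy; };    /* stand-in for rb_captured_block */

/* Tag an aligned pointer; the low two bits must be free. */
static bh_t
bh_from_iseq_block(const struct captured_block *cb)
{
    assert(((uintptr_t)cb & 0x03) == 0);
    return (uintptr_t)cb | 0x01;
}

static bh_t
bh_from_ifunc_block(const struct captured_block *cb)
{
    assert(((uintptr_t)cb & 0x03) == 0);
    return (uintptr_t)cb | 0x03;
}

/* Recover the original pointer by clearing the tag bits. */
static const struct captured_block *
bh_to_captured(bh_t bh)
{
    return (const struct captured_block *)(bh & ~(uintptr_t)0x03);
}

/* Apply rules (0) to (4) in order. */
static enum bh_type
bh_type(bh_t bh, int (*is_symbol)(bh_t))
{
    if (bh == 0) return BH_NONE;
    if ((bh & 0x03) == 0x01) return BH_ISEQ;
    if ((bh & 0x03) == 0x03) return BH_IFUNC;
    if (is_symbol(bh)) return BH_SYMBOL;
    return BH_PROC;
}

/* Trivial predicate for the sketch: nothing is treated as a Symbol. */
static int
never_symbol(bh_t bh)
{
    (void)bh;
    return 0;
}
```

Both tags keep the lowest bit set, which is why such a value looks like a Fixnum to anything that only inspects that bit.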

To invoke a passed block represented by a BH, we need to check its type with vm_block_handler_type(VALUE block_handler). This adds some overhead, because the current implementation only needs to check rb_block_t::iseq (which can contain an iseq, ifunc or Symbol). However, I believe the new code is simpler and more readable.
In fact, the "invoke block" benchmark (vm1_yield) is faster.

I renamed rb_block_t to struct rb_block, which represents an escaped block stored by a Proc or Binding.
We introduce rb_block::type to represent a type corresponding to the BH's type.
rb_block::as is a union that holds the block body specified by type.
We can convert between rb_block and BH in both directions.

struct rb_block {
    union {
        struct rb_captured_block captured;
        VALUE symbol;
        VALUE proc;
    } as;
    enum rb_block_type type;
};

To check the type of a block, we should use vm_block_type() instead of checking rb_block::type directly, because it contains several assertions (enabled when VM_CHECK_MODE > 0).
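A minimal sketch of such a checked accessor over a tagged union, in standalone C. The names (`struct block`, `block_type_checked()`, `CHECK_MODE`) and the particular assertion are hypothetical; MRI's own checks behind VM_CHECK_MODE are not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#ifndef CHECK_MODE
#define CHECK_MODE 1                     /* hypothetical analogue of VM_CHECK_MODE */
#endif

typedef uintptr_t value_t;               /* stand-in for VALUE */

enum block_type { BLOCK_ISEQ, BLOCK_IFUNC, BLOCK_SYMBOL, BLOCK_PROC };

struct captured {                        /* stand-in for rb_captured_block */
    value_t self;
    const value_t *ep;
    const void *code;                    /* iseq or ifunc */
};

struct block {
    union {
        struct captured captured;
        value_t symbol;
        value_t proc;
    } as;
    enum block_type type;
};

/* Return the tag, validating the matching union member in check mode. */
static enum block_type
block_type_checked(const struct block *b)
{
#if CHECK_MODE > 0
    if (b->type == BLOCK_ISEQ || b->type == BLOCK_IFUNC) {
        assert(b->as.captured.code != NULL);
    }
#endif
    return b->type;
}
```

Routing every read of the tag through one accessor keeps the consistency checks in a single place, and they compile away when the check mode is 0.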

Short summary

(1) Introduce struct rb_captured_block to represent a set of self, variables (ep), and code (iseq or ifunc).
Usually the storage for this struct is the caller's control frame.
(2) A method called with a block receives a "Block Handler" (BH) that represents the passed block. It is a tagged struct rb_captured_block pointer (which looks like a Fixnum), a Proc object or a Symbol object.
(3) The method that calls the block (== iterator) invokes it by checking the type of the given BH. We can check the BH type with vm_block_handler_type().
(4) To make a Proc, convert the BH to struct rb_block.

Introduce WB for Env objects

WB is important for generational and incremental GC (for issue (3)). We can run MRI without WBs on all objects because of RGenGC's "wb-unprotected" technique. In fact, we did not introduce WBs for RubyVM::Env (Env) objects, because WBs on these objects have a performance impact: every assignment to a local variable would have to check whether a WB is needed.

However, leaving them unprotected causes several performance regressions. For example, if an application creates many Proc objects, the corresponding Env objects are created and must be marked at every minor GC (because they are wb-unprotected). This is what the ticket [Bug #10212] shows.

So we need to achieve a "low latency WB (for Env objects)".

Current MRI's local variable assignment:

    /* actual assignment in insns.def, setlocal instruction */
    *(ep - idx) = val;

Naive implementation with WB will be:

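#define VM_EP_IN_HEAP_P(th, ep)   (!((th)->stack <= (ep) && (ep) < ((th)->stack + (th)->stack_size)))

   if (VM_EP_IN_HEAP_P(th, ep)) {
     /* Env escaped to the heap: write through the barrier */
     RB_OBJ_WRITE(VM_ENV_EP_ENVVAL(ep), ep-idx, val);
   }
   else {
     /* still on the VM stack: plain store */
     *(ep - idx) = val;
   }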

It is correct, but it is not fast (in fact, it is too slow when the Env is in the heap (== escaped)).


First, we need to check whether the local variables are located on (1) the VM stack or (2) an Env. We don't need WB protection for (1) because VM stacks are roots for every GC.

To make this simple, we move rb_control_frame_t::flags to ep[0] (as a special local variable) and introduce VM_ENV_FLAG_ESCAPED. We can easily check "on stack" ((flags & VM_ENV_FLAG_ESCAPED) == 0) or "escaped" (== on Env) ((flags & VM_ENV_FLAG_ESCAPED) != 0). We don't need to compare against the VM stack range.
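The two checks can be compared side by side in a standalone sketch. The names and the flag's bit value below are assumptions made for illustration only; the point is that the flag check needs just ep itself, while the range check also needs the stack bounds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uintptr_t value_t;                   /* stand-in for VALUE */

#define ENV_FLAG_ESCAPED ((value_t)0x0004)   /* hypothetical bit value */

/* Range check: requires the stack base and size in addition to ep. */
static int
ep_in_heap_by_range(const value_t *stack, size_t stack_size, const value_t *ep)
{
    uintptr_t lo = (uintptr_t)stack;
    uintptr_t hi = (uintptr_t)(stack + stack_size);
    uintptr_t p = (uintptr_t)ep;
    return !(lo <= p && p < hi);
}

/* Flag check: one load from ep[0] and one mask. */
static int
ep_escaped_by_flag(const value_t *ep)
{
    return (ep[0] & ENV_FLAG_ESCAPED) != 0;
}
```

Note the parentheses around the mask: in C, `==` binds tighter than `&`, so `flags & FLAG == 0` would not test the bit.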

To place the flags on ep (local variables), I cleaned up the managed data area in the local variables.

#define VM_ENV_DATA_SIZE             ( 3)

#define VM_ENV_DATA_INDEX_ME_CREF    (-2) /* ep[-2] */
#define VM_ENV_DATA_INDEX_SPECVAL    (-1) /* ep[-1] */
#define VM_ENV_DATA_INDEX_FLAGS      ( 0) /* ep[ 0] */
#define VM_ENV_DATA_INDEX_ENV        ( 1) /* ep[ 1] */
#define VM_ENV_DATA_INDEX_ENV_PROC   ( 2) /* ep[ 2] */

It means that 3 (== VM_ENV_DATA_SIZE) special local variables are allocated for each frame (index -2 to 0).
(Note that indexes 1 and 2 are only used by an escaped Env.)
Current MRI already has 2 special local variables (me_cref and specval).
I introduced macro names to avoid magic numbers.

To respect this local variable layout, compile.c requires several fixes, and rb_iseq_t::local_size is no longer needed (we can calculate the number of local variables from local_table_size and VM_ENV_DATA_SIZE).
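As a rough illustration of this layout, the sketch below computes the number of slots for a frame and the position of each slot. The helper names are hypothetical, and the placement of the ordinary local variables directly below the three data slots is an assumption of this sketch (the macro values are the ones listed above, with shortened names).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uintptr_t value_t;               /* stand-in for VALUE */

#define ENV_DATA_SIZE            3
#define ENV_DATA_INDEX_ME_CREF (-2)      /* ep[-2] */
#define ENV_DATA_INDEX_SPECVAL (-1)      /* ep[-1] */
#define ENV_DATA_INDEX_FLAGS     0       /* ep[ 0] */

/* Total slots for a frame: ordinary locals plus the special data slots. */
static size_t
env_slots(size_t local_table_size)
{
    return local_table_size + ENV_DATA_SIZE;
}

/* ep points at the flags slot, which is the last slot of the area. */
static value_t *
env_ep(value_t *base, size_t local_table_size)
{
    return base + env_slots(local_table_size) - 1;
}

/* Address of the i-th ordinary local (0-based), assumed to sit below the data slots. */
static value_t *
env_local(value_t *ep, size_t local_table_size, size_t i)
{
    return ep - (ENV_DATA_SIZE - 1) - local_table_size + i;
}
```

With two ordinary locals, for example, the area has five slots: the locals at the two lowest addresses, then me_cref, specval, and the flags at ep[0].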

Another optimization is the introduction of the VM_ENV_FLAG_WB_REQUIRED flag.
It is a very tricky and dangerous method, so we should not use this hack in other places.
This flag is tightly coupled to the current GC implementation.

We need WB protection for "non-remembered old objects (or gray objects in incremental GC)". Once an old object is remembered, we don't need WB protection any more until the next marking. VM_ENV_FLAG_WB_REQUIRED represents this status.

(1) When an Env object is initialized, VM_ENV_FLAG_WB_REQUIRED is true.
(2) At the first local variable assignment, VM_ENV_FLAG_WB_REQUIRED is true, so we apply WB protection to this Env object and turn off the flag.
(3) At the next local variable assignment, VM_ENV_FLAG_WB_REQUIRED is false, so we can skip WB protection.
(4) At GC marking of this Env object, we turn on VM_ENV_FLAG_WB_REQUIRED and go to (2).

The time between (2) and (4) can be long enough that only a few WB protections are needed.
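The cycle can be simulated with a small counter-based model. This is a sketch under stated assumptions: `struct env_sim` and its helpers are hypothetical, the barrier itself is replaced by a counter, and GC marking is assumed to re-arm the flag so that step (2) runs again on the next write.

```c
#include <assert.h>
#include <stdint.h>

typedef uintptr_t value_t;                       /* stand-in for VALUE */

#define ENV_FLAG_WB_REQUIRED ((value_t)0x0008)   /* hypothetical bit value */

struct env_sim {
    value_t flags;
    value_t slot;        /* a single local variable */
    unsigned barriers;   /* how many writes took the slow path */
};

/* Step (1): a fresh Env starts with the flag set. */
static void
env_init(struct env_sim *e)
{
    e->flags = ENV_FLAG_WB_REQUIRED;
    e->slot = 0;
    e->barriers = 0;
}

/* Steps (2) and (3): only the first write after marking pays for the barrier. */
static void
env_write(struct env_sim *e, value_t v)
{
    if ((e->flags & ENV_FLAG_WB_REQUIRED) != 0) {
        e->barriers++;                       /* stand-in for remembering the Env */
        e->flags &= ~ENV_FLAG_WB_REQUIRED;
    }
    e->slot = v;
}

/* Step (4): marking re-arms the flag (an assumption of this sketch). */
static void
env_gc_mark(struct env_sim *e)
{
    e->flags |= ENV_FLAG_WB_REQUIRED;
}
```

In this model, any number of writes between two markings costs exactly one slow-path write, which is the property the flag is meant to provide.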

Finally, the local variable assignment code looks like the following.

NOINLINE(static void vm_env_write_slowpath(const VALUE *ep, int index, VALUE v));

static void
vm_env_write_slowpath(const VALUE *ep, int index, VALUE v)
{
    /* remember env value forcely */
    rb_gc_writebarrier_remember(VM_ENV_ENVVAL(ep));
    VM_FORCE_WRITE(&ep[index], v);
    /* step (2): no WB is needed again until the next marking */
    VM_ENV_FLAGS_UNSET(ep, VM_ENV_FLAG_WB_REQUIRED);
}

static inline void
vm_env_write(const VALUE *ep, int index, VALUE v)
{
    VALUE flags = ep[VM_ENV_DATA_INDEX_FLAGS];
    if (LIKELY((flags & VM_ENV_FLAG_WB_REQUIRED) == 0)) {
        VM_STACK_ENV_WRITE(ep, index, v); /* write lvar directly */
    }
    else {
        vm_env_write_slowpath(ep, index, v);
    }
}

With these techniques, RubyVM::Env objects are now WB-protected without a big performance impact.
Proc and Binding objects are now WB-protected as well.

Short summary

To make Env objects wb-protected, I implemented a low-overhead WB technique.

(1) Move the frame flags from rb_control_frame_t::flags to ep[0] (as a special local variable) and introduce VM_ENV_FLAG_ESCAPED to represent an escaped Env.
(2) Introduce VM_ENV_FLAG_WB_REQUIRED to check whether WB protection is necessary; it is tightly coupled with the GC implementation.
(3) With this technique and other hacks, RubyVM::Env, Proc and Binding objects are now WB-protected.


By introducing WBs for Env/Proc objects, we can improve the throughput of the app_lc_fizzbuzz benchmark.
Method and block invocations are also faster.

Some results:

                    trunk  modified
 app_lc_fizzbuzz   58.277    41.729 (sec) (x 1.397 faster)
 vm1_simplereturn*  0.660     0.638 (sec) (x 1.035 faster)
 vm1_yield*         0.738     0.650 (sec) (x 1.135 faster)

Some programs are slower.

                    trunk  modified
 app_pentomino     14.096    15.241 (sec) (x 0.925 faster == slow)
 vm1_lvar_set*      1.893     1.916 (sec) (x 0.988 faster == slow)

lvar_set sets local variables many times, but the impact is not so big.
I'm not sure why the pentomino puzzle is slower.

All of benchmarks are here:


I made a patch to solve issues (1) to (3).

The patch is fairly big, but it is difficult for me to separate it into small pieces,
so I'll commit it all at once soon, sorry.

Related issues 4 (0 open, 4 closed)

Related to Ruby master - Bug #12927: SIGSEGV during GC marking of sym procs (Closed)
Related to Ruby master - Bug #13090: Cannot use return statement in lambdas using instance_eval (MRI 2.4) (Closed)
Related to Ruby master - Bug #13775: Ruby hangs when calling scope and belongs_to many times (with mongomapper) (Closed)
Related to Ruby master - Bug #13772: Memory leak recycling stacks for threads in 2.4.1 (Closed, ko1 (Koichi Sasada))
