Project

General

Profile

Actions

Bug #12274

closed

accessing to instance variable should be fast.

Added by tarui (Masaya Tarui) over 8 years ago. Updated over 8 years ago.

Status:
Closed
Assignee:
-
Target version:
-
ruby -v:
ruby 2.4.0dev (2016-04-12 trunk 54553) [x86_64-linux]
[ruby-core:74899]

Description

Currently, accessing to instance variable is quite slower than accessing to local variable.
I think accessing to instance variable is basic operation and it should be fast, so tried to improve.

patch: https://github.com/tarui/ruby/commit/dd993da80c7ad84340689137bf8b308793595cae

On mame's optcarrot benchmark, (https://github.com/mame/optcarrot/)
it is 10%(!) faster than trunk.
It increases in the maintenance cost a little, but can I commit it?

$ ./ruby -v --disable-gems ../../optcarrot/bin/optcarrot --benchmark ../../optcarrot/examples/Lan_Master.nes
ruby 2.4.0dev (2016-04-12 trunk 54553) [x86_64-linux]
fps: 13.664029283085743
checksum: 59662

$ ./ruby -v --disable-gems ../../optcarrot/bin/optcarrot --benchmark ../../optcarrot/examples/Lan_Master.nes
ruby 2.4.0dev (2016-04-12 fast-ivar-access 54553) [x86_64-linux]
fps: 15.120651593726231
checksum: 59662

Updated by ko1 (Koichi Sasada) over 8 years ago

Tarui-san suggested another way to optimize and this is my version of that technique (with some refactoring).

evaluation result:
fps: 19.21335880758348
->
fps: 22.16285461090967
(15% improvement)

Index: vm_insnhelper.c
===================================================================
--- vm_insnhelper.c	(revision 54552)
+++ vm_insnhelper.c	(working copy)
@@ -778,45 +778,47 @@
 vm_getivar(VALUE obj, ID id, IC ic, struct rb_call_cache *cc, int is_attr)
 {
 #if USE_IC_FOR_IVAR
-    if (RB_TYPE_P(obj, T_OBJECT)) {
-	VALUE val = Qundef;
-	VALUE klass = RBASIC(obj)->klass;
+    VALUE klass = RBASIC(obj)->klass;
+    VALUE val;
+
+    if (LIKELY(is_attr ? cc->aux.index > 0 : ic->ic_serial == RCLASS_SERIAL(klass))) {
 	const long len = ROBJECT_NUMIV(obj);
 	const VALUE *const ptr = ROBJECT_IVPTR(obj);
+	long index = !is_attr ? (long)ic->ic_value.index : (long)(cc->aux.index - 1);
 
-	if (LIKELY(is_attr ? cc->aux.index > 0 : ic->ic_serial == RCLASS_SERIAL(klass))) {
-	    long index = !is_attr ? (long)ic->ic_value.index : (long)(cc->aux.index - 1);
-
-	    if (index < len) {
-		val = ptr[index];
-	    }
+	if (index < len && (val = ptr[index]) != Qundef) {
+	    return val;
 	}
-	else {
-	    st_data_t index;
-	    struct st_table *iv_index_tbl = ROBJECT_IV_INDEX_TBL(obj);
+	goto undefined;
+    }
+    else if (RB_TYPE_P(obj, T_OBJECT)) {
+	const long len = ROBJECT_NUMIV(obj);
+	const VALUE *const ptr = ROBJECT_IVPTR(obj);
+	st_data_t index;
+	struct st_table *iv_index_tbl = ROBJECT_IV_INDEX_TBL(obj);
+	val = Qundef;
 
-	    if (iv_index_tbl) {
-		if (st_lookup(iv_index_tbl, id, &index)) {
-		    if ((long)index < len) {
-			val = ptr[index];
-		    }
-		    if (!is_attr) {
-			ic->ic_value.index = index;
-			ic->ic_serial = RCLASS_SERIAL(klass);
-		    }
-		    else { /* call_info */
-			cc->aux.index = (int)index + 1;
-		    }
+	if (iv_index_tbl) {
+	    if (st_lookup(iv_index_tbl, id, &index)) {
+		if (!is_attr) {
+		    ic->ic_value.index = index;
+		    ic->ic_serial = RCLASS_SERIAL(klass);
 		}
+		else { /* call_info */
+		    cc->aux.index = (int)index + 1;
+		}
+
+		if ((long)index < len && (val = ptr[index]) != Qundef) {
+		    return val;
+		}
 	    }
 	}
 
-	if (UNLIKELY(val == Qundef)) {
-	    if (!is_attr && RTEST(ruby_verbose))
-		rb_warning("instance variable %"PRIsVALUE" not initialized", QUOTE_ID(id));
-	    val = Qnil;
+      undefined:
+	if (!is_attr && RTEST(ruby_verbose)) {
+	    rb_warning("instance variable %"PRIsVALUE" not initialized", QUOTE_ID(id));
 	}
-	return val;
+	return Qnil;
     }
 #endif	/* USE_IC_FOR_IVAR */
     if (is_attr)

Updated by Eregon (Benoit Daloze) over 8 years ago

Koichi Sasada wrote:

Tarui-san suggested another way to optimize and this is my version of that technique (with some refactoring).

The diff is hard to read, would you have a commit on GitHub or a patch file?

Tarui-san, could you explain a bit the technique?
I am not sure to understand, it seems vm_getinstancevariable already has some inline cache.
Is it some manual inlining in the instruction code + avoiding some ID2SYM/INT2FIX (but these two seem performed at compile time, so mostly irrelevant for the benchmark)?

Updated by tarui (Masaya Tarui) over 8 years ago

there are 2 parts of optimization.

  • share inline cache between same symbol(at compile.c)

  • inline fast pass only and cut useless check(RB_TYPE_P).(at insns.def)

We can skip st_lookup from the 2nd insns by sharing cache.

Inlining register pass may have a bit penalty.
Cutting check was a accidental :-), but it is not necessary if cached serial equals class one.

Updated by tarui (Masaya Tarui) over 8 years ago

2016-04-13 5:41 GMT+09:00 :

Issue #12274 has been updated by Benoit Daloze.
avoiding some ID2SYM/INT2FIX (but these two seem performed at compile time, so mostly irrelevant for the benchmark)?

It is not for avoiding ID2SYM (In fact, it is calculated every time
:-), it is for sharing.
Please check the 0007 below

$ ./ruby -v --disable-gems --dump=insns -e"@a=1;p @a"
ruby 2.4.0dev (2016-04-12 trunk 54553) [x86_64-linux]
== disasm: #<ISeq:<main>@-e>============================================
0000 trace            1                                               (   1)
0002 putobject_OP_INT2FIX_O_1_C_
0003 setinstancevariable :@a, <is:0>
0006 putself
0007 getinstancevariable :@a, <is:1>
0010 opt_send_without_block <callinfo!mid:p, argc:1,
FCALL|ARGS_SIMPLE>, <callcache>
0013 leave
$ ./ruby -v --disable-gems --dump=insns -e"@a=1;p @a"
ruby 2.4.0dev (2016-04-12 fast-ivar-access 54553) [x86_64-linux]
== disasm: #<ISeq:<main>@-e>============================================
0000 trace            1                                               (   1)
0002 putobject_OP_INT2FIX_O_1_C_
0003 setinstancevariable :@a, <is:0>
0006 putself
0007 getinstancevariable :@a, <is:0>
0010 opt_send_without_block <callinfo!mid:p, argc:1,
FCALL|ARGS_SIMPLE>, <callcache>
0013 leave

--
樽家昌也(Masaya TARUI)
No Tool,No Life.

Updated by nobu (Nobuyoshi Nakada) over 8 years ago

  • Description updated (diff)

Updated by Eregon (Benoit Daloze) over 8 years ago

Masaya Tarui wrote:

there are 2 parts of optimization.

  • share inline cache between same symbol(at compile.c)

  • inline fast pass only and cut useless check(RB_TYPE_P).(at insns.def)

We can skip st_lookup from the 2nd insns by sharing cache.

Inlining register pass may have a bit penalty.

Cutting check was a accidental :-), but it is not necessary if cached serial equals class one.

I see, thanks for explaining :)

About the object check, is it not problematic to do ((struct RBasic*)obj)->klass if obj is a tagged integer (since klass is the second member, after flags)?
Or is there a hidden check before doing that?

Updated by tarui (Masaya Tarui) over 8 years ago

About the object check, is it not problematic to do ((struct RBasic*)obj)->klass if obj is a tagged integer (since klass is the second member, after flags)?

Thank you for pointing out.
I'll revive check.

Actions #8

Updated by tarui (Masaya Tarui) over 8 years ago

  • Status changed from Open to Closed

Applied in changeset r54976.


  • compile.c (iseq_compile_each): share InlineCache during same
    instance variable accesses. Reducing memory consumption,
    rasing cache hit rate and rasing branch prediction hit rate
    are expected. A part of [Bug #12274].

    • iseq.h (struct iseq_compile_data): introduce instance
      variable IC table for sharing.

    • iseq.c (prepare_iseq_build, compile_data_free):
      construct/destruct above table.

Actions

Also available in: Atom PDF

Like0
Like0Like0Like0Like0Like0Like0Like0Like0