Bug #11278
closedremove rb_control_frame_t::klass
Description
Abstract¶
rb_control_frame_t has a field klass
, which is used to search super class when super
is called (and also several usages). super
is only for methods. However, all of rb_control_frame_t requires to keep klass
on other frames such as block and so on.
This patch solve this issue by introducing rb_callable_method_entry_t
.
https://github.com/ko1/ruby/tree/remove_cf_klass
rb_callable_method_entry_t
is similar to rb_method_entry_t
(actually, same data layout), but it has defined_class
.
Background¶
For methods defined to classes, then owner of these methods are also defined_class.
class C1 < C0
def foo # foo's owner is C1, and foo's defined class is C0.
super
end
end
We can start to search super class from C1
's super class (C0
).
However, when we define methods in a modules, then defined class is not fixed.
module M
def foo # foo's owner is M, however, defined class is not fixed.
super
end
end
We can not search super class from module M
.
M
is used when some classes include (extend, prepend). These classes determine super classes.
class C1 < C0
include M
end
In this case, we can know super class of M#foo
(included by C1) is C0
.
To represent a correct class hierarchy, MRI uses special class T_ICLASS.
T_ICLASS is internal class points including (extending and prepending) modules like that:
C1 -> T_ICLASS -> C0
|
+-> M
# Let's use notation I(M) to represent this data structure.
# C1 -> I(M) -> C0
We can't determine defined class of M#foo
, but we can determine a defined class I(M)#foo
(in this case, it is C0
).
Current MRI pushes defined class of methods onto control frame stack (rb_control_frame_t::klass
).
However, it becomes overhead, especially for non-method frames such as blocks and so on.
To overcome this issue, I introduced rb_callable_method_entry_t
,
which is similar to rb_method_entry_t
, but has defined_class
.
(rb_callable_method_entry_t
is T_IMEMO/imemo_ment, same as rb_method_entry_t
)
For C1#foo
, the defined class is just C1
. So rb_method_entry_t
of C1#foo
is also rb_callable_method_entry_t
.
For M#foo
, the defined class is not fixed. So rb_method_entry_t
of M#foo
is not a rb_callable_method_entry_t
.
rb_callable_method_entry_t
is created when M#foo
is called by I(M)
.
We can find I(M)
when we search M#foo in a class hierarchy C1 -> I(M) -> C0
.
Let's call created rb_callable_method_entry_t
for M#foo
with I(M)
as I(M)#foo
.
It is inefficient that we make I(M)#foo
everytime when M#foo
is called.
So I(M)#foo
is cached in a table pointed by I(M)
.
This table will be cleared when M
is redefined.
pros. and cons.¶
Advantage:
- Faster pushing control frame especially for block invocation.
- Simplify codes around searching super classes.
Disadvantage:
- Increase memory consumption because of two reasons
- Duplicate method entries for methods defined by modules.
- Cache table kept by
I(M)
- Increase complexity maintaining method entries.
rb_method_entry_t
was a simple enough data structure. We need to consider which data structures are required.
Measurement¶
For performance.¶
I do benchmark repeating 10 times (pickup the fastest results).
Speedup ratio: compare with the result of `trunk' (greater is better)
name modified
app_answer 1.032
app_aobench 0.989
app_erb 1.006
app_factorial 1.000
app_fib 1.026
app_lc_fizzbuzz 1.144
app_mandelbrot 1.032
app_pentomino 0.996
app_raise 0.996
app_strconcat 0.981
app_tak 0.999
app_tarai 1.004
app_uri 1.001
array_shift 0.913
hash_aref_flo 1.023
hash_aref_miss 1.097
hash_aref_str 1.074
hash_aref_sym 1.051
hash_aref_sym_long 1.047
hash_flatten 1.002
hash_ident_flo 1.020
hash_ident_num 1.038
hash_ident_obj 1.036
hash_ident_str 1.055
hash_ident_sym 1.016
hash_keys 0.993
hash_shift 1.046
hash_values 1.006
io_file_create 0.983
io_file_read 0.985
io_file_write 1.014
io_select 0.958
io_select2 0.972
io_select3 1.027
loop_for 1.067
loop_generator 0.980
loop_times 1.078
loop_whileloop 0.995
loop_whileloop2 1.005
marshal_dump_flo 1.014
marshal_dump_load_geniv 0.989
marshal_dump_load_time 0.988
securerandom 0.944
so_ackermann 1.018
so_array 1.049
so_binary_trees 0.993
so_concatenate 1.036
so_count_words 1.012
so_exception 0.989
so_fannkuch 1.017
so_fasta 1.003
so_k_nucleotide 1.005
so_lists 1.001
so_mandelbrot 0.998
so_matrix 0.987
so_meteor_contest 1.035
so_nbody 0.997
so_nested_loop 1.054
so_nsieve 1.010
so_nsieve_bits 1.022
so_object 0.992
so_partial_sums 1.018
so_pidigits 0.993
so_random 0.981
so_reverse_complement 0.986
so_sieve 1.007
so_spectralnorm 1.014
vm1_attr_ivar* 0.991
vm1_attr_ivar_set* 0.987
vm1_block* 1.009
vm1_const* 0.983
vm1_ensure* 0.960
vm1_float_simple* 0.954
vm1_gc_short_lived* 1.002
vm1_gc_short_with_complex_long* 1.004
vm1_gc_short_with_long* 0.996
vm1_gc_short_with_symbol* 0.998
vm1_gc_wb_ary* 1.004
vm1_gc_wb_ary_promoted* 1.141
vm1_gc_wb_obj* 0.998
vm1_gc_wb_obj_promoted* 0.963
vm1_ivar* 0.982
vm1_ivar_set* 1.010
vm1_length* 1.006
vm1_lvar_init* 0.938
vm1_lvar_set* 0.990
vm1_neq* 0.987
vm1_not* 1.013
vm1_rescue* 1.053
vm1_simplereturn* 1.030
vm1_swap* 1.017
vm1_yield* 1.032
vm2_array* 0.987
vm2_bigarray* 1.014
vm2_bighash* 0.987
vm2_case* 1.001
vm2_defined_method* 1.003
vm2_dstr* 0.997
vm2_eval* 0.982
vm2_method* 1.011
vm2_method_missing* 0.973
vm2_method_with_block* 1.027
vm2_mutex* 1.065
vm2_newlambda* 1.014
vm2_poly_method* 0.962
vm2_poly_method_ov* 0.972
vm2_proc* 1.058
vm2_raise1* 0.977
vm2_raise2* 0.990
vm2_regexp* 1.006
vm2_send* 1.005
vm2_struct_big_aref_hi* 1.005
vm2_struct_big_aref_lo* 1.010
vm2_struct_big_aset* 1.005
vm2_struct_small_aref* 1.030
vm2_struct_small_aset* 1.019
vm2_super* 0.900
vm2_unif1* 1.031
vm2_zsuper* 0.913
vm3_backtrace 1.004
vm3_clearmethodcache 0.937
vm3_gc 0.996
vm_thread_alive_check1 0.963
vm_thread_close 1.028
vm_thread_create_join 1.007
vm_thread_mutex1 1.047
vm_thread_mutex2 1.842
vm_thread_mutex3 1.028
vm_thread_pass 0.665
vm_thread_pass_flood 0.960
vm_thread_pipe 0.998
vm_thread_queue 0.995
Not so big change. vm2_super/zsuper should improve performance so I need to check it again.
Memory consumption¶
Runing this script to check process memory on Linux Ubuntu.
N = 100_000
$mod = true
$cls = true
module M
N.times{|i|
define_method("foo#{i}"){}
} if $mod
end
class C
include M
N.times{|i|
define_method("bar#{i}"){}
} if $cls
end
class D
include M
N.times{|i|
define_method("bar#{i}"){}
} if $cls
end
class E
include M
N.times{|i|
define_method("bar#{i}"){}
} if $cls
end
[C, D, E].each{|c|
obj = c.new
N.times{|i|
obj.send "foo#{i}" if $mod
obj.send "bar#{i}" if $cls
}
}
puts File.readlines('/proc/self/status').grep(/VmHWM/)
This program makes 100_000 methods for a module and classes.
Maybe it is too big example.
Making methods on classes and a module.
ruby 2.2
VmHWM: 247624 kB
trunk
VmHWM: 234004 kB
modified
VmHWM: 252236 kB
Making methods only on a module.
ruby 2.2
VmHWM: 77848 kB
trunk
VmHWM: 86452 kB
modified
VmHWM: 108756 kB
Making methods only on classes.
ruby 2.2
VmHWM: 175780 kB
trunk
VmHWM: 182944 kB
modified
VmHWM: 179216 kB
As you can see, first result shows 2% increase for memory usage compare to Ruby 2.2.
Second result shows 40% increase, but it is worst case.
Third result is best case (no methods in modules).
We need to check real usage.
Future work¶
I will try class level cache proposed by funnyfalcon before, over there.
Files
Updated by ko1 (Koichi Sasada) over 9 years ago
- Related to Bug #11279: remove rb_control_frame_t::klass added
Updated by ko1 (Koichi Sasada) over 9 years ago
- Status changed from Open to Closed
Applied in changeset r51126.
- method.h: introduce rb_callable_method_entry_t to remove
rb_control_frame_t::klass.
[Bug #11278], [Bug #11279]
rb_method_entry_t data belong to modules/classes.
rb_method_entry_t::owner points defined module or class.
module M
def foo; end
end
In this case, owner is M.
rb_callable_method_entry_t data belong to only classes.
For modules, MRI creates corresponding T_ICLASS internally.
rb_callable_method_entry_t can also belong to T_ICLASS.
rb_callable_method_entry_t::defined_class points T_CLASS or
T_ICLASS.
rb_method_entry_t data for classes (not for modules) are also
rb_callable_method_entry_t data because it is completely same data.
In this case, rb_method_entry_t::owner == rb_method_entry_t::defined_class.
For example, there are classes C and D, and incldues M,
class C; include M; end
class D; include M; end
then, two T_ICLASS objects for C's super class and D's super class
will be created.
When C.new.foo is called, then M#foo is searcheed and
rb_callable_method_t data is used by VM to invoke M#foo.
rb_method_entry_t data is only one for M#foo.
However, rb_callable_method_entry_t data are two (and can be more).
It is proportional to the number of including (and prepending)
classes (the number of T_ICLASS which point to the module).
Now, created rb_callable_method_entry_t are collected when
the original module M was modified. We can think it is a cache.
We need to select what kind of method entry data is needed.
To operate definition, then you need to use rb_method_entry_t.
You can access them by the following functions.- rb_method_entry(VALUE klass, ID id);
- rb_method_entry_with_refinements(VALUE klass, ID id);
- rb_method_entry_without_refinements(VALUE klass, ID id);
- rb_resolve_refined_method(VALUE refinements, const rb_method_entry_t *me);
To invoke methods, then you need to use rb_callable_method_entry_t
which you can get by the following APIs corresponding to the
above listed functions. - rb_callable_method_entry(VALUE klass, ID id);
- rb_callable_method_entry_with_refinements(VALUE klass, ID id);
- rb_callable_method_entry_without_refinements(VALUE klass, ID id);
- rb_resolve_refined_method_callable(VALUE refinements, const rb_callable_method_entry_t *me);
VM pushes rb_callable_method_entry_t, so that rb_vm_frame_method_entry()
returns rb_callable_method_entry_t.
You can check a super class of current method by
rb_callable_method_entry_t::defined_class.
- method.h: renamed from rb_method_entry_t::klass to
rb_method_entry_t::owner. - internal.h: add rb_classext_struct::callable_m_tbl to cache
rb_callable_method_entry_t data.
We need to consider abotu this field again because it is only
active for T_ICLASS. - class.c (method_entry_i): ditto.
- class.c (rb_define_attr): rb_method_entry() does not takes
defiend_class_ptr. - gc.c (mark_method_entry): mark RCLASS_CALLABLE_M_TBL() for T_ICLASS.
- cont.c (fiber_init): rb_control_frame_t::klass is removed.
- proc.c: fix `struct METHOD' data structure because
rb_callable_method_t has all information. - vm_core.h: remove several fields.
- rb_control_frame_t::klass.
- rb_block_t::klass.
And catch up changes.
- eval.c: catch up changes.
- gc.c: ditto.
- insns.def: ditto.
- vm.c: ditto.
- vm_args.c: ditto.
- vm_backtrace.c: ditto.
- vm_dump.c: ditto.
- vm_eval.c: ditto.
- vm_insnhelper.c: ditto.
- vm_method.c: ditto.
Updated by ko1 (Koichi Sasada) over 9 years ago
I committed this change. If you find any regression, please report about it.
I measured some applications with https://github.com/ko1/class_stat gem. This gem reports class/module/T_ICLASS usage.
For example, my rails app https://github.com/ko1/tracer_demo_rails_app:
total_klasses 6204
total_included 398
total_iclasses 979
total_methods 23539
total_dup 10149
In this case,
- there are 6,000 classes and modules.
- 400 modules are included (or prepended).
- 1,000 T_ICLASSes are created.
- 24,000 methods are defined.
- 10,000 methods can be duplicated by this patch.
Last line needs explanation.
Without this patch, each method has one rb_method_entry_t (VALUE).
However, this patch makes that methods of modules needs additional rb_callable_method_entry_t for each T_ICLASS.
Roughly, 10,000 objects can be allocated additionally in this case.
(rb_callable_method_entry_t for methods in modules are allocated when called, so it does not mean increasing 10,000 objects immediately)
Recently, I reduced one objects per methods in trunk.
In this case, 24,000 objects. So I decided increasing 10,000 objects is acceptable.
This is why I commit-ed it.
We need to consider how to cache rb_calllable_method_entry_t.
This is future work.
Updated by usa (Usaku NAKAMURA) almost 9 years ago
- Related to Bug #12164: Binding UnboundMethod to BasicObject added