Eric Wong wrote:
malloc_usable_size shows up at or near the top of many profiles for me.
We can check the performance impact by enabling HAVE_MALLOC_USABLE_SIZE macro.
Try this program:
require 'benchmark'
Benchmark.bm{|x|
10.times{
x.report{
10_000_000.times{ '*' * 260 }
}
}
}
Results:
enable HAVE_MALLOC_USABLE_SIZE:
user system total real
3.140000 0.010000 3.150000 ( 3.144509)
3.130000 0.000000 3.130000 ( 3.132572)
3.130000 0.000000 3.130000 ( 3.138391)
3.170000 0.000000 3.170000 ( 3.169465)
3.150000 0.000000 3.150000 ( 3.160397)
3.140000 0.000000 3.140000 ( 3.134969)
3.150000 0.000000 3.150000 ( 3.150813)
3.130000 0.000000 3.130000 ( 3.134149)
3.190000 0.000000 3.190000 ( 3.196077)
3.130000 0.000000 3.130000 ( 3.132020)
disable HAVE_MALLOC_USABLE_SIZE:
user system total real
2.940000 0.010000 2.950000 ( 2.965000)
2.950000 0.000000 2.950000 ( 2.953580)
2.940000 0.000000 2.940000 ( 2.952080)
2.970000 0.000000 2.970000 ( 2.964060)
2.970000 0.000000 2.970000 ( 2.970614)
3.020000 0.000000 3.020000 ( 3.023180)
3.000000 0.000000 3.000000 ( 3.006568)
3.000000 0.000000 3.000000 ( 2.993302)
2.940000 0.000000 2.940000 ( 2.951028)
2.980000 0.000000 2.980000 ( 2.987452)
(Ubuntu 14.1 on VirtualBox on Windows7)
It seems about 3.1 sec ->2.9 sec (7% speedup) by disabling this feature for an intentional case.
We should be able to use ruby_sized_xfree in more places; especially
if rb_data_type_t->dsize is defined.
One possible improvement is to allow the rb_data_type_t->dsize pointer
to be a FIXNUM, removing the need for some memsize functions.
T_DATA is not so many, so that I don't think the overhead of calling function is matter.
BTW, as nobu said, the function pointers can be located odd address. I encounterd such case on mswin32 build.
Furthermore, over-accounting malloc-ed bytes (presumably the reason
malloc_usable_size was introduced). should be less harmful nowadays with
incremental marking.
At first, the purpose of using malloc_usable_size() is to measure malloc'ed block correctly.
As you know, malloc() can return bigger memory block than we specified.
We need to compare the advantage (preciseness) and the disadvantage (performance down).
And I also agree that performance is more important :)
Current malloc_increase (and oldmalloc_increase) is not considered carefully.
(Current implementation is ad-hoc)
We need to re-consider about it (include avoiding atomic operations).