Feature #11158
Closed
Introduce a Symbol.count API as a more efficient alternative to Symbol.all_symbols.size
Description
We're in the process of migrating a very large Rails codebase from a Ruby 2.1.6 runtime to Ruby 2.2.2 and, as part of this migration, would like to keep track of Symbol counts and Symbol GC efficiency in our metrics system. Ideally we would have this while still on 2.1 (which implies a backport to 2.1), but it would definitely be useful in 2.2 as well.
Currently the recommended and only reliable way to get the Symbol count is via Symbol.all_symbols.size, which:
- Allocates an Array
- Calls rb_ary_push while walking the symbol table, which isn't exactly efficient
Here are some benchmarks:
./miniruby -Ilib -rbenchmark -e "p Benchmark.measure { 10_000.times{ Symbol.count } }"
#<Benchmark::Tms:0x007f8bc208bdd0 @label="", @real=0.0011274919961579144, @cstime=0.0, @cutime=0.0, @stime=0.0, @utime=0.01, @total=0.01>
./miniruby -Ilib -rbenchmark -e "p Benchmark.measure { 10_000.times{ Symbol.all_symbols.size } }"
#<Benchmark::Tms:0x007fa47205a550 @label="", @real=0.3135859479953069, @cstime=0.0, @cutime=0.0, @stime=0.03, @utime=0.29, @total=0.31999999999999995>
I implemented and attached a patch for a simple Symbol.count API that just returns a numeric version of the symbol table size, without having to do any iteration.
Please let me know if this is in line with an expected core API, whether there's anything I could clean up further, and whether there's any possibility of such a change also being backported to 2.1. (Happy to create a new patch for 2.1.)
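To make the cost described above concrete, here is a minimal sketch that uses only methods available in stock Ruby. It shows that each call to Symbol.all_symbols builds a new Array, and times repeated calls with a monotonic clock. The helper name time_calls is hypothetical and the timing will vary by machine, so no figure is claimed here.

```ruby
# Symbol.all_symbols returns a freshly built Array each time it is called.
first  = Symbol.all_symbols
second = Symbol.all_symbols
puts first.equal?(second) # false: two distinct Array objects

# Hypothetical helper: time `iterations` calls of a block with a monotonic clock.
def time_calls(iterations)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  iterations.times { yield }
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
end

elapsed = time_calls(1_000) { Symbol.all_symbols.size }
puts format("1_000 calls to Symbol.all_symbols.size: %.4fs", elapsed)
```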
Updated by nobu (Nobuyoshi Nakada) over 9 years ago
Lourens Naudé wrote:
Please let me know if this is in line with an expected core API, whether there's anything I could clean up further, and whether there's any possibility of such a change also being backported to 2.1. (Happy to create a new patch for 2.1.)
New features are never backported to 2.2 or earlier.
Updated by methodmissing (Lourens Naudé) over 9 years ago
Makes sense, my bad, thanks for the consideration.
Updated by marcandre (Marc-Andre Lafortune) over 9 years ago
- Assignee set to matz (Yukihiro Matsumoto)
I'd recommend instead introducing Symbol.each, which would accept a block and return an Enumerator when none is given.
Symbol.each.size would then be an efficient (lazy) way of getting the number of symbols, and it would be a more versatile method in case someone wants to iterate over all Symbols for other purposes.
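The lazy size mentioned here can be sketched in plain Ruby with enum_for, which accepts a block that computes the enumerator's size without iterating. The class FakeTable below is a hypothetical stand-in for the symbol table, not anything in Ruby itself; the attached patch does this in C, and its details may differ.

```ruby
# Hypothetical stand-in for a symbol table; not part of Ruby.
class FakeTable
  attr_reader :yields

  def initialize(items)
    @items = items
    @yields = 0 # counts how many elements have been yielded so far
  end

  def each
    # The block given to enum_for computes the size lazily, without iterating.
    return enum_for(:each) { @items.size } unless block_given?

    @items.each do |item|
      @yields += 1
      yield item
    end
    self
  end
end

table = FakeTable.new(%i[a b c])
enum = table.each
puts enum.size         # 3
puts table.yields      # 0, nothing was iterated to get the size
puts enum.to_a.inspect # [:a, :b, :c]
```

Asking for the size only runs the size block, which is why a sized enumerator can be cheap even when full iteration is not.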
Updated by methodmissing (Lourens Naudé) over 9 years ago
Sounds good, I'll take a stab tonight.
Updated by methodmissing (Lourens Naudé) over 9 years ago
- File symbol_enumerator.patch added
Please find attached the changes as per Marc-Andre's suggestions. Exposes Symbol.each and extends with Enumerable
    def test_each
      x = Symbol.each.size
      assert_kind_of(Fixnum, x)
      assert_equal x, Symbol.all_symbols.size
      assert_equal x, Symbol.count
      assert_equal Symbol.to_a, Symbol.all_symbols
      answer_to_life = :bacon_lettuce_tomato
      assert_equal [:bacon_lettuce_tomato], Symbol.grep(/bacon_lettuce_tomato/)
    end
Calling size on the enumerator is super efficient.
$ ./miniruby -Ilib -rbenchmark -e "p Benchmark.measure { 10_000.times{ Symbol.each.size } }"
#<Benchmark::Tms:0x007fea32039688 @label="", @real=0.005798012993182056, @cstime=0.0, @cutime=0.0, @stime=0.0, @utime=0.01, @total=0.01>
Symbol.count isn't though (not sure if it's possible to replace the definition with Symbol.each.size instead)
$ ./miniruby -Ilib -rbenchmark -e "p Benchmark.measure { 10_000.times{ Symbol.count } }"
#<Benchmark::Tms:0x007fa47907afb0 @label="", @real=0.36278180500085, @cstime=0.0, @cutime=0.0, @stime=0.0, @utime=0.36, @total=0.36>
Thoughts?
Updated by akr (Akira Tanaka) over 9 years ago
- Related to Feature #9963: Symbol.count added
Updated by ko1 (Koichi Sasada) over 9 years ago
- Assignee changed from matz (Yukihiro Matsumoto) to ko1 (Koichi Sasada)
Updated by cesario (Franck Verrot) over 9 years ago
Lourens Naudé wrote:
Please find attached the changes as per Marc-Andre's suggestions. Exposes Symbol.each and extends with Enumerable
Hi Lourens,
I'm not sure I fully understand why we make Symbol extend Enumerable rather than returning a new enumerator object (probably also extending Enumerable)? Isn't there way too much overhead in including Enumerable in Symbol?
Thoughts?
Nice work!
Updated by ko1 (Koichi Sasada) over 9 years ago
I'm not against introducing Symbol.each as a shortcut for Symbol.all_symbols.each.
However, for measurement purposes, we should introduce a new measurement API in ObjectSpace, because symbols come in several types.
|         | immortal | mortal |
|---------|:--------:|:------:|
| static  |   (1)    |  (2)   |
| dynamic |   (3)    |  (4)   |
- Immortal symbols
  - Static immortal symbols (1)
  - Dynamic immortal symbols (3)
- Dynamic mortal symbols (4)
There are no (2) type symbols.
Current Symbol.all_symbols.size returns (1) + (3) + (4).
Maybe the counts of (1) and (3) (or (1)+(3)) would be helpful for people who want to know the details.
Updated by methodmissing (Lourens Naudé) over 9 years ago
Thanks for the feedback - I'll take a stab and circle back.
Updated by marcandre (Marc-Andre Lafortune) over 9 years ago
Franck Verrot wrote:
I'm not sure I fully understand why we make Symbol extend Enumerable rather than returning a new enumerator object
It's not "rather than". Symbol.each without a block will return an Enumerator, whether we extend Enumerable or not.
Isn't there way too much overhead in including Enumerable in Symbol?
Not sure what you mean by overhead. There's no performance cost to it. It adds a bunch of methods to Symbol, and many won't be helpful (I doubt someone would use Symbol.map{...}), but I'm not sure I see the downside.
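As a rough illustration of what extending Enumerable at the class level brings along, here is a small sketch. FakeSymbols and NAMES are hypothetical stand-ins and not Ruby's Symbol class. Once a singleton each exists, extend Enumerable supplies grep, map, count and the rest, and each without a block still returns an Enumerator either way.

```ruby
# Hypothetical stand-in; not Ruby's Symbol class.
module FakeSymbols
  extend Enumerable # adds map, grep, count, ... as singleton methods

  NAMES = %i[bacon lettuce tomato].freeze

  def self.each(&block)
    # Without a block, return a sized Enumerator.
    return enum_for(:each) { NAMES.size } unless block

    NAMES.each(&block)
    self
  end
end

puts FakeSymbols.grep(/let/).inspect # [:lettuce]
puts FakeSymbols.map(&:to_s).inspect # ["bacon", "lettuce", "tomato"]
puts FakeSymbols.count               # 3, Enumerable#count walks every element via each
puts FakeSymbols.each.size           # 3, answered by the enumerator's size block
```

Note that Enumerable#count iterates through each, while the enumerator's size does not, so the two can differ greatly in cost on a large collection.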
Updated by cesario (Franck Verrot) over 9 years ago
Marc-Andre Lafortune wrote:
Franck Verrot wrote:
Isn't there way too much overhead in including Enumerable in Symbol?
Not sure what you mean by overhead. There's no performance cost to it. It adds a bunch of methods to Symbol, and many won't be helpful (I doubt someone would use Symbol.map{...}), but I'm not sure I see the downside.
Sorry, I didn't formulate this right :-) I was only wondering if including Enumerable in Symbol could lead some of us to rely on methods (like map, as you said) that weren't really thought through at the time we introduced each. Maybe that doesn't make sense, so feel free to ignore this comment... still new to the Ruby VM internals and ways of designing its APIs :-)
Thanks!
Updated by ko1 (Koichi Sasada) over 9 years ago
- Status changed from Open to Closed
Applied in changeset r51654.
- ext/objspace/objspace.c: add a new method ObjectSpace.count_symbols. [Feature #11158]
- symbol.c (rb_sym_immortal_count): added to count immortal symbols.
- symbol.h: ditto.
- test/objspace/test_objspace.rb: add a test for this method.
- NEWS: describe about this method.
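A minimal usage sketch of the method added by this changeset follows. ObjectSpace.count_symbols lives in the objspace extension and is intended for CRuby (MRI). Its documentation describes the returned hash as implementation-defined, so the key names mentioned in the comments (mortal_dynamic_symbol, immortal_dynamic_symbol, immortal_static_symbol, immortal_symbol) should be treated as an assumption about current MRI rather than a stable contract.

```ruby
require "objspace"

# Returns a Hash of symbol counts per symbol type (MRI only).
counts = ObjectSpace.count_symbols

# Print whatever keys this Ruby version reports; on MRI these are expected
# to include :mortal_dynamic_symbol, :immortal_dynamic_symbol,
# :immortal_static_symbol and :immortal_symbol (an assumption, see above).
counts.each { |kind, n| puts "#{kind}: #{n}" }
```

Unlike Symbol.all_symbols.size, this breaks the total down by symbol type, which suits metrics that track how many symbols remain eligible for GC.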