Feature #11158

closed

Introduce a Symbol.count API as a more efficient alternative to Symbol.all_symbols.size

Added by methodmissing (Lourens Naudé) over 9 years ago. Updated over 9 years ago.

Status:
Closed
Target version:
-
[ruby-core:<unknown>]

Description

We're in the process of migrating a very large Rails codebase from Ruby 2.1.6 to Ruby 2.2.2, and as part of this migration we would like to track Symbol counts and Symbol GC efficiency in our metrics system. Ideally we could do this while still on 2.1 (which would require a backport to 2.1), but it would definitely be useful on 2.2 as well.

Currently the recommended, and only reliable, way to get the Symbol count is Symbol.all_symbols.size, which:

  • Allocates an Array
  • Walks the symbol table and calls rb_ary_push for each entry, which isn't exactly efficient

Here are some benchmarks:

./miniruby -Ilib -rbenchmark -e "p Benchmark.measure { 10_000.times{ Symbol.count } }"
#<Benchmark::Tms:0x007f8bc208bdd0 @label="", @real=0.0011274919961579144, @cstime=0.0, @cutime=0.0, @stime=0.0, @utime=0.01, @total=0.01>
./miniruby -Ilib -rbenchmark -e "p Benchmark.measure { 10_000.times{ Symbol.all_symbols.size } }"
#<Benchmark::Tms:0x007fa47205a550 @label="", @real=0.3135859479953069, @cstime=0.0, @cutime=0.0, @stime=0.03, @utime=0.29, @total=0.31999999999999995>

I implemented and attached a patch for a simple Symbol.count API that just returns a numeric version of the symbol table size, without having to do any iteration.

Please let me know if this is in line with what's expected of a core API, whether there's anything I could clean up further, and whether there's any possibility of such a change being backported to 2.1 as well. (Happy to create a new patch for 2.1.)


Files

symbol_count.patch (4.4 KB) - Symbol.count patch file - methodmissing (Lourens Naudé), 05/16/2015 04:12 AM
symbol_enumerator.patch (6.07 KB) - Symbol.each - methodmissing (Lourens Naudé), 05/21/2015 02:14 AM

Related issues 1 (0 open, 1 closed)

Related to Ruby master - Feature #9963: Symbol.count (Feedback, 06/19/2014)

Actions #1

Updated by nobu (Nobuyoshi Nakada) over 9 years ago

Lourens Naudé wrote:

Please let me know if this is in line with what's expected of a core API, whether there's anything I could clean up further, and whether there's any possibility of such a change being backported to 2.1 as well. (Happy to create a new patch for 2.1.)

New features are never backported to 2.2 or earlier.

Actions #2

Updated by methodmissing (Lourens Naudé) over 9 years ago

Makes sense, my bad, thanks for the consideration.

Actions #3

Updated by marcandre (Marc-Andre Lafortune) over 9 years ago

  • Assignee set to matz (Yukihiro Matsumoto)

I'd recommend instead introducing Symbol.each, which would accept a block and return an Enumerator when none is given.

Symbol.each.size would then be an efficient (lazy) way of getting the number of symbols, and it would be a more versatile method in case someone wants to iterate over all Symbols for other purposes.
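
A rough Ruby sketch of the suggestion above. It assumes a hypothetical SymbolTable module as a stand-in for the proposed Symbol.each, which core Ruby does not define. It shows how an Enumerator can carry a size block so that each.size never iterates. In this sketch the size block still calls Symbol.all_symbols.size; a C implementation could presumably read the table size directly.

```ruby
# Hypothetical stand-in for the proposed Symbol.each; not a core API.
module SymbolTable
  # With a block: yields every symbol currently in the symbol table.
  # Without a block: returns an Enumerator whose size is supplied lazily
  # by the block given to enum_for, so calling #size does not iterate.
  def self.each(&block)
    return enum_for(:each) { Symbol.all_symbols.size } unless block

    Symbol.all_symbols.each(&block)
    self
  end
end

enum = SymbolTable.each
puts enum.class   # Enumerator
puts enum.size    # symbol count at the time of the call
```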

Actions #4

Updated by methodmissing (Lourens Naudé) over 9 years ago

Sounds good, I'll take a stab tonight.

Actions #5

Updated by methodmissing (Lourens Naudé) over 9 years ago

Please find attached the changes as per Marc-Andre's suggestions. Exposes Symbol.each and extends with Enumerable

  def test_each
    x = Symbol.each.size
    assert_kind_of(Integer, x)
    assert_equal x, Symbol.all_symbols.size
    assert_equal x, Symbol.count
    assert_equal Symbol.to_a, Symbol.all_symbols
    answer_to_life = :bacon_lettuce_tomato
    assert_equal [:bacon_lettuce_tomato], Symbol.grep(/bacon_lettuce_tomato/)
  end

Calling size on the enumerator is super efficient.

$ ./miniruby -Ilib -rbenchmark -e "p Benchmark.measure { 10_000.times{ Symbol.each.size } }"
#<Benchmark::Tms:0x007fea32039688 @label="", @real=0.005798012993182056, @cstime=0.0, @cutime=0.0, @stime=0.0, @utime=0.01, @total=0.01>

Symbol.count isn't though (not sure if it's possible to replace the definition with Symbol.each.size instead)

$ ./miniruby -Ilib -rbenchmark -e "p Benchmark.measure { 10_000.times{ Symbol.count } }"
#<Benchmark::Tms:0x007fa47907afb0 @label="", @real=0.36278180500085, @cstime=0.0, @cutime=0.0, @stime=0.0, @utime=0.36, @total=0.36>

Thoughts?

Actions #6

Updated by akr (Akira Tanaka) over 9 years ago

Updated by ko1 (Koichi Sasada) over 9 years ago

  • Assignee changed from matz (Yukihiro Matsumoto) to ko1 (Koichi Sasada)

Updated by cesario (Franck Verrot) over 9 years ago

Lourens Naudé wrote:

Please find attached the changes as per Marc-Andre's suggestions. Exposes Symbol.each and extends with Enumerable

Hi Lourens,

I'm not sure I fully understand why we make Symbol extend Enumerable rather than returning a new enumerator object (probably also extending Enumerable). Isn't there way too much overhead in including Enumerable in Symbol?

Thoughts?

Nice work!

Updated by ko1 (Koichi Sasada) over 9 years ago

I'm not against introducing Symbol.each as a shortcut for Symbol.all_symbols.each.

However, for measurement purposes, we should introduce a new measurement API in ObjectSpace, because symbols come in several types.

        | immortal | mortal
--------+----------+--------
static  |   (1)    |  (2)
dynamic |   (3)    |  (4)

  • Immortal symbols
    • Static immortal symbols (1)
    • Dynamic immortal symbols (3)
  • Dynamic mortal symbols (4)

There are no (2) type symbols.

Current Symbol.all_symbols.size returns (1) + (3) + (4).
Maybe the number of (1) and (2) (or (1+2)) will be helpful for people who want to know the details.

Updated by methodmissing (Lourens Naudé) over 9 years ago

Thanks for the feedback - I'll take a stab and circle back.

Updated by marcandre (Marc-Andre Lafortune) over 9 years ago

Franck Verrot wrote:

I'm not sure I fully understand why we make Symbol extend Enumerable rather than returning a new enumerator object

It's not "rather than". Symbol.each without a block will return an Enumerator, whether we extend Enumerable or not.

Isn't there way too much overhead in including Enumerable in Symbol?

Not sure what you mean by overhead. There's no performance cost to it. It adds a bunch of methods to Symbol, and many won't be helpful (I doubt someone would use Symbol.map{...}), but I'm not sure I see the downside.
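
To illustrate the distinction, here is a small sketch using two hypothetical modules (PlainWalker and EnumerableWalker, neither of which is a real API). Both return an Enumerator from each when no block is given; extending Enumerable only adds the extra methods such as grep and map directly on the receiver.

```ruby
# Both modules are hypothetical illustrations, not core APIs.
module PlainWalker
  def self.each(&block)
    return enum_for(:each) unless block

    %i[a b c].each(&block)
    self
  end
end

module EnumerableWalker
  extend Enumerable # adds map, grep, count, ... as singleton methods

  def self.each(&block)
    return enum_for(:each) unless block

    %i[a b c].each(&block)
    self
  end
end

p PlainWalker.each.class          # Enumerator
p EnumerableWalker.each.class     # Enumerator
p PlainWalker.respond_to?(:grep)  # false
p EnumerableWalker.grep(/b/)      # [:b]
```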

Updated by cesario (Franck Verrot) over 9 years ago

Marc-Andre Lafortune wrote:

Franck Verrot wrote:

Isn't there way too much overhead in including Enumerable in Symbol?

Not sure what you mean by overhead. There's no performance cost to it. It adds a bunch of methods to Symbol, and many won't be helpful (I doubt someone would use Symbol.map{...}), but I'm not sure I see the downside.

Sorry I haven't formulated this right :-) I was only wondering if including Enumerable in Symbol could lead some of us to rely on methods (like map as you said) that weren't really thought through at the time we introduced each. Maybe that doesn't make sense, so feel free to ignore this comment... still new to the Ruby VM internals and ways of designing its APIs :-)

Thanks!

Actions #13

Updated by ko1 (Koichi Sasada) over 9 years ago

  • Status changed from Open to Closed

Applied in changeset r51654.


  • ext/objspace/objspace.c: add a new method ObjectSpace.count_symbols.
    [Feature #11158]
  • symbol.c (rb_sym_immortal_count): added to count immortal symbols.
  • symbol.h: ditto.
  • test/objspace/test_objspace.rb: add a test for this method.
  • NEWS: describe about this method.
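
A short usage sketch of the method added by this changeset, assuming a Ruby version that ships ObjectSpace.count_symbols in the objspace extension (2.3 or later). The hash keys used below are the ones documented for that method; the mapping to the categories (1), (3) and (4) discussed earlier in this thread is my reading, not something stated in the commit.

```ruby
require "objspace"

# Returns a Hash of symbol counts broken down by kind.
counts = ObjectSpace.count_symbols
counts.each { |kind, n| puts "#{kind}: #{n}" }

# Presumed mapping to the categories in the discussion:
#   (1) static immortal   -> :immortal_static_symbol
#   (3) dynamic immortal  -> :immortal_dynamic_symbol
#   (4) dynamic mortal    -> :mortal_dynamic_symbol
# :immortal_symbol is the total of the two immortal kinds.
immortal = counts[:immortal_static_symbol] + counts[:immortal_dynamic_symbol]
puts immortal == counts[:immortal_symbol]
```

For a metrics system, the individual values can be reported as separate gauges, with the mortal dynamic count being the one expected to shrink after GC.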