
Feature #9118

closed

In Enumerable#to_a, use size to set array capa when possible

Added by HonoreDB (Aaron Weiner) over 10 years ago. Updated almost 9 years ago.

Status: Closed
Assignee: -
Target version: -
[ruby-core:58378]

Description

Cross-post from https://github.com/ruby/ruby/pull/444.

Enumerable#to_a works by creating an empty array with a small capacity, then appending elements and growing the capacity as needed. For large enumerables this causes several reallocations, which can hurt performance. When an enumerable exposes a size method, we can guess that the resulting array will usually have exactly that many elements. If the guess is right, the capacity only has to be set once; if it is wrong, we don't lose anything.
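To make the cost of repeated resizing concrete, here is a rough Ruby model that counts reallocations for a buffer that starts small and grows geometrically. The initial capacity of 16 and the growth factor of 1.5 are assumptions chosen for illustration, not MRI's actual growth policy, and resize_count is a hypothetical helper.

```ruby
# Hypothetical model: how many reallocations does a growable buffer need
# to hold n elements if it starts small and grows geometrically?
# ASSUMPTIONS: initial capacity 16 and growth factor 1.5 are illustrative
# placeholders, not MRI's exact growth policy.
def resize_count(n, initial_capa: 16, growth: 1.5)
  capa = initial_capa
  resizes = 0
  while capa < n
    capa = (capa * growth).ceil # grow geometrically
    resizes += 1
  end
  resizes
end

# With an exact size hint the buffer is allocated once at capacity n,
# so no further resizes are needed.
[10, 1_000, 1_000_000].each do |n|
  puts "n=#{n}: #{resize_count(n)} resizes without a hint, 0 with an exact hint"
end
```

The count grows only logarithmically with n, which fits the modest (roughly 10%) speedup reported below rather than a dramatic one.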

The attached file (or linked PR) changes enum_to_a in enum.c to use the size method when it is present. In my tests this makes Range#to_a about 10% faster and has no significant effect on a plain enumerable without a size method. I couldn't find an existing benchmark that this consistently made better or worse.
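Below is a pure-Ruby sketch of the proposed control flow. It is not the patch itself, which lives in C in enum.c. SizeHintedToA, to_a_with_size_hint and Countdown are hypothetical names. Ruby offers no public way to reserve capacity without also setting the length, so the sketch pre-fills nil slots and trims any surplus. It also assumes each yields one value per element.

```ruby
# Hypothetical pure-Ruby model of the proposed enum_to_a control flow.
# It only illustrates the logic; it does not reproduce C-level performance.
module SizeHintedToA
  def to_a_with_size_hint
    hint = respond_to?(:size) ? size : nil
    # Trust only a finite, non-negative Integer (size may be nil or Float::INFINITY).
    capa = hint.is_a?(Integer) && hint >= 0 ? hint : 0
    buf = Array.new(capa) # one up-front allocation sized by the hint
    count = 0
    each do |x| # ASSUMPTION: each yields exactly one value per element
      if count < buf.size
        buf[count] = x # fill a preallocated slot
      else
        buf << x # hint was too small: fall back to normal growth
      end
      count += 1
    end
    buf.slice!(count..-1) if count < buf.size # hint was too large: drop unused slots
    buf
  end
end

class Countdown
  include Enumerable
  include SizeHintedToA

  def initialize(n, reported_size)
    @n = n
    @reported_size = reported_size # deliberately allowed to be wrong
  end

  def each
    @n.downto(1) { |i| yield i }
  end

  def size
    @reported_size
  end
end

p Countdown.new(3, 3).to_a_with_size_hint  # => [3, 2, 1] (exact hint)
p Countdown.new(3, 10).to_a_with_size_hint # => [3, 2, 1] (hint too large)
p Countdown.new(3, 1).to_a_with_size_hint  # => [3, 2, 1] (hint too small)
```

A wrong hint changes only how the buffer is allocated. The resulting array is the same in every case.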

If you like this idea, the same could be done in other classes with a custom to_a, such as Hash.


Files

enum.c (72.4 KB): enum.c with modified enum_to_a. Added by HonoreDB (Aaron Weiner), 11/16/2013 11:09 PM

Related issues: 1 (0 open, 1 closed)

Related to Ruby master - Bug #11130: Re: [ruby-changes:38376] glass:r50457 (trunk): * enum.c (enum_to_a): Use size to set array capa when possible. Closed, Glass_saga (Masaki Matsushita)

Updated by Hanmac (Hans Mackowiak) over 10 years ago

enum.size can return Float::INFINITY, for example for [1,2,3].cycle.size, so you need to check for that too.
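Enumerator#size can return one of three things: an Integer, Float::INFINITY, or nil when the size cannot be known without iterating. Any capacity hint therefore has to be validated before use. The sketch below prints the three cases. capacity_hint is a hypothetical helper, not Ruby API.

```ruby
# capacity_hint is a hypothetical helper: it returns a usable capacity
# (a finite, non-negative Integer) or nil when size gives no usable hint.
def capacity_hint(enum)
  s = enum.respond_to?(:size) ? enum.size : nil
  s.is_a?(Integer) && s >= 0 ? s : nil
end

finite   = [1, 2, 3].each_slice(2)       # size is known: 2 slices
infinite = [1, 2, 3].cycle               # size is Float::INFINITY
unknown  = Enumerator.new { |y| y << 1 } # size is nil (not computable up front)

p finite.size             # => 2
p infinite.size           # => Infinity
p unknown.size            # => nil
p capacity_hint(finite)   # => 2
p capacity_hint(infinite) # => nil
p capacity_hint(unknown)  # => nil
```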

Updated by HonoreDB (Aaron Weiner) over 10 years ago

Ah, right! This looks like a chance to improve on the existing behavior: right now that case just hangs silently forever. Should we warn and then hang, or just raise? I'd lean towards warning, because size might be returning the wrong value.
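The two options under discussion can be sketched as a wrapper around to_a. checked_to_a and its on_infinite: keyword are hypothetical names, and Ruby's actual Enumerable#to_a does neither. The :warn branch only prints a message; the enumeration that follows still never finishes.

```ruby
# Hypothetical wrapper showing the two policies for an enumerable whose
# size reports Float::INFINITY. Not part of Ruby's API.
def checked_to_a(enum, on_infinite: :raise)
  if enum.respond_to?(:size) && enum.size == Float::INFINITY
    case on_infinite
    when :raise
      raise RangeError, "cannot convert an infinite enumerable to an array"
    when :warn
      warn "to_a called on an enumerable whose size is infinite; this may never return"
    end
  end
  enum.to_a # with :warn, an infinite enumerable still loops forever here
end

p checked_to_a([1, 2, 3].each) # => [1, 2, 3]
begin
  checked_to_a([1, 2, 3].cycle)
rescue RangeError => e
  puts "raised: #{e.message}"
end
```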

Updated by mame (Yusuke Endoh) over 10 years ago

I think the proposal will break compatibility with the following code:

```ruby
class C
  include Enumerable
  def size
    to_a.size
  end
  def each
  end
end
C.new.size #=> expected: 0, with the proposal: stack level too deep
```
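The failure can be reproduced in plain Ruby. The sketch prepends a module whose to_a calls size before delegating to Enumerable#to_a, which mimics what the patched enum_to_a would do. SizeFirstToA and PatchedC are hypothetical names. C is the class from the example above, and it keeps Ruby's stock to_a.

```ruby
# Hypothetical stand-in for the proposal: to_a consults size first.
module SizeFirstToA
  def to_a
    size  # the proposal reads size to choose an initial capacity
    super # then falls through to Enumerable#to_a
  end
end

class C
  include Enumerable
  def size
    to_a.size
  end
  def each
  end
end

# Only PatchedC gets the size-first to_a; C keeps Ruby's stock Enumerable#to_a.
class PatchedC < C
  prepend SizeFirstToA
end

p C.new.size # => 0
begin
  PatchedC.new.size # size -> to_a -> size -> to_a -> ...
rescue SystemStackError => e
  puts "PatchedC: #{e.message}" # stack level too deep
end
```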

Examples in the wild:

In addition, #each and #size do not necessarily share the same semantics.
For example, IO#each yields lines, but IO#size returns a count of bytes.
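The mismatch is easy to see with StringIO, which stands in for a file here so that no file on disk is needed. Like File#size, StringIO#size counts bytes, while to_a (via each) collects lines.

```ruby
require "stringio"

# StringIO is Enumerable over lines, but its size is measured in bytes.
text = "first\nsecond\n"
p StringIO.new(text).size      # => 13 (bytes)
p StringIO.new(text).to_a.size # => 2 (lines)
```

A size-based hint for this object would overestimate the array length by a wide margin.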

--
Yusuke Endoh

Updated by HonoreDB (Aaron Weiner) over 10 years ago

It definitely breaks that usage, but that's bad usage: Enumerable#count is the method meant for that, not size.
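Under a size-first to_a, a size built on Enumerable#count avoids the recursion, because count walks the elements with each and never calls to_a. SizeFirstToA and CountBased are hypothetical names, and the prepended module only mimics the proposal.

```ruby
# Hypothetical stand-in for the proposal: to_a consults size first.
module SizeFirstToA
  def to_a
    size # read size to pick a capacity, as the proposal would
    super
  end
end

class CountBased
  include Enumerable
  prepend SizeFirstToA

  def size
    count # iterates via each; does not go through to_a
  end

  def each
    yield :a
    yield :b
  end
end

p CountBased.new.size # => 2
p CountBased.new.to_a # => [:a, :b]
```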

In cases where size doesn't correctly predict the array's length, this doesn't really break anything; it just swaps one bad guess at capa for another.

Updated by Hanmac (Hans Mackowiak) over 10 years ago

Enumerable#count may not be a good idea; Enumerator#size would be better.
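Enumerator#size is computed without running the enumeration and may return nil. Enumerable#count, by contrast, always iterates. The short illustration below uses the optional size argument of Enumerator.new.

```ruby
# Enumerator#size does not iterate; Enumerable#count does.
sized   = Enumerator.new(2) { |y| y << :a << :b } # size supplied up front
unsized = Enumerator.new { |y| y << :a << :b }    # no size given

p sized.size     # => 2   (no iteration)
p unsized.size   # => nil (unknown without iterating)
p unsized.count  # => 2   (iterates to count)
```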


Updated by usa (Usaku NAKAMURA) almost 9 years ago

  • Related to Bug #11130: Re: [ruby-changes:38376] glass:r50457 (trunk): * enum.c (enum_to_a): Use size to set array capa when possible. added

Updated by Anonymous almost 9 years ago

  • Status changed from Open to Closed

Applied in changeset r50483.


  • enum.c (enum_to_a): revert r50457.
    It requires a recursion check,
    and with that check it no longer improves performance.
    [Bug #11130] [Feature #9118]
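A recursion check of the kind the revert mentions could look roughly like the sketch below: to_a skips the size hint when it is re-entered from inside size on the same object. This is a hypothetical Ruby model; GuardedSizeHint, GuardedC and the :size_hint_active key are made-up names, and MRI would implement such a check in C. The sketch also hints at the cost. Every to_a call now does extra bookkeeping, and in the recursive case the enumeration runs twice, which may help explain why the performance gain disappeared.

```ruby
# Hypothetical recursion guard: skip the size hint when to_a is re-entered
# from inside size on the same object. Uses a fiber-local Hash keyed by object_id.
module GuardedSizeHint
  def to_a
    active = (Thread.current[:size_hint_active] ||= {})
    return super if active[object_id] # re-entered via size: no hint
    begin
      active[object_id] = true
      size if respond_to?(:size) # the hint; plain Ruby cannot use it to reserve capacity
    ensure
      active.delete(object_id)
    end
    super
  end
end

class C
  include Enumerable
  def size
    to_a.size
  end
  def each
  end
end

class GuardedC < C
  prepend GuardedSizeHint
end

p GuardedC.new.size # => 0, no SystemStackError
```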