Feature #6808

Implicit index for enumerations

Added by Thomas Sawyer over 1 year ago. Updated over 1 year ago.

[ruby-core:46834]
Status: Feedback
Priority: Normal
Assignee: -
Category: core
Target version: next minor

Description

=begin
One of the less lovely things about Ruby's otherwise elegant enumerables is the lack of ubiquitous access to the current index. Because of this, we end up with a bevy of extra methods that are little more than counterparts of other enumerable methods, existing only to provide access to the index. Examples include #each_with_index, #each_index and (in many extension libraries) #collect_with_index. It is all rather wasteful, inelegant, and limiting. Heaven forbid we need a #select_with_index, or some other uncommon case.

No doubt this has had some discussion long in the past, but I would like to revisit it and offer a new, more concrete proposal...

Thanks to Enumerator, we can now at least do:

[:a,:b,:c].each_with_index.map{ |e, i| [i, e] }

That's great, but it has obvious shortcomings. It's long winded and it has the overhead of an Enumerator object. Ideally we would want to do this instead:

[:a,:b,:c].map{ |e| [$i, e] }

Where $i is the implicit index. Now a global variable is surely the simplest solution. But I can understand that some might object to the use of a global variable, despite the fact that this approach is common with regexp matches like $1, $2, etc. In that case, we could designate a new keyword. Let's call it index.

[:a,:b,:c].map{ |e| [index, e] }

We might suffer a conflict here however if someone has already used "index" as a block argument. In that case we would need Ruby to allow it to be overridden, in the same sense that one can define a public method called class, even though class is a keyword in other contexts.

If this were all that we gained then I say it is a victory, but I'd like to consider also that we go a step further, and instead of having just "index", we have an iterative object. After all Ruby is an OOPL. In this case, the keyword would be it and we could do:

[:a,:b,:c].map{ |e| [it.index, e] }

The nice thing about it is that it can have a few other useful methods to improve readability of code, such as it.first? and it.last? (if size is known for the enumerable). I think this is an awesome solution that grants the most readability and flexibility to the language.

Of course, having an iteration object might bring up concerns about performance, since creating a new iteration instance on every pass adds overhead. This can be addressed by making the object mutable, so that only the index changes within the same object. A minor downside is that an it can't be stored by reference between passes (e.g. prev_it = it), but knowing this, #dup could be used if that were really necessary. If that isn't good enough to curb performance concerns, I would suggest a means of indicating that the it object should be made available. We don't want to drag Enumerator into this, so map.it{...} is not the solution, but perhaps Ruby could recognize ;it at the end of block arguments?

[:a,:b,:c].map{ |e; it| [it.index, e] }

Maybe that syntax can't work, but surely something along these lines could. Personally, I doubt the overhead of mutable it is too much, but just in case.
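The mutable iteration object described above can be approximated in plain Ruby today. The following is only a rough sketch under stated assumptions: the names Iteration and map_with_it are hypothetical and not part of Ruby, and the object is passed as an explicit block argument rather than being implicit.

```ruby
# Hypothetical sketch: Iteration and map_with_it are illustrative names only.
class Iteration
  attr_reader :index

  def initialize(size = nil)
    @index = -1      # advanced to 0 before the first pass
    @size  = size    # nil when the enumerable's size is unknown
  end

  # Mutates the same object on every pass instead of allocating a new one.
  def advance!
    @index += 1
    self
  end

  def first?
    @index.zero?
  end

  def last?
    raise ArgumentError, 'size unknown' if @size.nil?
    @index == @size - 1
  end
end

# Yields each element together with one shared, mutable Iteration object.
def map_with_it(enum)
  iteration = Iteration.new(enum.respond_to?(:size) ? enum.size : nil)
  enum.map { |e| yield e, iteration.advance! }
end

labels = map_with_it([:a, :b, :c]) { |e, it| it.last? ? "#{e}." : "#{e}," }

# Storing the object by reference keeps only its final state; #dup snapshots it.
kept   = map_with_it([:a, :b]) { |_e, it| it }
copies = map_with_it([:a, :b]) { |_e, it| it.dup }
```

Here labels is ["a,", "b,", "c."], kept.map(&:index) is [1, 1] because both entries are the same object, and copies.map(&:index) is [0, 1].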

To summarize, I propose an implicit mutable iteration object called it that allows access to the enumeration's index, plus convenience methods for querying the index. Or, if that is considered too much, then at least an implicit index, either as a global variable or a special keyword. Any of these choices would be a marked improvement, allowing us to avoid the endless proliferation of _with_index methods.
=end

History

#1 Updated by Thomas Sawyer over 1 year ago

Ixnay on the global variable. It was just pointed out to me that it would be a problem for nested iterations. A keyword could still work, though, because it would be block local.
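The nested-iteration problem can be reproduced with an ordinary global. This is a small sketch, assuming a hypothetical helper each_with_global_index that stands in for an implicit $i; it is not an existing Ruby method.

```ruby
# Hypothetical helper that mimics an implicit global index named $i.
def each_with_global_index(list)
  list.each_with_index do |e, i|
    $i = i        # a single global slot shared by every loop
    yield e
  end
end

seen = []
each_with_global_index([:a, :b]) do |outer|
  each_with_global_index([:x, :y, :z]) { |inner| inner }
  seen << [$i, outer]   # $i now holds the inner loop's last index
end
```

After this runs, seen is [[2, :a], [2, :b]] rather than [[0, :a], [1, :b]], because the inner loop overwrote the outer loop's index.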

#2 Updated by Clifford Heath over 1 year ago

The inconsistency between operations over Arrays and Hashes isn't always avoidable, but consider:

h = Hash[1, 2, 3, 4, 5, 6]
a = [1, 2, 3, 4, 5, 6]

h[3]
#=> 4
h.detect{|k, v| k == 3 }
#=> [3, 4]

a[3]
#=> 4
a.detect{|v| v == 3 }
#=> 3

Surely the block passed to Array#detect should receive the index, and
the result of a Hash#detect should be a value, not a [key, value] pair?

I don't expect this to be changed, but perhaps it might inform proposed changes.

Clifford Heath.
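For comparison, here is a short sketch of how the same lookups can be written with methods that already exist, using the h and a from the example above. It only illustrates current behaviour and is not a proposed change.

```ruby
h = Hash[1, 2, 3, 4, 5, 6]   # {1=>2, 3=>4, 5=>6}
a = [1, 2, 3, 4, 5, 6]

# Array: make the index visible to the block through an Enumerator.
pair_at_three = a.each_with_index.detect { |_v, i| i == 3 }

# Array: look up the index of a value directly.
index_of_three = a.index(3)

# Hash: take only the value from the [key, value] pair that detect returns.
value_for_three = h.detect { |k, _v| k == 3 }&.last
```

Here pair_at_three is [4, 3] (value, then index), index_of_three is 2, and value_for_three is 4.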

#3 Updated by Eric Hodel over 1 year ago

What about the overhead for an infinite enumerator that does not use the implicit index? Especially after a few days of CPU time for a frequently-used enumerator?

#4 Updated by Thomas Sawyer over 1 year ago

@drbrain It's a fair point. But I imagine the index already exists under the hood. Probably the question is, can it be made accessible with little overhead?

Perhaps it could even be partially lazy, so full overhead doesn't even come into play unless used.

#5 Updated by Eric Hodel over 1 year ago

There is no such index internal to Enumerator to make accessible.

each_with_index already exists to do this. The use of each_with_index is preferable to an implicit, lazy iteration count. The former is intention-revealing while the latter is not.

For this new feature a user may ask, "Where does this local variable come from? I didn't assign to it!" or, "Why was my 'it' (or 'index') variable overwritten with a number?" each_with_index does not have either of these problems.

#6 Updated by Thomas Sawyer over 1 year ago

There is no such index internal to Enumerator to make accessible.

It's not about Enumerator, it's about #each itself. And there certainly is an index within the #each method itself, e.g. https://github.com/ruby/ruby/blob/trunk/array.c#L1549

#each_with_index doesn't cut it at all, as it doesn't address all the other possibilities, #map_with_index, #select_with_index, etc., which is the point.

I really don't see the problem with "I didn't assign that". If it's in the spec, it's expected, and we already have this sort of thing with regexp globals.

#7 Updated by Anonymous over 1 year ago

On Jul 30, 2012, at 6:59 PM, trans (Thomas Sawyer) transfire@gmail.com wrote:

Issue #6808 has been updated by trans (Thomas Sawyer).

There is no such index internal to Enumerator to make accessible.

It's not about Enumerator, it's about #each itself. And there certainly is an index within the #each method itself, e.g. https://github.com/ruby/ruby/blob/trunk/array.c#L1549

But that's only each for arrays. I doubt an each method for, say, linked lists would bother to implement an internal index.
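To make the linked-list point concrete, here is a minimal sketch of an #each that never tracks an index. The LinkedList class is hypothetical; the index in the last lines comes entirely from Enumerable, not from the list's own #each.

```ruby
# Hypothetical singly linked list; its #each keeps no counter at all.
class LinkedList
  include Enumerable

  Node = Struct.new(:value, :next_node)

  def initialize(*values)
    # Build the chain back to front so the first value ends up at the head.
    @head = values.reverse.inject(nil) { |rest, v| Node.new(v, rest) }
  end

  def each
    return enum_for(:each) unless block_given?

    node = @head
    while node
      yield node.value      # only the value is known here, not a position
      node = node.next_node
    end
    self
  end
end

list  = LinkedList.new(:a, :b, :c)
pairs = list.each_with_index.map { |e, i| [i, e] }
```

pairs is [[0, :a], [1, :b], [2, :c]]; the counting is done by Enumerable#each_with_index on top of the index-free #each.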

#each_with_index doesn't cut it at all, as it doesn't address all the other possibilities, #map_with_index, #select_with_index, etc., which is the point.

I'm not clear why the #with_index method isn't adequate for your needs.

[1,2,3].map.with_index { |item, index| [item, index] }
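A few more cases of the same pattern, as a sketch: Enumerator#with_index chains onto other blockless Enumerable methods as well, and takes an optional starting offset.

```ruby
letters = [:a, :b, :c]

# map with a zero-based index
indexed = letters.map.with_index { |e, i| [i, e] }

# map with an index that starts at 1
one_based = letters.map.with_index(1) { |e, i| [i, e] }

# select by index; the result contains the elements only, not [e, i] pairs
even_slots = letters.select.with_index { |_e, i| i.even? }

# reject by index
without_first = letters.reject.with_index { |_e, i| i.zero? }
```

These give [[0, :a], [1, :b], [2, :c]], [[1, :a], [2, :b], [3, :c]], [:a, :c] and [:b, :c] respectively.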

--
-- Jim Weirich
-- jim.weirich@gmail.com

#8 Updated by Thomas Sawyer over 1 year ago

=begin

I'm not clear why the #with_index method isn't adequate for your needs.

Yes, functionally you are right. I was just thinking that the overall overhead would be less if enumerator wasn't used. I ran some benchmarks: https://gist.github.com/3213779

# EACH                                  user    system     total        real
each                                3.840000  0.010000  3.850000 (  3.843590)
enumerator each                     5.130000  0.020000  5.150000 (  5.156704)
each_with_index                     5.650000  0.020000  5.670000 (  5.662425)
each and manual index               5.190000  0.010000  5.200000 (  5.206394)
enumerator each.with_index          6.500000  0.020000  6.520000 (  6.519111)
enumerator each and manual index    6.480000  0.020000  6.500000 (  6.501582)

# MAP                                   user    system     total        real
map                                 5.210000  0.020000  5.230000 (  5.230273)
enumerator map                      9.450000  0.040000  9.490000 (  9.491262)
map and manual index                6.600000  0.020000  6.620000 (  6.629977)
enumerator map.with_index           8.270000  0.030000  8.300000 (  8.291350)
enumerator map and manual index    11.210000  0.050000 11.260000 ( 11.256445)

Notice the results of the manual index without the enumerator --and that's in Ruby, not C code.
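A comparison of this kind can be set up roughly as follows. This is a sketch, not the code from the gist: the labels, the VARIANTS table, the compare helper, and the trial count and array size are all assumptions chosen for illustration.

```ruby
require 'benchmark'

# Hypothetical variants; each takes an array and returns [index, element] pairs.
VARIANTS = {
  'map and manual index'      => ->(a) { i = -1; a.map { |e| [i += 1, e] } },
  'enumerator map.with_index' => ->(a) { a.map.with_index { |e, i| [i, e] } },
  'each_with_index.map'       => ->(a) { a.each_with_index.map { |e, i| [i, e] } }
}.freeze

# Runs every variant n times over the array a and prints a timing table.
def compare(n, a)
  Benchmark.bm(28) do |x|
    VARIANTS.each do |label, body|
      x.report(label) { n.times { body.call(a) } }
    end
  end
end

# Assumed workload: many passes over a small array.
compare(1_000, (0...100).to_a) if __FILE__ == $PROGRAM_NAME
```

Swapping the arguments to something like compare(26, (0...1_000_000).to_a) shifts the workload from many small iterations to a few large ones.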
=end

#9 Updated by Eric Hodel over 1 year ago

trans (Thomas Sawyer) wrote:

I'm not clear why the #with_index method isn't adequate for your needs.

Yes, functionally you are right. I was just thinking that the overall overhead would be less if enumerator wasn't used. I ran some benchmarks: https://gist.github.com/3213779

[…]

Notice the results of the manual index without the enumerator --and that's in Ruby, not C code.

Are the times of these benchmarks dominated by object creation or iteration? What happens if you run a small number of trials across a large array? (n = 26, a = (0...1000000).to_a)

No matter which method is faster, what happens to this code:

index = 10
offsets.each do |e|
  index = e if condition e
  break if index > 30
end

Does index equal 10 on the first execution of the block? Does it equal 0?
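For reference, this is how the snippet behaves in Ruby as it stands, where index is an ordinary local variable. The values of offsets and the condition lambda below are hypothetical stand-ins, since the original leaves them undefined.

```ruby
# Hypothetical stand-ins for the undefined names in the snippet above.
offsets   = [5, 20, 40, 50]
condition = ->(e) { e > 15 }

seen  = []
index = 10
offsets.each do |e|
  seen << index                    # value of the explicit local on block entry
  index = e if condition.call(e)
  break if index > 30
end
```

Without an implicit index, seen is [10, 10, 20] and index ends at 40, so the block sees 10 on its first execution. An implicit index keyword would have to either preserve this or break existing code.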

#10 Updated by Thomas Sawyer over 1 year ago

=begin

Are the times of these benchmarks dominated by object creation or iteration? What happens if you run a small number of trials across a large array? (n = 26, a = (0...1000000).to_a)

You are correct that the difference would be less for large arrays and few iterations.

# EACH                                  user    system     total        real
each                                3.610000  0.050000  3.660000 (  3.671551)
enumerator each                     3.610000  0.030000  3.640000 (  3.642515)
each_with_index                     4.920000  0.020000  4.940000 (  4.972732)
each and manual index               4.930000  0.010000  4.940000 (  4.950868)
enumerator each.with_index          4.950000  0.000000  4.950000 (  4.982888)
enumerator each and manual index    4.900000  0.000000  4.900000 (  4.911986)

# MAP                                   user    system     total        real
map                                 4.230000  0.080000  4.310000 (  4.324616)
enumerator map                      6.060000  0.090000  6.150000 (  6.176046)
map and manual index                5.540000  0.070000  5.610000 (  5.633096)
enumerator map.with_index           5.510000  0.060000  5.570000 (  5.568634)
enumerator map and manual index     7.090000  0.200000  7.290000 (  7.287555)

But the difference looks less pronounced in this case, and on average I think programs tend to create and iterate over more small arrays than they do large ones.

No matter which method is faster, what happens to this code:
...
Does index equal 10 on the first execution of the block? Does it equal 0?

That's a fair question. I think to preserve backward compatibility, this code would have to behave just as you present it. In other words, the implicit index has been overridden by assigning it as a local variable. Which is why originally a global seemed the right choice. But can a global behave block local?

In any case, I think I will withdraw this request. Having to worry about local override, or managing a global that behaves block local, will probably dry up any performance gain. And in retrospect I think the whole it idea, while good on its face, doesn't really solve well the issues it is intended to solve.
I'm glad to have had the chance to discuss this and flesh it out though, as it has been sitting in the back of my mind for a while.
=end

#11 Updated by Eric Hodel over 1 year ago

  • Status changed from Open to Closed

trans (Thomas Sawyer) wrote:

No matter which method is faster, what happens to this code:
...
Does index equal 10 on the first execution of the block? Does it equal 0?

That's a fair question. I think to preserve backward compatibility, this code would have to behave just as you present it. In other words, the implicit index has been overridden by assigning it as a local variable. Which is why originally a global seemed the right choice. But can a global behave block local?

$~ and friends behave this way (thread and method local).
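The method-local behaviour of $~ can be observed directly. The sketch below uses hypothetical method names. It also shows that a block shares the $~ of its enclosing method, which is why block-local behaviour would need extra work.

```ruby
# A match inside a called method does not disturb the caller's $~.
def inner_match
  'hello' =~ /ll/
  $~[0]
end

def outer_check
  'xyz' =~ /y/
  before = $~[0]
  inner  = inner_match
  after  = $~[0]          # still the caller's own match
  [before, inner, after]
end

# A block is not a new scope for $~: it writes to the enclosing method's slot.
def block_shares
  'xyz' =~ /y/
  [1].each { 'hello' =~ /ll/ }
  $~[0]
end
```

outer_check returns ["y", "ll", "y"], while block_shares returns "ll" because the match made inside the block replaced the method's earlier match.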

In any case, I think I will withdraw this request. Having to worry about local override, or managing a global that behaves block local, will probably dry up any performance gain. And in retrospect I think the whole it idea, while good on its face, doesn't really solve well the issues it is intended to solve.

OK.

#12 Updated by Thomas Sawyer over 1 year ago

$~ and friends behave this way (thread and method local).

Hmmm... In that case, maybe it is worth trying, to see what the actual performance change would be. I'm willing to do it, and I basically know enough to work with a global variable in the C code. But how to handle block local behavior?

#13 Updated by Eric Hodel over 1 year ago

  • Status changed from Closed to Feedback

See rb_define_virtual_variable() and vm_svar_get()

#14 Updated by Yutaka HARA over 1 year ago

  • Target version changed from 2.0.0 to next minor
