Project

General

Profile

Actions

Feature #20049

closed

Destructive drop_while for Array and Hash

Added by chucke (Tiago Cardoso) 5 months ago. Updated 4 months ago.

Status:
Feedback
Assignee:
-
Target version:
-
[ruby-core:115648]

Description

I propose a "drop_while!" variant for arrays and hashes, which changes the current instance.

h = {foo: 0, bar: 1, baz: 2}
h.drop_while!{|element| key, value = *element; value < 2 }
h #=> # => { baz: 2 }

Updated by matheusrich (Matheus Richard) 5 months ago

What would be the return value? The same as drop_while?

Updated by chucke (Tiago Cardoso) 5 months ago

I'd say it returns self, or nil if no changes were made, just like other ! Methods of the kind.

Updated by matz (Yukihiro Matsumoto) 4 months ago

  • Status changed from Open to Feedback

I reject Hash#drop_while! because the order is not (and should not be) important for Hash and the order of hash is fragile anyway.
Whereas Array#drop_while! has chance. Could you explain the use-case for the new method, please?

Matz.

Updated by zverok (Victor Shepelev) 4 months ago

Could you explain the use-case for the new method, please?

Not the original author, but I think one of the frequent use cases is some parser- or scanner-alike code. E.g., after reading some "dirty" data,

lines.drop_while! { ln.match?(/HEADER:/) }

# operate with clean `lines` here

(I had more generic take on this in my ticket about "consuming" enumerators #19061, which I haven't had time to pursue the better proposal/explanation this year, but want to return during the next one.)

Updated by chucke (Tiago Cardoso) 4 months ago

thx for the response matz!

The specific use case can be found here: https://gitlab.com/os85/http-2-next/-/blob/master/lib/http/2/next/connection.rb#L737-739

The HTTP/2 allows frames to be ackowledged for streams that have been closed "recently", which in this case means "last 15 seconds". functionally, I'm using an hash to store them, with insertion at the time they're closed, and in a given event, I proceed to clean up the ones that have timed-out.

the first implementation traversed the whole collection for all timed out streams, and removed them. This O(n) complexity surfaced as a bottleneck in some benchmarks around long-lived connections. In order to improve it, I use #drop_while, which turned it into O(log n) , as given hash preserving insertion order, possibly timed out streams are left-most, so I can stop processing the collection as soon as I find a non-timed out element.

This greatly improved benchmarks, but the resulting multiple objects generated by the successive Enumerable#drop_while + #to_h calls resulted in a new bottleneck associated with temporary object generation, that could be greatly reduced by just reusing the same collection.

can you clarify what you mean by "the order of hash is fragile anyway"? As I understood, there are other stdlib data structures relying on hash order insertion. Are there plans to remove this guarantee?

I guess I could rely on Array#drop_while!, although that'd hurt lookups.

Updated by chucke (Tiago Cardoso) 4 months ago

for comparison, the original implementation can be found here: https://github.com/igrigorik/http-2/blob/master/lib/http/2/connection.rb#L705-L714 (http-2-next is a fork of http-2 gem).

FWIW zverok's use case is also reasonably common.

Actions

Also available in: Atom PDF

Like0
Like0Like0Like0Like0Like0Like0Like0