Feature #21821
openAdd Stack and SizedStack classes
Description
Context¶
Queue and SizedQueue are very useful and performant constructs, however they only allow for FIFO queues.
Some use cases do call for LIFO queues AKA stacks.
For instance, in the case of connection pool, it's often preferable to use a stack.
Feature¶
I'd like to introduce Stack and SizedStack classes, to mirror Queue and SizedQueue.
They'd have exactly the same method and behavior at the exception of dequeuing order.
Thread namespace?¶
Currently Queue and SizedQueue are technically defined under Thread and aliased in Object.
I wonder if that's something we should do for Stack too, or just some historical thing we shouldn't perpetuate.
Updated by byroot (Jean Boussier) about 2 months ago
- Related to Feature #21721: Allow `Queue` and `SizedQueue` to be used as LIFO queues added
Updated by nobu (Nobuyoshi Nakada) about 2 months ago
byroot (Jean Boussier) wrote:
Currently
QueueandSizedQueueare technically defined underThreadand aliased inObject.
I wonder if that's something we should do forStacktoo, or just some historical thing we shouldn't perpetuate.
I think it will depend on what you achieve with Stack.
If you want to use it for inter thread communication, Thread namespace should be suitable.
Otherwise, if you want just simple push/pop operations, why isn't Array enough?
Updated by byroot (Jean Boussier) about 2 months ago
I think it will depend on what you achieve with Stack.
Yes, the point is to use it in cases where synchronization is needed, hence Array is not suitable. Typically connection pools: https://github.com/mperham/connection_pool/blob/8ba2715edca69d6162fd4f798997b74dd64e9529/lib/connection_pool/timed_stack.rb
Updated by ko1 (Koichi Sasada) about 1 month ago
I heard that C++ has deque: Double-Ended Queue https://en.cppreference.com/w/cpp/container/deque.html
- The sound of
Dequeis similar todequeueordeq(operation names). so introducingDequeueseems not good idea. - If we assume the implementation of
Queueis based onDequeu, addingpop_back()(LIFO operation) seems reasonalbe, and easy to introduce than introducing new classes, and good aligned to Ruby's large-class philosophy. I want to ask Matz again.
I'm afraid that expanding the similar idea like Thread::Heap, Thread::PriorityQueue and so on, atomic containers for each data structure.
Updated by Eregon (Benoit Daloze) about 1 month ago
· Edited
ko1 (Koichi Sasada) wrote in #note-4:
- If we assume the implementation of
Queueis based onDequeu, addingpop_back()(LIFO operation) seems reasonalbe, and easy to introduce than introducing new classes, and good aligned to Ruby's large-class philosophy. I want to ask Matz again.
This would make the implementation of Queue (based on Deque or not) more complicated and less efficient, as I said in the last paragraph of https://bugs.ruby-lang.org/issues/21721#note-2.
I think we should either:
- Add nothing because this is not a common need and there are 2 reasonable alternatives: https://bugs.ruby-lang.org/issues/21721#note-2 and https://bugs.ruby-lang.org/issues/21721#note-18
- Add Thread::Stack & Thread::SizedStack. I think it's not good to alias them in
Objectbecause they should not be used if the stack is used by a single Thread, Array should be used instead (as @nobu (Nobuyoshi Nakada) already pointed out).
I'm afraid that expanding the similar idea like
Thread::Heap,Thread::PriorityQueueand so on, atomic containers for each data structure.
Yes, it starts to feel like Java. I think there are not enough use cases for concurrent stacks to deserve to be core.
It could be in concurrent-ruby perhaps, or maybe the 2 alternatives linked above are good enough.
Updated by Eregon (Benoit Daloze) about 1 month ago
From a quick look the closest thing to a concurrent Stack in Java is LinkedBlockingDeque.
That implementation uses a single lock and condition variables, so it's not particularly efficient and likely causes significant contention.
In comparison LinkedBlockingQueue uses two locks (basically one for push, one for pop) to avoid contention, unless the queue is completely empty and then both locks need to be taken.
That illustrates my point that having a Deque limits the implementation choices and optimizations significantly.
Updated by Eregon (Benoit Daloze) about 1 month ago
BTW #num_waiting is always tricky to implement so if we add new classes I would like to not have that, because it's always messy to implement and makes it e.g. impossible to reuse JDK classes for JRuby & TruffleRuby.
Updated by byroot (Jean Boussier) about 1 month ago
there are 2 reasonable alternatives
About the second one, it's clever but not sure if ideal because it's no longer "atomic" on MRI, so is subject to Thread#raise. Looking at the connection_pool gem, one thing that slows it down considerably is the numerous calls to Thread.handle_interrupt to avoid leaking connections when Timeout is used.
Updated by Eregon (Benoit Daloze) about 1 month ago
· Edited
byroot (Jean Boussier) wrote in #note-8:
About the second one, it's clever but not sure if ideal because it's no longer "atomic" on MRI, so is subject to
Thread#raise. Looking at theconnection_poolgem, one thing that slows it down considerably is the numerous calls toThread.handle_interruptto avoid leaking connections whenTimeoutis used.
I see https://github.com/mperham/connection_pool/blob/f3645821d02fe8de5089f31b61600a1f6217afb1/lib/connection_pool.rb#L64-L73
There is lots of code in checkin/checkout anyway, those Thread.handle_interrupt can't be removed even if SizedStack exists.
Even if it's a single method call directly to a SizedStack method, that could still check interrupts, and it definitely would for pop, so you'd still need the Thread.handle_interrupt.
On that topic maybe we should have all ensure automatically be Thread.handle_interrupt(Object => :never) (https://bugs.ruby-lang.org/issues/17849#note-5), or a way to opt-in per ensure. Though it wouldn't cover checkout then, which I'm unsure if OK or not. Maybe a way to mark a method as uninterruptible then for efficiency (but obviously problematic if misused).
BTW it doesn't seem great to have checkout in Thread.handle_interrupt(Object => :never) because then that thread can't be interrupted, but checkout might take a while because it does a pop with a 5 seconds default timeout. So probably checkout should not be under handle_interrupt(Object => :never) in the first place. The tricky part is once the connection is returned by pop it needs to become uninterruptible, but there is no way to do that with the Thread.handle_interrupt API. So probably a smaller timeout is the best workaround here.
Also ConnectionPool::TimedStack has a bunch of code which doesn't seem to me like it could be replaced or implemented with a SizedStack (but I haven't tried).
It reminds me when I once tried to replace the queue in Puma's thread pool with a SizedQueue to remove the extra ConditionVariable & Mutex, but it's not possible, it does things that SizedQueue can't do.
IOW, I think we should have performance numbers here and implemented use cases that pass the relevant test suites, otherwise we might add core class(es) that actually can't be used for the intended use cases.
Updated by byroot (Jean Boussier) about 1 month ago
IOW, I think we should have performance numbers here and implemented use cases that pass the relevant test suites
Yes that's fair. I meant to re-implement the redis-client pool using Thread::Queue as a proof of concept, but didn't get around to it yet.
Updated by p8 (Petrik de Heus) about 1 month ago
· Edited
Eregon (Benoit Daloze) wrote in #note-9:
It reminds me when I once tried to replace the queue in Puma's thread pool with a SizedQueue to remove the extra ConditionVariable & Mutex, but it's not possible, it does things that SizedQueue can't do.
The ConditionVariable in Puma 7's thread pool is causing issues in JRuby: https://github.com/jruby/jruby/issues/9140
So I thought about attempting to rewrite it without the ConditionVariable (but that is way out of my expertise and now I'm learning not possible with a SizedQueue).
Sequel's timed_queue avoids the ConditionVariable with big improvements for some, but I'm not sure that can be applied to Puma.
Updated by mame (Yusuke Endoh) about 1 month ago
Discussed at the dev meeting. No conclusion has been reached, but some opinions raised included:
- Instead of introducing separate classes for each algorithm, how about introducing a generic container class under a more use-case-driven name like
Thread::Pool?Thread::Pool.new(:fifo)orThread::Pool.new(:lifo)could determine how to retrieve items, for example. (note: @matz (Yukihiro Matsumoto) didn't say his opinion about this, as I recall) - If we only need to add a method to retrieve items LIFO from Queue, returning to #21721, how about a method name that is clearly not orthodox, like
Queue#unpush? @matz (Yukihiro Matsumoto) said he can compromise if a better name could be found.
Updated by Eregon (Benoit Daloze) about 1 month ago
p8 (Petrik de Heus) wrote in #note-11:
The ConditionVariable in Puma 7's thread pool is causing issues in JRuby: https://github.com/jruby/jruby/issues/9140
I recall it was also causing issues in TruffleRuby back then hence my short attempt to remove it.
But we fixed ConditionVariable since then and it seems to work fine now (though indeed there is the overhead of creating a InterruptedException, but the whole Java blocking/interruption is basically based on those so seems hard to fix and more of a JVM bug IMO).
ConditionVariable are a very useful concurrency primitive, so I don't think it's productive in general to try to avoid them, better to optimize them since they are used in plenty of places.
So I thought about attempting to rewrite it without the ConditionVariable (but that is way out of my expertise and now I'm learning not possible with a SizedQueue).
Not possible is probably too strong, what I meant to say is when I looked at it and tried it there are some methods I couldn't implement with a SizedQueue, so the API back then and I think the API now has methods which are not implementable with a SizedQueue.