Feature #19075
closedBinary searching for the last element
Description
My latest proposal is https://bugs.ruby-lang.org/issues/19075#note-6.
I will leave the initial proposal below.
PR: https://github.com/ruby/ruby/pull/6611
(I'm going to talk about Array
here, but the same argument can be made for Range
. If Array#bsearch_last
is acceptable, I will work also for Range
.)
Ruby's bsearch returns the first element which satisfies the given block.
# Search the first element greater than 18
array = [10, 15, 20, 25]
array.bsearch { |x| x > 18 } # => 20
If we want the last element, we need to invert the condition and step backward.
# Search the last element less than 18
array = [10, 15, 20, 25]
index = array.bsearch_index { |x| !(x < 18) }
array[index-1] # => 15
Of course, we need to consider nil
and the boundary.
# Search the last element less than 100
index = array.bsearch_index { |x| !(x < 100) } # => nil
if index.nil?
array.last # => 25
else
array[index-1]
end
# Search the last element less than 0
index = array.bsearch_index { |x| !(x < 0) } # => 0
if index.nil?
array.last
elsif index == 0
nil
else
array[index-1]
end
This is where mistakes can easily be made, so I propose Array#bsearch_last
and Array#bsearch_last_index
.
Array#bsearch_last
returns the last element which satisfies the given block.
Array#bsearch
requires that all false-evaluating elements precede all true-evaluating elements. As is clear from the meaning of the method, conversely to bsearch
, bsearch_last
requires that all true-evaluating elements precede all false-evaluating elements. (If bsearch_last
is acceptable, the name "find-minimum mode" should be changed.)
array = [10, 15, 20, 25]
array.bsearch_last { |x| x < 18 } # => 15
array.bsearch_last { |x| x < 100 } # => 25
array.bsearch_last { |x| x < 0 } # => nil
There are several possible options for find-any mode.
(1) bsearch_last
does not support find-any mode.
A block for bsearch_last
must return true
, false
or nil
.
[1, 2, 3].bsearch_last { 0 } # => TypeError
My pull request tentatively includes this implementation.
(2) bsearch_last
supports find-any mode and it behaves like bsearch
.
bsearch
with find-any mode returns an element, for which the block returns zero.
If multiple elements satisfy the condition, it is not determined which of them will be returned.
It is conceivable that bsearch_last
behaves in the same way as bsearch
.
# current behavior
# It is not specified whether `:b`, `:c`, or `:d` is returned.
[[1,:a], [2, :b], [2, :c], [2, :d], [3, :e]].bsearch { |a, b| 2 <=> a } # => [2, :c]
(3) bsearch_last
supports find-any mode and returns the last element. Make bsearch
return the first element.
Change the behavior of bsearch
to return the first element for which the block returns zero.
bsearch_last
returns the last element for which the block returns zero.
# Change it like this:
[[1,:a], [2, :b], [2, :c], [2, :d], [3, :e]].bsearch { |a, b| 2 <=> a } # => [2, :b]
[[1,:a], [2, :b], [2, :c], [2, :d], [3, :e]].bsearch_last { |a, b| 2 <=> a } # => [2, :d]
(If this option is adopted, the name "find-any mode" should be renamed.)
Updated by nobu (Nobuyoshi Nakada) over 2 years ago
- Description updated (diff)
Looks reasonable.
An alternative could be to add an optional (keyword) argument to bsearch
and bsearch_index
.
Do you consider the separated methods will be better?
About find-any mode, the last option (3) sounds something better, IMHO.
Updated by kyanagi (Kouhei Yanagita) over 2 years ago
I was only thinking of adding a new method, but an optional argument for bsearch
also seems to be good.
The name of the argument would be as follows?
bsearch(target: :first)
bsearch(target: :last)
Updated by Dan0042 (Daniel DeLorme) about 2 years ago
nobu (Nobuyoshi Nakada) wrote in #note-1:
An alternative could be to add an optional (keyword) argument to
bsearch
andbsearch_index
.
That was also my first reaction on reading this proposal. A few other options:
bsearch(last: true)
bsearch(:last)
bsearch(:min) #default for find-minimum mode
bsearch(:any) #default for find-any mode
bsearch(:max)
Updated by zverok (Victor Shepelev) about 2 years ago
I am thinking that maybe reverse: true
can be a good generic option to then spread through other methods?
-
bsearch(reverse: true)
(binary search, reverse order) -
sort(reverse: true)
(sort, reverse order) -
find_index(reverse: true)
(find last index = find index, search in reverse order)
etc?..
Updated by kyanagi (Kouhei Yanagita) about 2 years ago
I have thought about this issue and am now leaning toward bsearch(target: :last)
.
I proposed bsearch_last
at first, this is because I didn't think of an optional keyword argument for bsearch
.
Since searching the beginning and the end are the same binary searching, it seems better to call them by the same name.
Let's think about the arguments.
Consider the possibility of further arguments, keyword arguments would be better than positional argunents.
What name should the keyword arguments be?
reverse: true
is attractive in that it allows the same name to be used for various methods,
but I don't think bsearch(reverse: true)
clearly expresses the intent of looking for the last element.
The general word "reverse" seems to be vague to indicate the intent of "whether the target is the first one or the last one".
I think bsearch(target: :first)
and bsearch(target: :last)
are the best names that convey the intent.
About find-any mode¶
If we adopt bsearch(target: :last)
, I think the option (3) is the best choice.
Updated by kyanagi (Kouhei Yanagita) over 1 year ago
I made another pull request, which is my latest proposal: https://github.com/ruby/ruby/pull/8339
This new PR implements the option (3):
- bsearch and bsearch_index take the argument `target', whose possible values are :first or :last.
- find-by-number mode (former find-any mode) returns the first (or last) element for which the block returns zero.
- find-any mode has been renamed to find-by-number mode.
- find-minimum mode has been renamed to find-by-boolean mode.
a = [2, 3, 5, 7, 11]
a.bsearch {|x| x >= 6 } # => 7
a.bsearch(target: :first) {|x| x >= 6 } # same as a.bsearch { ... }
a.bsearch(target: :last) {|x| x <= 6 } # => 5
a.bsearch(target: :first) {|x| 0 } # => 2
a.bsearch(target: :last) {|x| 0 } # => 11
Updated by nobu (Nobuyoshi Nakada) over 1 year ago
The PR fails at spec/ruby/core/array/bsearch_index_spec.rb:67.
it "returns the middle element when block always returns zero" do
@array.bsearch_index { |x| 0 }.should == 2
end
I consider that the return value in this case was indeterministic, and this assertion is too implementation specific.
Updated by kyanagi (Kouhei Yanagita) over 1 year ago
I consider that the return value in this case was indeterministic, and this assertion is too implementation specific.
I agree. I think this assertion is not based on the specification.
Updated by kyanagi (Kouhei Yanagita) over 1 year ago
- Subject changed from Add Array#bsearch_last and #bsearch_last_index to Binary searching for the last element
- Description updated (diff)
Updated by Dan0042 (Daniel DeLorme) over 1 year ago
a = [2, 3, 5, 7, 11] a.bsearch {|x| x >= 6 } # => 7 a.bsearch(target: :first) {|x| x >= 6 } # same as a.bsearch { ... } a.bsearch(target: :last) {|x| x <= 6 } # => 5 a.bsearch(target: :first) {|x| 0 } # => 2 a.bsearch(target: :last) {|x| 0 } # => 11
I like this implementation.
But for naming, personally I find target: :last
to be confusing. If I suddenly read that in code I wouldn't be able to guess what it does; I would have to refer to the documentation. I don't really have a good alternative. Maybe bsearch(find: :last)
?_?
Updated by sawa (Tsuyoshi Sawada) over 1 year ago
Dan0042 wrote:
I find
target: :last
to be confusing. If I suddenly read that in code I wouldn't be able to guess what it does; I would have to refer to the documentation.
I agree. At least, bsearch(find: :last)
is better. May be bsearch(towards: :last)
may work as well.
However, when we want to reverse the direction in any sense (such as the direction of search), in my understanding, it's Ruby's custom to use the word "reverse" or a prefix "r":
Enumerable#revserse_each
Array#rindex
String#rindex
String#rjust
String#rpartition
String#rstrip
To be consistent, I think it is better to have independent methods as in the original proposal, but with method names like:
reverse_bsearch
bsearch_rindex
Updated by Dan0042 (Daniel DeLorme) over 1 year ago
sawa (Tsuyoshi Sawada) wrote in #note-11:
bsearch_rindex
I love this one, how it has a direct parallel in the way the condition must be inverted when using #index
vs #rindex
.
a = [5,6,6,6,7]
a.index{ |e| e >= 6 } #=>1
a.rindex{ |e| e <= 6 } #=>3 (inverted condition)
a.bsearch_index{ |e| e >= 6 } #=>1
a.bsearch_rindex{ |e| e <= 6 } #=>3 (inverted condition just like rindex and bsearch(target: :last) implementation above)
Updated by mame (Yusuke Endoh) over 1 year ago
I think the workaround is good enough. Do you have a good use case to introduce this?
The proposed behavior is not something that can simply be called "last".
The traditional behavior requires a given block to return false
for elements in the first half and true
for elements in the second half. On the other hand, this proposal requires it to return true
for the first half and false
for the second half.
The name bsearch_last
is never clear enough to me for this behavior. bsearch_rindex
and rbsearch
are not good either.
bsearch
is confusing, and this proposed mode is even more confusing. I don't think it is a good idea to make this mode selectable by a keyword argument.
Updated by kyanagi (Kouhei Yanagita) over 1 year ago
(In the following, searching for the last element will be referred to as "LAST".)
There are often cases where searching for the last element would be useful.
A typical example that immediately comes to mind is as follows:
- When we have a list of events with timestamps, find the last event that occurred up to a specific time
events.bsearch(target: :last) { |event| event.timestamp <= time }
Of course, this can be written without LAST, but I believe using LAST makes it more natural
and less prone to errors.
index = events.bsearch { |event| event.timestamp > time }
last_event = if index.nil?
events.last
elsif index == 0
nil
else
events[index-1]
end
A similar example is the case of finding the most expensive item available in a shop.
highest = item_prices.bsearch(target: :last) { |price| price <= your_money }
Let's see another example.
Given a sorted list of words, find the last word with a particular prefix in lexicographic order.
words = %w(pen pi pin pine ping png)
prefix = 'pin'
# => want to get 'ping'
With LAST and determistic find-any mode (what I call find-by-number mode in my pull request),
the solution to this problem can be written as follows.
words.bsearch(target: :last) { |w| prefix <=> w[0, prefix.size] }
Using the current bsearch_index
, it would be like this.
index = words.bsearch_index { |w| w[0, prefix.size] > prefix }
w = if index.nil?
words.last
elsif index == 0
nil
else
words[index-1]
end
ans = w if w[0, prefix.size] == prefix
The difference is obvious.
When it comes to names, I thought target: :last
was easy to understand.
However, the fact that several people have said that it is confusing means that this may not be an appropriate name.
I hope we can find a good name.
As mame-san says, bsearch
is a confusing method in the first place.
The condition that false must precede true in the value returned by the block, or that positive must precede zero and zero must precede negative, is difficult to imagine from the appearance of the method.
Rather than trying to interpret the meaning of behavior, a formal name which represents "the reverse of bsearch
" might be easier to understand.
Updated by kyanagi (Kouhei Yanagita) over 1 year ago
FYI: Julia has searchsortedfirst
and searchsortedlast
https://docs.julialang.org/en/v1/base/sort/#Base.Sort.searchsortedfirst
Updated by matz (Yukihiro Matsumoto) over 1 year ago
- Status changed from Open to Closed
We reject bsearch_last
, it's not intuitive (from my POV). If we come up with good name candidates, resubmit the proposal.
Matz.