Project

General

Profile

Actions

Feature #19075

closed

Binary searching for the last element

Added by kyanagi (Kouhei Yanagita) over 1 year ago. Updated 6 months ago.

Status:
Closed
Assignee:
-
Target version:
-
[ruby-core:110473]

Description

My latest proposal is https://bugs.ruby-lang.org/issues/19075#note-6.
I will leave the initial proposal below.


PR: https://github.com/ruby/ruby/pull/6611

(I'm going to talk about Array here, but the same argument can be made for Range. If Array#bsearch_last is acceptable, I will work also for Range.)

Ruby's bsearch returns the first element which satisfies the given block.

# Search the first element greater than 18
array = [10, 15, 20, 25]
array.bsearch { |x| x > 18 } # => 20

If we want the last element, we need to invert the condition and step backward.

# Search the last element less than 18
array = [10, 15, 20, 25]
index = array.bsearch_index { |x| !(x < 18) }
array[index-1] # => 15

Of course, we need to consider nil and the boundary.

# Search the last element less than 100
index = array.bsearch_index { |x| !(x < 100) } # => nil
if index.nil?
  array.last # => 25
else
  array[index-1]
end
# Search the last element less than 0
index = array.bsearch_index { |x| !(x < 0) } # => 0
if index.nil?
  array.last
elsif index == 0
  nil
else
  array[index-1]
end

This is where mistakes can easily be made, so I propose Array#bsearch_last and Array#bsearch_last_index.

Array#bsearch_last returns the last element which satisfies the given block.

Array#bsearch requires that all false-evaluating elements precede all true-evaluating elements. As is clear from the meaning of the method, conversely to bsearch, bsearch_last requires that all true-evaluating elements precede all false-evaluating elements. (If bsearch_last is acceptable, the name "find-minimum mode" should be changed.)

array = [10, 15, 20, 25]
array.bsearch_last { |x| x < 18 }  # => 15
array.bsearch_last { |x| x < 100 } # => 25
array.bsearch_last { |x| x < 0 }   # => nil

There are several possible options for find-any mode.

(1) bsearch_last does not support find-any mode.

A block for bsearch_last must return true, false or nil.

[1, 2, 3].bsearch_last { 0 } # => TypeError

My pull request tentatively includes this implementation.

(2) bsearch_last supports find-any mode and it behaves like bsearch.

bsearch with find-any mode returns an element, for which the block returns zero.
If multiple elements satisfy the condition, it is not determined which of them will be returned.

It is conceivable that bsearch_last behaves in the same way as bsearch.

# current behavior
# It is not specified whether `:b`, `:c`, or `:d` is returned.
[[1,:a], [2, :b], [2, :c], [2, :d], [3, :e]].bsearch { |a, b| 2 <=> a } # => [2, :c]

(3) bsearch_last supports find-any mode and returns the last element. Make bsearch return the first element.

Change the behavior of bsearch to return the first element for which the block returns zero.
bsearch_last returns the last element for which the block returns zero.

# Change it like this:
[[1,:a], [2, :b], [2, :c], [2, :d], [3, :e]].bsearch { |a, b| 2 <=> a } # => [2, :b]
[[1,:a], [2, :b], [2, :c], [2, :d], [3, :e]].bsearch_last { |a, b| 2 <=> a } # => [2, :d]

(If this option is adopted, the name "find-any mode" should be renamed.)

Updated by nobu (Nobuyoshi Nakada) over 1 year ago

  • Description updated (diff)

Looks reasonable.

An alternative could be to add an optional (keyword) argument to bsearch and bsearch_index.
Do you consider the separated methods will be better?

About find-any mode, the last option (3) sounds something better, IMHO.

Updated by kyanagi (Kouhei Yanagita) over 1 year ago

I was only thinking of adding a new method, but an optional argument for bsearch also seems to be good.
The name of the argument would be as follows?

bsearch(target: :first)
bsearch(target: :last)

Updated by Dan0042 (Daniel DeLorme) over 1 year ago

nobu (Nobuyoshi Nakada) wrote in #note-1:

An alternative could be to add an optional (keyword) argument to bsearch and bsearch_index.

That was also my first reaction on reading this proposal. A few other options:

bsearch(last: true)
bsearch(:last)
bsearch(:min) #default for find-minimum mode
bsearch(:any) #default for find-any mode
bsearch(:max)

Updated by zverok (Victor Shepelev) over 1 year ago

I am thinking that maybe reverse: true can be a good generic option to then spread through other methods?

  • bsearch(reverse: true) (binary search, reverse order)
  • sort(reverse: true) (sort, reverse order)
  • find_index(reverse: true) (find last index = find index, search in reverse order)

etc?..

Updated by kyanagi (Kouhei Yanagita) about 1 year ago

I have thought about this issue and am now leaning toward bsearch(target: :last).

I proposed bsearch_last at first, this is because I didn't think of an optional keyword argument for bsearch.
Since searching the beginning and the end are the same binary searching, it seems better to call them by the same name.

Let's think about the arguments.

Consider the possibility of further arguments, keyword arguments would be better than positional argunents.

What name should the keyword arguments be?

reverse: true is attractive in that it allows the same name to be used for various methods,
but I don't think bsearch(reverse: true) clearly expresses the intent of looking for the last element.
The general word "reverse" seems to be vague to indicate the intent of "whether the target is the first one or the last one".

I think bsearch(target: :first) and bsearch(target: :last) are the best names that convey the intent.

About find-any mode

If we adopt bsearch(target: :last), I think the option (3) is the best choice.

Updated by kyanagi (Kouhei Yanagita) 7 months ago

I made another pull request, which is my latest proposal: https://github.com/ruby/ruby/pull/8339
This new PR implements the option (3):

  • bsearch and bsearch_index take the argument `target', whose possible values are :first or :last.
  • find-by-number mode (former find-any mode) returns the first (or last) element for which the block returns zero.
  • find-any mode has been renamed to find-by-number mode.
  • find-minimum mode has been renamed to find-by-boolean mode.
a = [2, 3, 5, 7, 11]
a.bsearch {|x| x >= 6 } # => 7
a.bsearch(target: :first) {|x| x >= 6 } # same as a.bsearch { ... }
a.bsearch(target: :last) {|x| x <= 6 } # => 5
a.bsearch(target: :first) {|x| 0 } # => 2
a.bsearch(target: :last) {|x| 0 } # => 11

Updated by nobu (Nobuyoshi Nakada) 7 months ago

The PR fails at spec/ruby/core/array/bsearch_index_spec.rb:67.

    it "returns the middle element when block always returns zero" do
      @array.bsearch_index { |x| 0 }.should == 2
    end

I consider that the return value in this case was indeterministic, and this assertion is too implementation specific.

Updated by kyanagi (Kouhei Yanagita) 7 months ago

I consider that the return value in this case was indeterministic, and this assertion is too implementation specific.

I agree. I think this assertion is not based on the specification.

Actions #9

Updated by kyanagi (Kouhei Yanagita) 7 months ago

  • Subject changed from Add Array#bsearch_last and #bsearch_last_index to Binary searching for the last element
  • Description updated (diff)

Updated by Dan0042 (Daniel DeLorme) 7 months ago

a = [2, 3, 5, 7, 11]
a.bsearch {|x| x >= 6 } # => 7
a.bsearch(target: :first) {|x| x >= 6 } # same as a.bsearch { ... }
a.bsearch(target: :last) {|x| x <= 6 } # => 5
a.bsearch(target: :first) {|x| 0 } # => 2
a.bsearch(target: :last) {|x| 0 } # => 11

I like this implementation.
But for naming, personally I find target: :last to be confusing. If I suddenly read that in code I wouldn't be able to guess what it does; I would have to refer to the documentation. I don't really have a good alternative. Maybe bsearch(find: :last) ?_?

Updated by sawa (Tsuyoshi Sawada) 7 months ago

Dan0042 wrote:

I find target: :last to be confusing. If I suddenly read that in code I wouldn't be able to guess what it does; I would have to refer to the documentation.

I agree. At least, bsearch(find: :last) is better. May be bsearch(towards: :last) may work as well.

However, when we want to reverse the direction in any sense (such as the direction of search), in my understanding, it's Ruby's custom to use the word "reverse" or a prefix "r":

Enumerable#revserse_each
Array#rindex
String#rindex
String#rjust
String#rpartition
String#rstrip

To be consistent, I think it is better to have independent methods as in the original proposal, but with method names like:

reverse_bsearch
bsearch_rindex

Updated by Dan0042 (Daniel DeLorme) 7 months ago

sawa (Tsuyoshi Sawada) wrote in #note-11:

bsearch_rindex

I love this one, how it has a direct parallel in the way the condition must be inverted when using #index vs #rindex.

a = [5,6,6,6,7]
a.index{ |e| e >= 6 }          #=>1
a.rindex{ |e| e <= 6 }         #=>3 (inverted condition)
a.bsearch_index{ |e| e >= 6 }  #=>1
a.bsearch_rindex{ |e| e <= 6 } #=>3 (inverted condition just like rindex and bsearch(target: :last) implementation above)

Updated by mame (Yusuke Endoh) 6 months ago

I think the workaround is good enough. Do you have a good use case to introduce this?

The proposed behavior is not something that can simply be called "last".
The traditional behavior requires a given block to return false for elements in the first half and true for elements in the second half. On the other hand, this proposal requires it to return true for the first half and false for the second half.
The name bsearch_last is never clear enough to me for this behavior. bsearch_rindex and rbsearch are not good either.
bsearch is confusing, and this proposed mode is even more confusing. I don't think it is a good idea to make this mode selectable by a keyword argument.

Updated by kyanagi (Kouhei Yanagita) 6 months ago

(In the following, searching for the last element will be referred to as "LAST".)

There are often cases where searching for the last element would be useful.
A typical example that immediately comes to mind is as follows:

  • When we have a list of events with timestamps, find the last event that occurred up to a specific time
events.bsearch(target: :last) { |event| event.timestamp <= time }

Of course, this can be written without LAST, but I believe using LAST makes it more natural
and less prone to errors.

index = events.bsearch { |event| event.timestamp > time }
last_event = if index.nil?
               events.last
             elsif index == 0
               nil
             else
               events[index-1]
             end

A similar example is the case of finding the most expensive item available in a shop.

highest = item_prices.bsearch(target: :last) { |price| price <= your_money }

Let's see another example.
Given a sorted list of words, find the last word with a particular prefix in lexicographic order.

words = %w(pen pi pin pine ping png)
prefix = 'pin'
# => want to get 'ping'

With LAST and determistic find-any mode (what I call find-by-number mode in my pull request),
the solution to this problem can be written as follows.

words.bsearch(target: :last) { |w| prefix <=> w[0, prefix.size] }

Using the current bsearch_index, it would be like this.

index = words.bsearch_index { |w| w[0, prefix.size] > prefix }
w = if index.nil?
      words.last
    elsif index == 0
      nil
    else
      words[index-1]
    end
ans = w if w[0, prefix.size] == prefix

The difference is obvious.

When it comes to names, I thought target: :last was easy to understand.
However, the fact that several people have said that it is confusing means that this may not be an appropriate name.
I hope we can find a good name.

As mame-san says, bsearch is a confusing method in the first place.
The condition that false must precede true in the value returned by the block, or that positive must precede zero and zero must precede negative, is difficult to imagine from the appearance of the method.
Rather than trying to interpret the meaning of behavior, a formal name which represents "the reverse of bsearch" might be easier to understand.

Updated by matz (Yukihiro Matsumoto) 6 months ago

  • Status changed from Open to Closed

We reject bsearch_last, it's not intuitive (from my POV). If we come up with good name candidates, resubmit the proposal.

Matz.

Actions

Also available in: Atom PDF

Like1
Like0Like0Like0Like0Like0Like1Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0