Project

General

Profile

Feature #1089

Stable sorting for sort and sort_by

Added by murphy (Kornelius Kalnbach) over 10 years ago. Updated 6 months ago.

Status:
Rejected
Priority:
Normal
Target version:
-
[ruby-core:21737]

Description

=begin
I'd like to have stable sorting in Ruby. Stable sorting means that if two items are ==, they will retain their order in the list after sorting. In other words, stable sort does never switch two equal items.

In Ruby 1.9.1, you'd have to use something like

enum.sort_by.with_index { |a, i| [a, i] }

to achieve a stable sort in Ruby.

It would also be nice to minimize the number of comparisons, since method calling is expensive in Ruby.

Python is using a highly optimized implementation of a variant of merge sort, which is stable and uses very few comparisons (called Timsort after Tim Peters); maybe Ruby can adopt it for sort and sort_by:

http://svn.python.org/projects/python/trunk/Objects/listsort.txt

Introducing stable sorting would not be a problem for old code because the order of equal items after sorting is currently not specified.
=end

History

#1

Updated by headius (Charles Nutter) over 10 years ago

Kornelius Kalnbach wrote:

I'd like to have stable sorting in Ruby. Stable sorting means that if two items are ==, they will retain their order in the list after sorting. In other words, stable sort does never switch two equal items.
...
It would also be nice to minimize the number of comparisons, since method calling is expensive in Ruby.

Python is using a highly optimized implementation of a variant of merge sort, which is stable and uses very few comparisons (called Timsort after Tim Peters); maybe Ruby can adopt it for sort and sort_by:

http://svn.python.org/projects/python/trunk/Objects/listsort.txt

Introducing stable sorting would not be a problem for old code because the order of equal items after sorting is currently not specified.

I'd love to hear why you're interested in a stable sort. In JRuby, we
recently put in place an unstable hybrid of a couple sorts so we'd have
competitive performance. Our original sort was exactly what you talk
about for Python, a "pretty fast" stable modified merge sort.

I'd also be interested to see how the performance of Timsort compares to
what's currently in Ruby and JRuby as well as compared to the built-in
JDK sort we used before. If stable sorting is desired, we could
certainly provide a flag or flip a bit in JRuby to provide it.

#2

Updated by shyouhei (Shyouhei Urabe) over 10 years ago

  • Status changed from Open to Feedback
  • Assignee set to matz (Yukihiro Matsumoto)
#3

Updated by murphy (Kornelius Kalnbach) over 10 years ago

Hello,

it took me some time to answer. Your question wasn't easy ;)

Charles Oliver Nutter wrote:

I'd love to hear why you're interested in a stable sort.

Two use cases come to my mind where stable sort is needed. Excuse me for
the confusing examples, but I find it always hard to talk about this topic.

The first one is a bit pathetic. A list of different objects, some of
which are ==, but the list is already sorted. Unstable sort might change
the list order, which might surprise some people.

# Something that can be == without being the same.
class StrangeInt < Struct.new :value
  def <=> other
    self.value / 2 <=> other.value / 2
  end
  include Comparable
  def inspect
    value
  end
end
StrangeInt.new(4) == StrangeInt.new(5)  # => true

a = Array.new(10) { |i| StrangeInt.new i }
a  # => [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

# stable sort would return this
a.sort_by { |x| [x.value / 2, x.value] }
# => [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

a.sort
# => [0, 1, 3, 2, 4, 5, 6, 7, 8, 9]

a.sort_by { |x| x }
# => [0, 1, 3, 2, 4, 5, 6, 7, 8, 9]

(This also shows that currently, Array#sort does not check if the list
is already sorted.)

The second example is sort_by chaining. Imagine you have a table view of
files with name, file type, and file size. When you're searching for the
biggest PDF file, you first click twice on the "size" column and then
once on the "type" column, and scroll down to the .pdf files to find the
biggest one on the top. A Rails-style variant:

  files.sort_by(&:size).reverse.sort_by(&:type)

...which doesn't work as expected with unstable sort! You need to

  files.sort_by { |file| [file.type, -file.size] }

I think stable sort feels more natural here.

And the -file.size trick doesn't even work for strings! We had such a
case in a project, and in the end we couldn't implement it the way we
wanted. With a stable sort, our implementation would become much more
clean. I think the example is a very strong case for stable sorting.

In JRuby, we recently put in place an unstable hybrid of a couple
sorts so we'd have competitive performance. Our original sort was
exactly what you talk about for Python, a "pretty fast" stable
modified merge sort.

Oh noes _^ How much difference did it make?

I'd also be interested to see how the performance of Timsort compares
to what's currently in Ruby and JRuby as well as compared to the
built-in JDK sort we used before. If stable sorting is desired, we
could certainly provide a flag or flip a bit in JRuby to provide it.

I don't know how timsort would perform compared to the current MRI
quicksort or JRuby's sort implementation. I assume it takes some time to
port it to Ruby for testing, and my experience with Ruby core
programming is limited. But I think the question of a need for stable
sorting is independent from that.

[murphy]

#4

Updated by headius (Charles Nutter) over 10 years ago

Kornelius Kalnbach wrote:

Charles Oliver Nutter wrote:

I'd love to hear why you're interested in a stable sort.

Two use cases come to my mind where stable sort is needed. Excuse me for
the confusing examples, but I find it always hard to talk about this topic.

Your two cases make sense to me, and are similar to reasons I gave for
us not to abandon a stable sort. I eventually overruled myself, however,
after more published benchmark results put us slower than Ruby 1.8 for
array sorts. Perception is important; more important than sort
stability, I have not decided :)

In JRuby, we recently put in place an unstable hybrid of a couple
sorts so we'd have competitive performance. Our original sort was
exactly what you talk about for Python, a "pretty fast" stable
modified merge sort.

Oh noes _^ How much difference did it make?

I guess we ended up going with an unstable sort because it did make a
difference; I think keeping the sort stable makes it harder to combine
allgorithms for different classes of list (partially sorted, reversed,
randomized) which seemed to be key to matching MRI's sort performance. I
raised a concern about stability, but we figured if nobody had seen
issues in MRI nobody would care if JRuby's sort became unstable.

Perhaps this is a case where having the fast unstable sort support a
slower, possibly ruby-based, stable sort is the way to go. In general a
ruby-based sort performs pretty well on both JRuby and Ruby 1.9. Here's
some recent results:

http://blog.concord.org/archives/24-Comparison-of-Ruby-1.8.6-1.9-and-JRuby-running-on-Java-1.5-1.6-and-1.7.html

It certainly wouldn't be as fast as a native sort, but perhaps that's
not a big deal?

  • Charlie
#5

Updated by murphy (Kornelius Kalnbach) over 10 years ago

Charles Oliver Nutter wrote:

Your two cases make sense to me, and are similar to reasons I gave
for us not to abandon a stable sort. I eventually overruled myself,
however, after more published benchmark results put us slower than
Ruby 1.8 for array sorts. Perception is important; more important
than sort stability, I have not decided :)

actually, I'd prefer the feature of "stable sort" over "fast sort". I
use Ruby for convenience, not for speed. still: what is the tradeoff?

I guess we ended up going with an unstable sort because it did make a
difference; I think keeping the sort stable makes it harder to
combine allgorithms for different classes of list (partially sorted,
reversed, randomized) which seemed to be key to matching MRI's sort
performance. I raised a concern about stability, but we figured if
nobody had seen issues in MRI nobody would care if JRuby's sort
became unstable.

it's critical how much the difference was...if we get stability for,
say, 15% more sort time, I'd buy it. if even the best alogrithms are 50x
slower, I'd abandon my request. but I don't think this is the case; more
investigation follows.

anything that performs better than this would be appreciated (stable
sort is roughly 50 times slower):

time ruby19 -e 'Array(1..1000000).shuffle.sort'
real    0m0.529s
real    0m0.164s (without sorting)

i = 0; Array(1..1000000).shuffle.sort_by { |o| [o, i += 1] }
real    0m18.742s

Array(1..1000000).shuffle.sort_by.with_index { |o, i| [o, i] }
real    0m18.430s

Array(1..1000000).shuffle.sort_by.with_index { |*a| a }
real    0m18.383s

Perhaps this is a case where having the fast unstable sort support a
slower, possibly ruby-based, stable sort is the way to go. In general
a ruby-based sort performs pretty well on both JRuby and Ruby 1.9.
Here's some recent results:

http://rubyurl.com/3d87

It certainly wouldn't be as fast as a native sort, but perhaps that's
not a big deal?

it is...using their benchmark mergesort:

-rmergesort -e 'mergesort Array(1..1000000).shuffle'
real    0m18.066s

wow, that's faster than the sort_by approach :D my own in-place merge
sort version (based on the one Java uses internally for Object[]) is
even faster:

-rmerge_sort -e 'Array(1..1000000).shuffle.merge_sort!'
real    0m10.401s

so, the best I currently get with pure Ruby is about 30x. I'm sure we
can do better with a Java or C implementation.

let's see timsort now. some Python 2.5 benchmarks:

time python2.5 -c 'from random import shuffle; list = range(1000000);
shuffle(list); sorted(list)'
real    0m2.432s
real    0m1.191s (without sorting)

so I guess timsort takes about 1.3s, which would be about 3.5x slower
than Ruby's sort. this is not very nice; but 1.3s are still much better
than 10s!

so, maybe stable sort should be optional. what options do we have?

  • compiler switch, global variable and such; we don't want that

  • a new method, like Enumerable#stable_sort. my code example would
    become:

files.stable_sort_by(&:size).reverse.stable_sort_by(&:type)

  • an optional parameter to sort and sort_by:

    enum.sort(stable=false)                   -> array
    enum.sort(stable=false) {| a,b | block }  -> array
    enum.sort_by(stable=false) {| obj | block }  -> array
    
    array.sort!(stable=false)                   -> array
    array.sort!(stable=false) {| a,b | block }  -> array
    array.sort_by!(stable=false) {| obj | block }  -> array
    

    I know boolean parameters are not nice, but we already use them in
    Object#methods. my code example would become:

    files.sort_by(true, &:size).reverse.sort_by(true, &:type)
    

none of these options feels like Ruby to me. I hope somebody has another
idea.

[murphy]

#6

Updated by murphy (Kornelius Kalnbach) over 10 years ago

Charles Oliver Nutter wrote:

In JRuby, we recently put in place an unstable hybrid of a couple
sorts so we'd have competitive performance.

can you remember which version if JRuby still contains the stable sort?
I'd like to do benchmarks...

[murphy]

#7

Updated by marcandre (Marc-Andre Lafortune) about 10 years ago

  • Status changed from Feedback to Open
#8

Updated by naruse (Yui NARUSE) about 10 years ago

  • Status changed from Open to Rejected
#9

Updated by murphy (Kornelius Kalnbach) almost 10 years ago

I'm not sure why the proposal was rejected. Can somebody explain?

If we don't want to change the #sort implementation for performance reasons, I can open another ticket for the request of an alternative, fast and stable sort method in core Ruby. I still think that Ruby should have a core method for stable sorting.

#10

Updated by matz (Yukihiro Matsumoto) almost 10 years ago

Hi,

In message "Re: [ruby-core:26557] [Feature #1089] Stable sorting for sort and sort_by"
on Fri, 6 Nov 2009 12:00:35 +0900, Kornelius Kalnbach redmine@ruby-lang.org writes:

I'm not sure why the proposal was rejected. Can somebody explain?

As far as I know, sort stability is rarely required, not worthy for
slows down sorting in general. If there's stable sort routine faster
than current one, please tell me.

If we don't want to change the #sort implementation for performance reasons, I can open another ticket for the request of an alternative, fast and stable sort method in core Ruby. I still think that Ruby should have a core method for stable sorting.

If you really need stable sort, you can do so using sort_by pretty
easily.

i = 0
ary.sort_by{|x| [x, i += 1] }
                        matz.
#11

Updated by naruse (Yui NARUSE) almost 10 years ago

Because you can't persuade Charles (and matz).

I maintain tickets which doesn't concluded for long time.
Rejecting tickets kills or reactivates it.

You can open another ticket for an alternative or
reopen this with new implementation or use cases which strongly needs stable sorting.
Seeing above matz' comment, you should show use cases which needs fast and stable sort.

Updated by jonathanhefner (Jonathan Hefner) 6 months ago

  • Description updated (diff)

One use case I have for a stable sort is a large CSV file which is tracked in a git repository. I have a script which loads the file, adds rows, and then sorts all rows by a particular column before writing them back to the file. I need to minimize the git diff for the file, so I use a stable sort. I currently use the sort_by.with_index technique, but it would be nice to have faster performance. (A benchmark currently shows sort_by.with_index is 7x slower than just sort_by.)

Another use case I have is a list of tasks with optional priority. By default, the tasks are ordered manually with no priority set. But when a priority is set, the tasks are sorted such that higher priority tasks come first, while still preserving the manual ordering of both the prioritized and unprioritized tasks.

In addition to performance benefits, I think something like sort_by(stable: true, &:foo) communicates purpose more clearly than sort_by.with_index{|x, i| [x.foo, i] }.

Also available in: Atom PDF