Bug #20226
closedInconsistent Sort results on 3.3.0 compared to previous versions
Description
Try this code block:
[-0.9, -0.88, -0.85, -0.83, -0.81, -0.79, -0.77, -0.75, -0.73, -0.71, -0.69, -0.67, -0.65, -0.63, -0.6, -0.58, -0.56, -0.54, -0.52,
-0.5, -0.48, -0.46, -0.44, -0.42, -0.4, -0.38, -0.35, -0.33, -0.31, -0.29, -0.27, -0.25, -0.23, -0.21, -0.19, -0.17, -0.15, -0.13,
-0.1, -0.08, -0.06, -0.04, -0.02, 0.0, 0.02, 0.0, 0.02].sort_by(&:abs)
The end result should be the numbers absolute sorted, look at the last 5 numbers of this, the end result of them should be [0.0, 0.0, -0.02, 0.02, 0.02...]
maintaining the original order, and this behavior is what we see on ruby 3.2.0, however on ruby 3.3.0 the end result will be [0.0, 0.0, 0.02, 0.02, -0.02...]
This is also inconsistent, as [-0.02, 0.0, 0.02, 0.0, 0.02].sort_by(&:abs)
will actually provide the expected result.
Again, the main issue for us is the difference between 3.3.0 and previous versions of ruby.
Updated by nobu (Nobuyoshi Nakada) 11 months ago
sort
family methods do not guarantee stable sort.
That means the result order of the same value elements is not affected by the original order of those elements.
Updated by bkDJ (Djilani Kebaili) 11 months ago
You can force stability by using the original index as a tiebreaker.
arr = [*Array.new(15,1), -0.02, 0.02]
puts "Unstable: #{arr.sort_by(&:abs)[0..1]}"
puts "Stable: #{arr.sort_by.with_index { [_1.abs, _2] }[0..1]}"
Updated by Eregon (Benoit Daloze) 11 months ago
When was sort
/sort_by
changed to no longer be stable? I don't remind any ticket about it or NEWS entry.
IMO it's a pretty big change to make it non-stable when it has been stable for so long.
Especially for a high-level language like Ruby, where I'd think most users do not expect such low-level & error-prone issues to arise.
Updated by ufuk (Ufuk Kayserilioglu) 11 months ago
Sorting in Ruby never was stable. There were discussions to make it stable, but Matz was concerned about the speed impact. I don't think this particular case is a regression, the behaviour of sort depends on the number of elements, and other factors since different sorting algorithms could be used for different sized lists.
Updated by ufuk (Ufuk Kayserilioglu) 11 months ago
Here is the previous discussion from 15 years ago: https://bugs.ruby-lang.org/issues/1089
It might be about time to revisit the performance impact of a stable sort, imo, but that's not the discussion on this ticket.
Updated by Eregon (Benoit Daloze) 11 months ago
Sorting in Ruby never was stable
My mistake, like Roger Pack in https://github.com/crystal-lang/crystal/issues/2350#issuecomment-550188746, my experience is sort looks stable (on Linux).
And sort is stable on TruffleRuby since it uses mergesort (currently at least).
So what changed between 3.2 and 3.3?
I can reproduce the OP's behavior on linux-amd64.
Maybe it's sort_by
-specific? And maybe sort_by
was stable until 3.2?
Updated by ufuk (Ufuk Kayserilioglu) 11 months ago
Enumerable#sort_by
was also documented to not be stable in 3.2 (and previous versions): https://ruby-doc.org/3.2.3/Enumerable.html#method-i-sort_by
What might have changed (this is a layman's guess since I don't know the internals of Ruby sorting that well) is which sort routine Ruby uses from the OS/libc by default, which might change the behaviour of the resulting sort. Since Ruby provides no guarantees of stable sorting, this is not a regression, despite what specific platform sort routines might provide.
Updated by johnnyshields (Johnny Shields) 11 months ago
Whether or not Ruby's documentation made any explicit guarantees, I was under the strong impression (as per the Ruby behavior I've observed for 10 years) that Ruby did have a stable sort. Other Array
operations such as -
(minus), uniq
, etc. are stable. I have a 500k LOC app, it's going to be difficult to review all my code for potential impacts.
If Ruby's sort
was previously stable, it might be time to make that behavior official.
Updated by ufuk (Ufuk Kayserilioglu) 11 months ago
Regardless of any individual impressions, Ruby never made any stability guarantees for sorting. It was always explicit that sorting MAY BE unstable. This is because Ruby delegates the actual sorting to the efficient sorting routines provided by the platform in most cases, and those routines are free to handle sorting with whichever algorithm would be the most efficient for the input. This means that a sort that is always stable on macOS could show unstable results on Linux, for example, or vice versa.
So, regardless of what you are observing, Ruby makes no guarantees about the stability of sorting and that has always been the case.
I've linked to a conversation from 15 years ago about making the sort stable, which was rejected at the time. If you would like to propose a path forward for stable sorting guarantees in Ruby, please open a Feature ticket with the proposed implementation and performance implications.
Updated by johnnyshields (Johnny Shields) 11 months ago
If sort was stable for a long time (it does seem like it was, but I am not 100% sure), then despite Ruby's disclaimer, many developers will have come to rely on and expect that behavior, and consider this as "a behavior change that breaks their apps"--whether or not we strictly term it as a "regression bug".
Updated by Eregon (Benoit Daloze) 11 months ago
many developers will have come to rely on and expect that behavior,
I think that's a fair point, similar to e.g Set insertion ordering, even though that's not documented I'm pretty sure some Ruby code relies on it because it holds since a very long time.
The key seems to be whether sort was in practice stable on some platform/OS (I don't know for sure).
It'd be interesting to find out what exactly changed for 3.3.0.
Updated by alanwu (Alan Wu) 11 months ago
Seems like maybe it was changed by #19643 for sort_by
. But in any case, it was never stable since it was using the platform's unstable qsort_r
. On macOS, I get [0.0, 0.0, 0.02, 0.02, -0.02, ...]
even on 3.2.2. Glibc has warned about qsort being unstable for a long time, too.
Updated by alanwu (Alan Wu) 11 months ago
The key seems to be whether sort was in practice stable on some platform/OS
Ah, until recently, glibc's qsort
was actually a stable merge sort most of the time, which is why Ruby's sort felt stable on Linux.
https://sourceware.org/git/?p=glibc.git;a=commitdiff;h=03bf8357e8291857a435afcc3048e0b697b6cc04
https://lists.libvirt.org/archives/list/devel@lists.libvirt.org/thread/GBTHE3V6AYHCMBI5BI2BVTCAYOA4HFFX/
Updated by jeremyevans0 (Jeremy Evans) 10 months ago
- Status changed from Open to Closed