Feature #8478

The hash returned by Enumerable#group_by should have an empty array for its default value

Added by Pete Higgins 11 months ago. Updated 11 months ago.

[ruby-core:55260]
Status:Open
Priority:Normal
Assignee:Yukihiro Matsumoto
Category:-
Target version:-

Description

Without this patch, nil checks might need to be done on the return value of Enumerable#group_by:

$ cat testgroupby.rb
a = [1, 2, 3, "a", "b"]
g = a.group_by {|o| o.class }

puts "Fixnums: #{g[Fixnum].size}"
puts "Strings: #{g[String].size}"
puts "Arrays: #{g[Array].size}"

$ ruby testgroupby.rb
Fixnums: 3
Strings: 2
testgroupby.rb:6:in <main>': undefined methodsize' for nil:NilClass (NoMethodError)

This patch adds a default value of an empty array to the hash returned by Enumerable#group_by, so the script above will work:

$ ./ruby -I.:lib testgroupby.rb
Fixnums: 3
Strings: 2
Arrays: 0

group_by_default_empty_array.diff Magnifier (1.81 KB) Pete Higgins, 06/03/2013 04:37 AM

History

#1 Updated by Marc-Andre Lafortune 11 months ago

I understand the idea, but there are problems with this.

First of, it's a really bad idea to set the default of a hash to a mutable object:

a = [1, 2, 3, "a", "b"]
g = a.group_by {|o| o.class }
g[Array] << [:foo]
g[Hash] # => [[:foo]], you expected []

Assuming now that your request was instead to change the default proc to {|hash, key| hash[key] = []}, which would not cause the problem above, you still have the problem of incompatibility.

#2 Updated by Boris Stitnicky 11 months ago

@marcandre: What do you mean by incompatibility? That the behavior is not the same as before, or something deeper?

#3 Updated by Marc-Andre Lafortune 11 months ago

  • Assignee set to Yukihiro Matsumoto

Right, that the behavior is changing and that this could break existing code. It might have been an interesting idea at the time group_by was introduced, but I doubt that Matz will feel the change is compelling enough to break compatibility today.

#4 Updated by Martin Dürst 11 months ago

I don't think I agree with the proposer. The example looks good, but what about something like:

a = [1, 2, 3, "a", "b"]
g = a.group_by {|o| o.class }

puts "Fixnums: #{g[Fixnum].size}"
puts "Strings: #{g[String].size}"
puts "Zeroes: #{g[0].size}"

In other words, it might be desirable to return an empty array for a group key that is part of the collection that we group over (classes in the example), but it'd be better to cause an error for group keys that are outside of the collection (i.e., not classes in the example).

That would mean that the default proc should (for this example) be something like
{|hash, key| hash[key] = [] if key.class==Class }
Of course, that cannot be part of Ruby, unless maybe as a third argument of some form to group_by.

#5 Updated by Yura Sokolov 11 months ago

There are always different ways to work with hash:

counters = Hash.new{|h,k| 0}
list.each{|e| counters[e]+=1}
some_thing_count = counters[some_thing]

compared to

counters = {}
list.each{|e| counters[e] = (counters[e]||0) + 1}
raise "THERE IS NO SOME-THING" unless counters[some_thing]

So that, group_by should not assume user's needs

So, one could assign default proc if he really wants:

a = [1, 2, 3, "a", "b"]
g = a.group_by {|o| o.class }
g.default_proc = proc{|hash, key| hash[key] = [] if key.class==Class }

puts "Fixnums: #{g[Fixnum].size}"
puts "Strings: #{g[String].size}"
puts "Arrays: #{g[Array].size}"
if g[0]
  puts "Zeroes: #{g[0].size}"
else
  raise "NO ZEROES!!!!"
end

#6 Updated by Pete Higgins 11 months ago

marcandre (Marc-Andre Lafortune) wrote:

First of, it's a really bad idea to set the default of a hash to a mutable object:

Yes, I didn't consider the problem of returning the same array for every key, and that would have to be changed if this were to be seriously considered.

Assuming now that your request was instead to change the default proc to {|hash, key| hash[key] = []}, which would not cause the problem above, you still have the problem of incompatibility.

It seems like some appropriate milestone like 2.1 might be an acceptable place to make such an incompatible change.

#7 Updated by Pete Higgins 11 months ago

duerst (Martin Dürst) wrote:

I don't think I agree with the proposer. The example looks good, but what about something like:

a = [1, 2, 3, "a", "b"]
g = a.group_by {|o| o.class }

puts "Fixnums: #{g[Fixnum].size}"
puts "Strings: #{g[String].size}"
puts "Zeroes: #{g[0].size}"

In other words, it might be desirable to return an empty array for a group key that is part of the collection that we group over (classes in the example), but it'd be better to cause an error for group keys that are outside of the collection (i.e., not classes in the example).

That would mean that the default proc should (for this example) be something like
{|hash, key| hash[key] = [] if key.class==Class }
Of course, that cannot be part of Ruby, unless maybe as a third argument of some form to group_by.

I certainly didn't intend for the default value to have some idea of the type of comparison done by the call to groupby, that seems to go well outside the scope of groupby and just general ruby semantics. It seems counter intuitive to me that while the point of the method is to return a hash of arrays, the return value would produce either an array with some values in it or nil. It returning a hash that produced arrays with some values in them or an empty arrays means you're only dealing with one type of thing, an array.

Also available in: Atom PDF