Feature #9108

Hash sub-selections

Added by Tom Wardrop 5 months ago. Updated 5 months ago.

[ruby-core:58324]
Status: Open
Priority: Normal
Assignee: -
Category: -
Target version: -

Description

=begin
Hi,

I seem to regularly have the requirement to work on a sub-set of key/value pairs within a hash. Ruby doesn't seem to provide a concise means of selecting a sub-set of keys from a hash. To give an example of what I mean, including how I currently achieve this:

sounds = {dog: 'woof', cat: 'meow', mouse: 'squeak', horse: 'nay', cow: 'moo'}
domestic_sounds = sounds.select { |k,v| [:dog, :cat].include? k } #=> {dog: 'woof', cat: 'meow'}

I think a more concise and graceful solution to this would be to allow the Hash#[] method to take multiple arguments, returning a sub-hash, e.g.

domestic_sounds = sounds[:dog, :cat] #=> {dog: 'woof', cat: 'meow'}

I had a requirement in the current project I'm working on to concatenate two values in a hash. If this proposed feature existed, I could have just done this...

sounds[:dog, :cat].values.join #=> 'woofmeow'

You could do something similar for the setter also...

sounds[:monkey, :bat] = 'screech'
sounds #=> {dog: 'woof', cat: 'meow', mouse: 'squeak', horse: 'nay', cow: 'moo', monkey: 'screech', bat: 'screech'}

Concise, convenient and readable. Thoughts?

=end

History

#1 Updated by Matthew Kerwin 5 months ago

=begin
In your proposal, what would happen with undefined keys? I see two reasonable options:

sounds = {dog: 'woof', cat: 'meow'}
# option 1:
sounds[:dog, :fish] #=> {dog: 'woof'}
# option 2:
sounds[:dog, :fish] #=> {dog: 'woof', fish: nil}

Of the two, I'd much prefer the first. A third option is to raise an exception, but that seems the least friendly of all.

If approved, I'd be +1 on the multiple-setter as well. I've had scenarios in which I'd have used it had it been available.

We should note the previous feature discussions (can't remember issue numbers) involving nested lookups, which also suggested a multiple-argument #[] semantic.
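
To make the two options concrete, here is a minimal sketch written as plain helper methods rather than as a change to (({Hash#[]})). The names (({subset_skipping_missing})) and (({subset_with_nils})) are hypothetical and exist only for illustration.

```ruby
# Hypothetical helpers (not part of Ruby); the names are illustrative only.

# Option 1: silently skip keys the hash does not contain.
def subset_skipping_missing(hash, *keys)
  keys.each_with_object({}) do |key, result|
    result[key] = hash[key] if hash.key?(key)
  end
end

# Option 2: include every requested key; a missing key maps to whatever
# hash[key] returns for it (nil unless the hash has a default value).
def subset_with_nils(hash, *keys)
  keys.each_with_object({}) do |key, result|
    result[key] = hash[key]
  end
end

sounds = {dog: 'woof', cat: 'meow'}
subset_skipping_missing(sounds, :dog, :fish) #=> {dog: 'woof'}
subset_with_nils(sounds, :dog, :fish)        #=> {dog: 'woof', fish: nil}
```

With option 2 the caller can no longer tell a missing key from a key stored with a nil value, which is one argument for option 1.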
=end

#2 Updated by Matthew Kerwin 5 months ago

=begin
Apologies for immediately replying again, but I've just had a potential source of confusion occur to me:

hash = {a:1, b:2}

keys = [:a,:b]
hash[*keys] #=> {a:1, b:2}

keys = [:a]
hash[*keys] #=> 1, expected {a:1} ?

keys = []
hash[*keys] #=> ???

I'm not against the subset feature, but I think using #[] will cause more trouble than it's worth. Why not use #subset or a similar name?
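
The ambiguity can be reproduced with a small sketch. (({MultiKeyHash})) is a hypothetical subclass written only to show the arity-dependent return type; it assumes missing keys are skipped and that a single argument keeps the ordinary lookup behaviour.

```ruby
# Hypothetical subclass, used only to illustrate the ambiguity.
class MultiKeyHash < Hash
  def [](*keys)
    # Exactly one key: behave like the ordinary Hash#[] lookup.
    return super(keys.first) if keys.length == 1

    # Zero keys, or two and more: build a sub-hash, skipping missing keys.
    keys.each_with_object({}) do |key, result|
      result[key] = fetch(key) if key?(key)
    end
  end
end

hash = MultiKeyHash[{a: 1, b: 2}]

hash[*[:a, :b]] #=> {a: 1, b: 2}
hash[*[:a]]     #=> 1
hash[*[]]       #=> {}
```

Code that splats a key list therefore gets a Hash back for zero or several keys but a bare value for exactly one, so every caller would need a special case.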
=end

#3 Updated by Nobuyoshi Nakada 5 months ago

=begin
--- wardrop (Tom Wardrop) wrote:
I think a more concise and graceful solution to this would be to allow the Hash#[] method to take multiple arguments, returning a sub-hash, e.g.

    domestic_sounds = sounds[:dog, :cat] #=> {dog: 'woof', cat: 'meow'}

As (({sounds[:dog]})) returns (({'woof'})), it should return the values only, even if it were introduced.

--- --->
I had a requirement in the current project I'm working on to concatenate two values in a hash. If this proposed feature existed, I could have just done this...

    sounds[:dog, :cat].values.join #=> 'woofmeow'

Try:
sounds.values_at(:dog, :cat).join('')

--- --->
You could do something similar for the setter also...

    sounds[:monkey, :bat] = 'screech'
    sounds #=> {dog: 'woof', cat: 'meow', mouse: 'squeak', horse: 'nay', cow: 'moo', monkey: 'screech', bat: 'screech'}

It feels ambiguous, since it looks like a kind of multiple assignment to me.

Rather it should be:

sounds[:monkey, :bat] = 'screech'
# sounds[:monkey] == 'screech'
# sounds[:bat] == nil

sounds[:cock, :hen] = 'cock-a-doodle-doo', 'cluck'
# sounds[:cock] == 'cock-a-doodle-doo'
# sounds[:hen] == 'cluck'

shouldn't it?
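
The two readings of the setter can be compared side by side in a rough sketch. Both helpers are hypothetical: (({assign_broadcast})) follows the original proposal, and (({assign_parallel})) follows the multiple-assignment reading, pairing keys with values by position.

```ruby
# Hypothetical helpers; neither exists in Ruby.

# Broadcast: every key receives the same value (the original proposal).
def assign_broadcast(hash, keys, value)
  keys.each { |key| hash[key] = value }
  hash
end

# Parallel: pair keys with values by position and pad with nil,
# the way `a, b = 'screech'` leaves b as nil.
def assign_parallel(hash, keys, values)
  keys.zip(Array(values)).each { |key, value| hash[key] = value }
  hash
end

assign_broadcast({}, [:monkey, :bat], 'screech')
#=> {monkey: 'screech', bat: 'screech'}
assign_parallel({}, [:monkey, :bat], 'screech')
#=> {monkey: 'screech', bat: nil}
assign_parallel({}, [:cock, :hen], ['cock-a-doodle-doo', 'cluck'])
#=> {cock: 'cock-a-doodle-doo', hen: 'cluck'}
```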
=end

#4 Updated by Alexey Muranov 5 months ago

=begin
I think, in ((Rails)), the proposed method (not the assignment) is called ((<(({Hash#slice}))|URL:http://api.rubyonrails.org/classes/Hash.html#method-i-slice>)).

I think it is impossible to use (({#[]})) for that method:

h = {1 => 2}
h[1] # => {1 => 2}?
     # => [2]?
     # => Set[2]?
     # => 2?
=end

#5 Updated by Rodrigo Rosenfeld Rosas 5 months ago

Related to #8499 and the rejected #6847.

Personally I'd love to see Hash#slice implemented in Ruby core, but I think #[] shouldn't work as #slice. It should instead return the values only for the requested keys.

#6 Updated by Bertram Scharpf 5 months ago

=begin
(({
[:dog, :cat].map { |k| sounds[ k] }
#=> ["woof", "meow"]
}))

(({
[:dog, :cat].inject Hash.new do |r,k| r[ k] = sounds[ k] ; r end
#=> {:dog=>"woof", :cat=>"meow"}
}))

If you look at it this way, one should rather define an Array method than
a Hash method.

=end

#7 Updated by Ilya Vorontsov 5 months ago

I'd prefer to keep this syntax for nested hashes (as in http://bugs.ruby-lang.org/issues/5531).
Why not use an ordinary method such as #keyvaluesat or #subhash, or something like that?

#8 Updated by Tom Wardrop 5 months ago

=begin
I suppose square-bracket syntax is too ambiguous as it collides with many other existing and potential behaviours I didn't consider. A normal method is fine.

I think an appropriate solution would be to amend Hash#select. It currently doesn't take any arguments, so could easily take a list of (({*args})). You could use the optional block to further refine this selection if need be. The same for Hash#reject.

One potential issue though is that Hash#select without any arguments currently returns an enumerator. This could be problematic when doing something like (({hash.select(*args)})) with an empty array. I think calling Hash#select without arguments should return a copy of the full original Hash; this keeps it somewhat compatible with the existing behaviour of returning an enumerator (given a Hash is an enumerator), while at the same time making it consistent with the new Hash#select(keys) functionality.

What do you think of changing #select and #reject to support this?
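
A rough sketch of how the amended method might behave, written as a standalone helper so it does not touch Hash itself. (({select_keys})) is a hypothetical name, and the no-argument case follows the suggestion above of returning a copy of the whole hash.

```ruby
# Hypothetical stand-in for the proposed Hash#select(*keys); not a real API.
def select_keys(hash, *keys)
  # No keys: start from a copy of the whole hash, as suggested above.
  picked = keys.empty? ? hash.dup : hash.select { |key, _value| keys.include?(key) }
  # An optional block refines the selection further.
  block_given? ? picked.select { |key, value| yield(key, value) } : picked
end

sounds = {dog: 'woof', cat: 'meow', mouse: 'squeak'}

select_keys(sounds, :dog, :cat)
#=> {dog: 'woof', cat: 'meow'}
select_keys(sounds, :dog, :cat) { |_key, value| value.start_with?('w') }
#=> {dog: 'woof'}
select_keys(sounds)
#=> {dog: 'woof', cat: 'meow', mouse: 'squeak'} (a copy, not the same object)
```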
=end

#9 Updated by Nobuyoshi Nakada 5 months ago

Enumerator differs from Hash.

#10 Updated by Tom Wardrop 5 months ago

They do differ, yes, but in most cases an enumerator is interchangeable with a Hash. I can't imagine anyone would be using Hash#select to get an enumerator anyway. If anyone is, then their code deserves to break to some extent. You should use Hash#enum_for or Hash#each methods if you want an enumerator from a hash.

#11 Updated by Matthew Kerwin 5 months ago

wardrop (Tom Wardrop) wrote:

They do differ, yes, but in most cases an enumerator is interchangeable with a Hash. I can't imagine anyone would be using Hash#select to get an enumerator anyway. If anyone is, then their code deserves to break to some extent. You should use Hash#enum_for or Hash#each methods if you want an enumerator from a hash.

Do you mean Enumerator (the class returned by many functions when !block_given?) or Enumerable (the module that defines #sort, #reverse, etc.)?

#12 Updated by Matthew Kerwin 5 months ago

phluid61 (Matthew Kerwin) wrote:

wardrop (Tom Wardrop) wrote:

They do differ, yes, but in most cases an enumerator is interchangeable with a Hash. I can't imagine anyone would be using Hash#select to get an enumerator anyway. If anyone is, then their code deserves to break to some extent. You should use Hash#enum_for or Hash#each methods if you want an enumerator from a hash.

Do you mean Enumerator (the class returned by many functions when !block_given?) or Enumerable (the module that defines #sort, #reverse, etc.)?

Sorry, I realise you do mean the right thing. I was put off by the fact that you say they're most often interchangeable; all they have in common is #each.

#13 Updated by Tom Wardrop 5 months ago

Enumerator includes Enumerable, as does Hash. Enumerator introduces a few new methods that revolve around the concept of a cursor, but otherwise everything else comes from Enumerable.

My whole point is that for anyone using #reject or #select to retrieve an Enumerator from a Hash (which really no one should be doing), there's a good chance their code will still work, as long as they're not using the extra cursor functionality exclusive to Enumerators.
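
For reference, a short sketch of what the two objects share and where they differ, using only existing core methods:

```ruby
sounds = {dog: 'woof', cat: 'meow'}
enum = sounds.select # no block given, so an Enumerator comes back

enum.class                      #=> Enumerator
Enumerator.include?(Enumerable) #=> true
Hash.include?(Enumerable)       #=> true

# Shared Enumerable behaviour works on both.
enum.map { |key, _value| key }   #=> [:dog, :cat]
sounds.map { |key, _value| key } #=> [:dog, :cat]

# External iteration (the "cursor") exists only on the Enumerator.
enum.respond_to?(:next)   #=> true
sounds.respond_to?(:next) #=> false

# Hash-specific lookup exists only on the Hash.
enum.respond_to?(:[])   #=> false
sounds.respond_to?(:[]) #=> true
```

So code that only calls Enumerable methods on the result would keep working, while code that relies on either the cursor methods or on hash lookup would not.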

Hash#select(*keys) is such an appropriate interface for obtaining a subset of a hash, as that's exactly what the #select and #reject methods are intended for. It would be silly in my opinion to introduce a new method.

The question is, should Hash#select without arguments return a copy of the original hash, or an empty hash? Thinking about it, I'd say Hash#select should return an empty hash if no arguments are given, though this completely breaks compatibility for anyone using Hash#select (without arguments) as a means of obtaining an enumerator. Hash#reject without an argument should definitely return a full copy of the original hash.

#14 Updated by Alexey Muranov 5 months ago

wardrop (Tom Wardrop) wrote:

I think an appropriate solution would be to amend Hash#select. It currently doesn't take any arguments, so could easily take a list of (({*args})). You could use the optional block to further refine this selection if need be. The same for Hash#reject.

One potential issue though is that Hash#select without any arguments currently returns an enumerator. This could be problematic when doing something like (({hash.select args})) with an empty array. I think calling Hash#select without arguments should return a copy of the full original Hash, this keeps it somewhat compatible with the existing behaviour of returning an enumerator (given a Hash is an enumerator), while at the same time making it consistent with the new Hash#select(keys) functionality.

If Hash#select without arguments returns the original hash or an enumerator, it will contradict the proposal to use #select with args to "slice" a hash: it would have to return the empty hash.

#15 Updated by Alexey Muranov 5 months ago

However, I do not see why it has to be used as select(*args) and not select(ary) or select(enum).

#16 Updated by Tom Wardrop 5 months ago

=begin
(({select(*args)})) just seemed like a more natural interface, though I suppose (({select(enum)})) provides more flexibility and solves any compatibility problems with the current behaviour of select. If an empty enumerable is given, an empty hash is returned. If no argument is given, then the current behaviour of returning an enumerator is respected. That'll work well.

In summary, I'm in favour of the (({select(enum)})) implementation, likewise for #reject.
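
A minimal sketch of that behaviour, again as a standalone helper rather than a patch to Hash. (({select_by})) is a hypothetical name, and using nil to mean "no argument given" is an assumption of this sketch.

```ruby
# Hypothetical stand-in for the proposed Hash#select(enum); not a real API.
def select_by(hash, keys = nil, &block)
  # No argument: keep today's behaviour (an Enumerator when no block is given).
  return hash.select(&block) if keys.nil?

  wanted = keys.to_a
  hash.select { |key, _value| wanted.include?(key) }
end

sounds = {dog: 'woof', cat: 'meow', mouse: 'squeak'}

select_by(sounds, [:dog, :cat]) #=> {dog: 'woof', cat: 'meow'}
select_by(sounds, [])           #=> {}
select_by(sounds).class         #=> Enumerator
```

Because the keys arrive as a single argument, an empty collection and a missing argument stay distinguishable, which is what removes the compatibility problem.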
=end
