Feature #19272
closed
Hash#merge: smarter protocol depending on passed block arity
Description
Usage of Hash#merge with a "conflict resolution block" is almost always clumsy: because the block accepts |key, old_val, new_val| arguments, and many trivial usages just somehow combine the old and new values, something that should be "intuitively trivial" becomes longer than it should be:
# I just want a sum!
{apples: 1, oranges: 2}.merge(apples: 3, bananas: 5) { |_, o, n| o + n }
# I just want a group!
{words: %w[I just]}.merge(words: %w[want a group]) { |_, o, n| [*o, *n] }
# I just want to unify flags!
{'file1' => File::READABLE, 'file2' => File::READABLE | File::WRITABLE}
  .merge('file1' => File::WRITABLE) { |_, o, n| o | n }
# ...or, vice versa:
{'file1' => File::READABLE, 'file2' => File::READABLE | File::WRITABLE}
  .merge('file1' => File::WRITABLE, 'file2' => File::WRITABLE) { |_, o, n| o & n }
It is especially noticeable in the last two examples, but the general problem is that there is too much "unnecessary" punctuation, in which the essential part might get lost.
There are proposals like #19148, which struggle to define another method (what would be the name? isn't it just merging?)
But I've been thinking, can't the implementation be chosen based on the arity of the passed block?.. Prototype:
class Hash
  alias old_merge merge

  def merge(other, &block)
    return old_merge(other) unless block

    if block.arity.abs == 2
      old_merge(other) { |_, o, n| block.call(o, n) }
    else
      old_merge(other, &block)
    end
  end
end
I.e.: if, and only if, the passed block is of arity 2, treat it as an operation on old and new values. Otherwise, proceed as before (maintaining backward compatibility).
Usage:
{apples: 1, oranges: 2}.merge(apples: 3, bananas: 5, &:+)
#=> {:apples=>4, :oranges=>2, :bananas=>5}
{words: %w[I just]}.merge(words: %w[want a group], &:concat)
#=> {:words=>["I", "just", "want", "a", "group"]}
{'file1' => File::READABLE, 'file2' => File::READABLE | File::WRITABLE}
  .merge('file1' => File::WRITABLE, &:|)
#=> {"file1"=>5, "file2"=>5}
{'file1' => File::READABLE, 'file2' => File::READABLE | File::WRITABLE}
  .merge('file1' => File::WRITABLE, 'file2' => File::WRITABLE, &:&)
#=> {"file1"=>0, "file2"=>4}
# If necessary, the old protocol still works:
{apples: 1, oranges: 2}.merge(apples: 3, bananas: 5) { |k, o, n| k == :apples ? 0 : o + n }
# => {:apples=>0, :oranges=>2, :bananas=>5}
As far as I can remember, Ruby core doesn't have methods like this (that change implementation depending on the arity of passed callable), but I think I saw this approach in other languages. Can't remember particular examples, but always found this idea appealing.
Updated by zverok (Victor Shepelev) almost 2 years ago
- Description updated (diff)
Updated by sawa (Tsuyoshi Sawada) almost 2 years ago
Using numbered parameters, we can do slightly better:
{apples: 1, oranges: 2}.merge({apples: 3, bananas: 5}){_2 + _3}
although I am neutral about the proposal.
Updated by zverok (Victor Shepelev) almost 2 years ago
@sawa (Tsuyoshi Sawada) I didn't mention the solution with numbered parameters because I believe it to be even more cryptic than the one with named ones.
The reader needs to remember at all times what the protocol of the merge block is (merge with a block is not used every day, so it is not a given) and what that first argument that we are ignoring was.
With named arguments, we can at least give a hint (in some codebases, I use _k, o, n, which is more like a "note to self"; in others, I prefer _key, oldval, newval or something like that).
Updated by nobu (Nobuyoshi Nakada) almost 2 years ago
zverok (Victor Shepelev) wrote:
E.g.: If, and only if, the passed block is of arity 2, treat it as an operation on old and new values. Otherwise, proceed as before (maintaining backward compatibility.)
Usage:
{apples: 1, oranges: 2}.merge(apples: 3, bananas: 5, &:+) #=> {:apples=>4, :oranges=>2, :bananas=>5}
:+.to_proc is a proc that just calls the + method on the first argument with the rest.
That means its arity is not deterministic.
{words: %w[I just]}.merge(words: %w[want a group], &:concat) #=> {:words=>["I", "just", "want", "a", "group"]}
In this example, you expect Array#concat on the old values, but the arity of Array#concat is -1, not 2.
Updated by zverok (Victor Shepelev) almost 2 years ago
@nobu (Nobuyoshi Nakada) All of my examples work with my reference implementation. You can try it yourself.
:any_symbol.to_proc.arity is -2, corresponding to the following lambda:
->(first, *rest) { first.send(symbol, *rest) }
The behavior matches, too:
def fake_to_proc(symbol) = ->(first, *rest) { first.send(symbol, *rest) }
:+.to_proc.arity #=> -2
fake_to_proc(:+).arity #=> -2
:+.to_proc.parameters #=> [[:req], [:rest]]
fake_to_proc(:+).parameters #=> [[:req, :first], [:rest, :rest]]
:+.to_proc.call(1)
# `+': wrong number of arguments (given 0, expected 1) (ArgumentError) -- on handling +, not calling the lambda
fake_to_proc(:+).call(1)
# `+': wrong number of arguments (given 0, expected 1) (ArgumentError)
:+.to_proc.call(1, 2) #=> 3
fake_to_proc(:+).call(1, 2) #=> 3
Therefore:
- Any :+.to_proc.arity is -2
- Which is not a bug/accident, but a proper reporting of arity/parameters
- Which actually made me think about this idea with merge :)
- Which works with the reference implementation.
Updated by nobu (Nobuyoshi Nakada) almost 2 years ago
zverok (Victor Shepelev) wrote in #note-7:
- Any :+.to_proc.arity is -2
- Which is not a bug/accident, but a proper reporting of arity/parameters
That -2 means just unlimited.
- Which actually made me think about this idea with merge :)
.abs == 2? 😅
Updated by zverok (Victor Shepelev) almost 2 years ago
That -2 means just unlimited.
Well, it is obviously not my call to decide what it means, but I interpret it as "2 explicitly declared params (plus some unpacking probably happening)". I mean, it is not exactly the same as -1 or -3, right?..
So I believe it is a good enough heuristic for this case because when somebody provides an old-style block, its arity would be:
proc { |key, oldval, newval| }.arity #=> 3
I.e., definitely not 2 or -2.
So, yeah, arity.abs == 2 is a lousy heuristic, but my estimation is that it should be enough to provide a reasonable distinction and to simplify the most common cases.
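To make the heuristic discussed above concrete, here is a minimal sketch that prints the arity Ruby reports for a few block shapes and which calling convention arity.abs == 2 would pick for each. The helper name merge_protocol and the labels :old_new / :key_old_new are hypothetical and exist only for this illustration; they are not part of the proposal. The numbered-parameter sample assumes Ruby 2.7 or later.

```ruby
# Hypothetical helper (not part of the proposal): reports which calling
# convention the prototype's `arity.abs == 2` check would choose for a block.
def merge_protocol(block)
  block.arity.abs == 2 ? :old_new : :key_old_new
end

samples = {
  "|key, old, new|" => proc { |key, old, new| },  # classic merge block
  "|old, new|"      => proc { |old, new| },       # the proposed short form
  "|key, *rest|"    => proc { |key, *rest| },     # one required plus a rest argument
  "|*args|"         => proc { |*args| },          # rest argument only
  "_2 + _3"         => proc { _2 + _3 },          # numbered parameters (Ruby 2.7+)
  "&:+"             => :+.to_proc                 # Symbol#to_proc
}

samples.each do |label, block|
  puts format("%-16s arity %2d -> %s", label, block.arity, merge_protocol(block))
end
```

The |key, old, new| and _2 + _3 blocks report arity 3 and keep the existing protocol, while |key, *rest| reports -2 and would be switched to the two-argument form together with |old, new|. On Ruby 3.0 and later, :+.to_proc also reports -2, which is why the &:+ examples work with the prototype.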
Updated by Eregon (Benoit Daloze) almost 2 years ago
-2 means 1 required argument and a rest argument (e.g. p method(def m(a,*); end).arity => -2).
I think using this new behavior for -2 is too hacky.
For arity == 2, it seems more reasonable, and the examples above could use _1 + _2, etc.
Although changing the behavior for arity 2 could break code like a.merge(b) { |k,old| old }.
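The compatibility concern can be illustrated with a small sketch. It assumes a standalone helper, arity_aware_merge (a hypothetical name), that mirrors the prototype from the description without patching Hash, and compares it with the current Hash#merge for a block that declares only |key, old|.

```ruby
# Hypothetical standalone version of the prototype from the description,
# written as a plain method so that Hash itself is left untouched.
def arity_aware_merge(hash, other, &block)
  return hash.merge(other) unless block

  if block.arity.abs == 2
    # Proposed protocol: the block receives only (old, new).
    hash.merge(other) { |_key, old, new| block.call(old, new) }
  else
    # Existing protocol: the block receives (key, old, new).
    hash.merge(other, &block)
  end
end

a = { x: 1 }
b = { x: 2 }

# An existing-style block that keeps the old value and simply
# does not declare the third (new value) parameter. Its arity is 2.
keep_old = proc { |_key, old| old }

current  = a.merge(b, &keep_old)               #=> {:x=>1}
proposed = arity_aware_merge(a, b, &keep_old)  #=> {:x=>2}

p current
p proposed
```

Under the current protocol the block sees (:x, 1, 2) and returns the old value 1. Under the arity-based dispatch it is called with (1, 2), so its first parameter binds to the old value and its second to the new value, and the result silently changes to 2.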
Updated by matz (Yukihiro Matsumoto) almost 2 years ago
- Status changed from Open to Rejected
It looks nice at first sight but may cause a compatibility issue, as @Eregon (Benoit Daloze) mentioned.
Matz.