Project

General

Profile

Actions

Feature #20300

open

Hash: set value and get pre-existing value in one call

Added by AMomchilov (Alexander Momchilov) 9 months ago. Updated 7 months ago.

Status:
Open
Assignee:
-
Target version:
-
[ruby-core:116940]

Description

When using a Hash, sometimes you want to set a new value, and see what was already there. Today, you have to do this in two steps:

h = { k: "old value" }

# 1. Do a look-up for `:k`.
old_value = h[:k]
# 2. Do another look-up for `:k`, even though we just did that!
h[:k] = "new value"

use(old_value)

This requires two separate Hash look-ups for :k. This is fine for symbols, but is expensive if computing #hash or #eql? is expensive for the key. It's impossible to work around this today from pure Ruby code.

One example use case is Set#add?. See https://bugs.ruby-lang.org/issues/20301 for more details.

I propose adding Hash#exchange_value, which has semantics are similar to this Ruby snippet:

class Hash
  # Exact method name TBD.
  def exchange_value(key, new_value)
    old_value = self[key]
    self[key] = new_value
    old_value
  end
end

... except it'll be implemented in C, with modifications to tbl_update that achieves this with a hash-lookup.

I'm opening to alternative name suggestions. @nobu (Nobuyoshi Nakada) came up with exchange_value, which I think is great.

Here's a PR with a PoC implementation: https://github.com/ruby/ruby/pull/10092

h = { k: "old value" }

# Does only a single hash look-up
old_value = h.exchange_value(:k, "new value")

use(old_value)

Related issues 2 (1 open1 closed)

Related to Ruby master - Bug #20301: `Set#add?` does two hash look-upsOpenActions
Related to Ruby master - Feature #17342: Hash#fetch_setFeedbackActions
Actions #1

Updated by AMomchilov (Alexander Momchilov) 9 months ago

  • Description updated (diff)
Actions #2

Updated by AMomchilov (Alexander Momchilov) 9 months ago

  • Description updated (diff)
Actions #3

Updated by AMomchilov (Alexander Momchilov) 9 months ago

  • Description updated (diff)
Actions #4

Updated by AMomchilov (Alexander Momchilov) 9 months ago

  • Description updated (diff)

Updated by nobu (Nobuyoshi Nakada) 9 months ago

The name update_value doesn't feel to return the previous value, to me.
What about Hash#exchange_value?

Updated by Eregon (Benoit Daloze) 9 months ago

FWIW, concurrent-ruby calls this get_and_set: https://ruby-concurrency.github.io/concurrent-ruby/master/Concurrent/Map.html#get_and_set-instance_method

Hash#exchange_value sounds fine to me too.

Updated by rubyFeedback (robert heiler) 9 months ago · Edited

Not so sure about these names. Note that get_and_set() may have people wonder why it is not set_and_get(). :D

In regards to .update_value() people could ask why the old value is returned rather than the new one.

I guess exchange_value is a bit more specific than the other variants, as people can say "I put in value x, as the new value, and get, in exchange, value y which was the old one".

I don't have a better name suggestion though, but just from the alternatives so far I think nobu's suggestion is at the least better than the other ones, in my opinion.

I don't have any particular opinion on the suggested functionality; in my own code I kind of keep setters and getters usually separate, or, e. g. first set and then get (or get first, assign to a variable, then set to a new value). It would be nice if we could have some kind of "universal idiom" for the desired functionality though - could make it easier if different programming languages, at the least OOP-centric ones, would use the same name/terminology for that. That would make naming things easier too, or at the least more consistent across programming languages, just like setters and getters are fairly well-established idioms in OOP-centric languages (ruby is kind of multi-paradigm; I don't know how functional programming languages solve this issue of setting and obtaining a new or old value in one go, so I can't comment on that).

Actions #8

Updated by Eregon (Benoit Daloze) 9 months ago

  • Related to Bug #20301: `Set#add?` does two hash look-ups added
Actions #9

Updated by Eregon (Benoit Daloze) 9 months ago

Updated by MaxLap (Maxime Lapointe) 9 months ago

In my mind, get_and_set is clear as it is in the order the operations are done.

But if you want clearer, get_then_set puts emphasis on the order.

Things like exchange and swap can sound like swapping the value between two keys.

Honestly, I'm not sure I ever needed this. If fetch_set was refused, which I frequently need, this likely won't pass either.

Updated by matheusrich (Matheus Richard) 9 months ago

Rust calls this method insert:

Inserts a key-value pair into the map.

If the map did not have this key present, None is returned.

If the map did have this key present, the value is updated, and the old value is returned.

let mut map = HashMap::new();
assert_eq!(map.insert(37, "a"), None);
assert_eq!(map.is_empty(), false);

map.insert(37, "b");
assert_eq!(map.insert(37, "c"), Some("b"));
assert_eq!(map[&37], "c");

Updated by AMomchilov (Alexander Momchilov) 9 months ago

@matheusrich (Matheus Richard) I like that name in general, but it's really similar to the existing Hash#store and the distinction is non-obvious.

Swift calls this updateValue(_:forKey:), which returns a Value? ("the value that was replaced, or nil if a new key-value pair was added."), but it has a similar problem.

I think #exchange_value is the best recommendation yet, because it makes it reasonably easy to guess what the return value would be, even without reading the docs.

Updated by matheusrich (Matheus Richard) 9 months ago

Some other examples from other languages:

  • Java and Kotlin call this put.
  • Groovy calls it putAt
  • Elixir has an interesting get_and_update, which is very descriptive, but also accepts a function argument, which can be used to operate on the old value before returning it. I could see us supporting both cases:
hash.get_and_update(:key, :new_value)
hash.get_and_update(:key) { |old| old.upcase }
  • Clojure does something similar with update-in.
  • Racket calls it hash-update.
  • Haskell calls this insertLookupWithKey 😅

I like exchange_value too. While a bit more verbose, having the _value suffix opens the opportunity to add a similar method for keys in the future

Another suggestion is replace_value, since we already have replace on hash.

Updated by AMomchilov (Alexander Momchilov) 9 months ago · Edited

@matheusrich (Matheus Richard) Thanks for putting together that list!

In general, I like the idea of having this method support an optional block, but we would need to benchmark it to ensure it doesn't have an undesired slow down in the exchange_value(:key, :new_value) case.

Really, this API's main goal is performance (since it doesn't do anything you couldn't already do with a combination of #has_key?, #[] and #[]=), so if an optional block arg slows that down, it would entirely defeat its benefit. That can still exist as an option, it would just need to be a different method.

Updated by matheusrich (Matheus Richard) 9 months ago

#get_and_transform could be a name for the version with the block, if performance is impacted significantly.

Actions #16

Updated by AMomchilov (Alexander Momchilov) 9 months ago

  • Description updated (diff)

Updated by nobu (Nobuyoshi Nakada) 9 months ago

I want ENV.exchange_value rather than Hash#exchange_value.

Updated by mame (Yusuke Endoh) 9 months ago

Discussed at developers meeting. @shyouhei (Shyouhei Urabe) pointed out that Set#add? could be improved with a single lookup by looking at the change in size, without this proposal.

https://bugs.ruby-lang.org/issues/20301#note-8

@matz (Yukihiro Matsumoto) said that we need another good use case to introduce this if it is sufficient for Set#add?.

Updated by Eregon (Benoit Daloze) 8 months ago · Edited

That implementation is incorrect (i.e. less correct than current implementation), and Hash#exchange_value is needed to be thread-safe for Set, see https://bugs.ruby-lang.org/issues/20301.

Updated by Eregon (Benoit Daloze) 8 months ago · Edited

nobu (Nobuyoshi Nakada) wrote in #note-17:

I want ENV.exchange_value rather than Hash#exchange_value.

Agreed it's a generally useful thing (e.g. to swap an env var and later restore it).
So I think we should add it for both Hash and ENV (ENV has all Hash methods anyway).

Updated by AMomchilov (Alexander Momchilov) 8 months ago

nobu (Nobuyoshi Nakada) wrote in #note-17:

I want ENV.exchange_value rather than Hash#exchange_value.

Is there a reason to put it on ENV, but not on Hash? That seems like a needless restriction to me.

Updated by shyouhei (Shyouhei Urabe) 8 months ago

I'm neutral to this particular method (could be handy). If thread safety is in mind perhaps atomic compare-and-exchange could be better than just exchange, though.

Updated by nobu (Nobuyoshi Nakada) 7 months ago · Edited

AMomchilov (Alexander Momchilov) wrote in #note-21:

nobu (Nobuyoshi Nakada) wrote in #note-17:

I want ENV.exchange_value rather than Hash#exchange_value.

Is there a reason to put it on ENV, but not on Hash? That seems like a needless restriction to me.

ENV is a hash-like object, so I'm for Hash#exchange_value too.

Updated by mame (Yusuke Endoh) 7 months ago

@matz (Yukihiro Matsumoto) said he was not sure what a name is good for this method because its true motivation is unclear.

It was originally intended as a method to improve the efficiency of Set#add?, but the use case was shifted to a method for thread safety. This history makes the use case less persuasive.
If the main purpose is thread safety, we want to respect the terminology of the parallel computing area. If it is just an internal method for efficiency, a long and verbose name may be preferred. (If it is a daily-use method, a short and convenient name may be preferred.)

You may want to explain the concrete example of the use case of thread safety for Hash value exchange, this proposed API is really sufficient for that example use case, and what API and name are given to similar feature in other languages.

Updated by AMomchilov (Alexander Momchilov) 7 months ago

mame (Yusuke Endoh) wrote in #note-24:

It was originally intended as a method to improve the efficiency of Set#add?, but the use case was shifted to a method for thread safety. This history makes the use case less persuasive.

The immediate need and use case is for that performance improvement to Set.

The discussion of thread safety is just an incidental benefit of this approach, as compared to the count-based alternative that was proposed.

If it is just an internal method for efficiency, a long and verbose name may be preferred.

I would be okay with this, if there’s no appetite for a public, short-named API for this. Perhaps some underscored name that communicates that this is “library internal” to the stdlib.

You may want to explain the concrete example of the use case of thread safety for Hash value exchange, this proposed API is really sufficient for that example use case,

I don’t think I can make a better case than the concrete implementation I’ve already proposed, and its immediate application to the Set performance problem I’ve identified and fixed, using this API.

If others want to chime in with the use cases in they have in mind, I’d love to hear those.

and what API and name are given to similar feature in other languages.

Let’s first decide if we want this API, in any form. Once we agree on that, I’d be happy to dive into details like naming, and comparisons to other languages.

Actions

Also available in: Atom PDF

Like5
Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like1Like1Like1Like0Like0Like0Like1Like1Like0Like0Like0Like0Like0