Feature #14636
open`Hash` has a method for accessing the shortest path towards a certain key
Description
Abstract¶
Hashes, as a collection of key-value pairs, are often used to represent trees. Having a way of traversing the nodes quicker is very valuable when analyzing big hashes.
Use-case¶
As pointed here, in the question https://stackoverflow.com/questions/8301566/find-key-value-pairs-deep-inside-a-hash-containing-an-arbitrary-number-of-nested, and used in gems like https://rubygems.org/gems/hashie, we can infer that such a method would bring value to the core Ruby object, without necessarily needing extensions and taking benefit of the huge performance gain of having a native method
The idea was discussed with Matz at the RubyHackChallenge at Cookpad office in Bristol and expressed here https://github.com/ko1/rubyhackchallenge/issues/44.
Implementation approach¶
I've came with one possible implementation at https://github.com/ruby/ruby/pull/1847. By using recursive calls, one can continuously store the path until it reaches the desired key, or reset it, in case the last element is not what is been looked for. The advantage of the approach from a iterative one (that was started before) is that, by using recursion, we avoid the overhead of having the store a temporary hash (tree) to fall-back to whenever the end of a sub-hash is reached. The recursive approach is safe due to usage of rb_exec_recursive
and the subsequent checking of infinite recursion.
Cases of other languages or libraries¶
Python: suggested approach at StackOverflow (standard library)
https://stackoverflow.com/questions/14962485/finding-a-key-recursively-in-a-dictionary
JavaScript: json-query (npm library)
https://www.npmjs.com/package/json-query
RubyGems: Hashie gem
https://rubygems.org/gems/hashie
Updated by Hanmac (Hans Mackowiak) over 6 years ago
i have some thoughts about it ...
if you got a structure like this:
{
:a => {
:name => "abc"
},
:b => {
:name => "xyz"
},
}
what does deep_key(:name)
return?
Updated by RudySeidinger (Rudy Seidinger) over 6 years ago
Hanmac (Hans Mackowiak) wrote:
i have some thoughts about it ...
if you got a structure like this:{ :a => { :name => "abc" }, :b => { :name => "xyz" }, }
what does
deep_key(:name)
return?
In this case, is the shortest path towards the first key containing the specified parameter, so would be
[:a]
Updated by shevegen (Robert A. Heiler) over 6 years ago
This reminds me a bit of guide trees in bioinformatics, where
we try to find the shortest path of substring matches to another
string (a bit similar to how BLAST searching works
https://blast.ncbi.nlm.nih.gov/Blast.cgi though I don't know if
they use a guide tree).
Some months ago I was very surprised that this gem was quite
popular:
https://rubygems.org/gems/diff-lcs
It's actually ranked 6 among the ruby gems, which surprised me
a lot. (I found it accidentally when trying to find faster
algorithms for something involving the levensthein distance).
Anyway to be more on topic - I think it would be a good idea
if ruby by default would make available algorithms and search
patterns that can be quite useful on their own. Not just limited
to the suggestion here but more general too.
I am not sure if class Hash is the way to store these by
default though. I think these are quite specialized use cases
and perhaps they aren't that useful by default. How about
some extensions but these be distributed with core/stdlib
ruby too? Just requiring an explicit require or something
like that; a bit like the did-you-mean gem has, if you look
here:
https://github.com/yuki24/did_you_mean#experimental-features
I was also surprised to find out that rubygems includes
levensthein already, via rubygems/text. :-)
Anyway, I think the main suggestion is perfectly fine so
+1; I am just not sure if it should be on class Hash
by default.
By the way, I hope it will be extensively documented so
that people understand what it is doing. I understand only
like half of what has been written above... ;-)
Updated by nobu (Nobuyoshi Nakada) over 6 years ago
RudySeidinger (Rudy Seidinger) wrote:
by using recursion, we avoid the overhead of having the store a temporary hash (tree) to fall-back to whenever the end of a sub-hash is reached.
rb_exec_recursive
creates a temporary hash internally to track recursion.
There is no advantage on memory usage, but you don't have to take care of the internal hash by using the function, of course.
Updated by nobu (Nobuyoshi Nakada) over 6 years ago
RudySeidinger (Rudy Seidinger) wrote:
In this case, is the shortest path towards the first key containing the specified parameter, so would be
[:a]
It is one of the shortest paths, but not only.
An enumerator may be used here,
hash.path_to_key(:name) {|path| ...}
and
hash.path_to_key(:name).first
Updated by RudySeidinger (Rudy Seidinger) over 6 years ago
nobu (Nobuyoshi Nakada) wrote:
RudySeidinger (Rudy Seidinger) wrote:
by using recursion, we avoid the overhead of having the store a temporary hash (tree) to fall-back to whenever the end of a sub-hash is reached.
rb_exec_recursive
creates a temporary hash internally to track recursion.
There is no advantage on memory usage, but you don't have to take care of the internal hash by using the function, of course.
Sure. I just thought that the iterative approach would not justify not using recursion. The implementation would be way more complex with no clear advantage. Maintainability might be a more prioritised concern
Updated by RudySeidinger (Rudy Seidinger) over 6 years ago
nobu (Nobuyoshi Nakada) wrote:
RudySeidinger (Rudy Seidinger) wrote:
In this case, is the shortest path towards the first key containing the specified parameter, so would be
[:a]
It is one of the shortest paths, but not only.
An enumerator may be used here,
hash.path_to_key(:name) {|path| ...}
and
hash.path_to_key(:name).first
Hmmm, interesting. What would use the block for, in the first scenario? the yield
of the block would be execute in which context? inside the enumerator?