Project

General

Profile

Feature #14636

`Hash` has a method for accessing the shortest path towards a certain key

Added by RudySeidinger (Rudy Seidinger) 9 months ago. Updated 9 months ago.

Status:
Open
Priority:
Normal
Assignee:
-
Target version:
-
[ruby-core:86326]

Description

Abstract

Hashes, as a collection of key-value pairs, are often used to represent trees. Having a way of traversing the nodes quicker is very valuable when analyzing big hashes.

Use-case

As pointed here, in the question https://stackoverflow.com/questions/8301566/find-key-value-pairs-deep-inside-a-hash-containing-an-arbitrary-number-of-nested, and used in gems like https://rubygems.org/gems/hashie, we can infer that such a method would bring value to the core Ruby object, without necessarily needing extensions and taking benefit of the huge performance gain of having a native method

The idea was discussed with Matz at the RubyHackChallenge at Cookpad office in Bristol and expressed here https://github.com/ko1/rubyhackchallenge/issues/44.

Implementation approach

I've came with one possible implementation at https://github.com/ruby/ruby/pull/1847. By using recursive calls, one can continuously store the path until it reaches the desired key, or reset it, in case the last element is not what is been looked for. The advantage of the approach from a iterative one (that was started before) is that, by using recursion, we avoid the overhead of having the store a temporary hash (tree) to fall-back to whenever the end of a sub-hash is reached. The recursive approach is safe due to usage of rb_exec_recursive and the subsequent checking of infinite recursion.

Cases of other languages or libraries

Python: suggested approach at StackOverflow (standard library)
https://stackoverflow.com/questions/14962485/finding-a-key-recursively-in-a-dictionary

JavaScript: json-query (npm library)
https://www.npmjs.com/package/json-query

RubyGems: Hashie gem
https://rubygems.org/gems/hashie

History

#1 [ruby-core:86328] Updated by Hanmac (Hans Mackowiak) 9 months ago

i have some thoughts about it ...
if you got a structure like this:

{
  :a => {
    :name => "abc"
  },
  :b => {
    :name => "xyz"
  },
}

what does deep_key(:name) return?

#2 [ruby-core:86329] Updated by RudySeidinger (Rudy Seidinger) 9 months ago

Hanmac (Hans Mackowiak) wrote:

i have some thoughts about it ...
if you got a structure like this:

{
  :a => {
    :name => "abc"
  },
  :b => {
    :name => "xyz"
  },
}

what does deep_key(:name) return?

In this case, is the shortest path towards the first key containing the specified parameter, so would be

[:a]

#3 [ruby-core:86332] Updated by shevegen (Robert A. Heiler) 9 months ago

This reminds me a bit of guide trees in bioinformatics, where
we try to find the shortest path of substring matches to another
string (a bit similar to how BLAST searching works
https://blast.ncbi.nlm.nih.gov/Blast.cgi though I don't know if
they use a guide tree).

Some months ago I was very surprised that this gem was quite
popular:

https://rubygems.org/gems/diff-lcs

It's actually ranked 6 among the ruby gems, which surprised me
a lot. (I found it accidentally when trying to find faster
algorithms for something involving the levensthein distance).

Anyway to be more on topic - I think it would be a good idea
if ruby by default would make available algorithms and search
patterns that can be quite useful on their own. Not just limited
to the suggestion here but more general too.

I am not sure if class Hash is the way to store these by
default though. I think these are quite specialized use cases
and perhaps they aren't that useful by default. How about
some extensions but these be distributed with core/stdlib
ruby too? Just requiring an explicit require or something
like that; a bit like the did-you-mean gem has, if you look
here:

https://github.com/yuki24/did_you_mean#experimental-features

I was also surprised to find out that rubygems includes
levensthein already, via rubygems/text. :-)

Anyway, I think the main suggestion is perfectly fine so
+1; I am just not sure if it should be on class Hash
by default.

By the way, I hope it will be extensively documented so
that people understand what it is doing. I understand only
like half of what has been written above... ;-)

#4 [ruby-core:86339] Updated by nobu (Nobuyoshi Nakada) 9 months ago

RudySeidinger (Rudy Seidinger) wrote:

by using recursion, we avoid the overhead of having the store a temporary hash (tree) to fall-back to whenever the end of a sub-hash is reached.

rb_exec_recursive creates a temporary hash internally to track recursion.
There is no advantage on memory usage, but you don't have to take care of the internal hash by using the function, of course.

#5 [ruby-core:86340] Updated by nobu (Nobuyoshi Nakada) 9 months ago

RudySeidinger (Rudy Seidinger) wrote:

In this case, is the shortest path towards the first key containing the specified parameter, so would be

[:a]

It is one of the shortest paths, but not only.

An enumerator may be used here,

hash.path_to_key(:name) {|path| ...}

and

hash.path_to_key(:name).first

#6 [ruby-core:86360] Updated by RudySeidinger (Rudy Seidinger) 9 months ago

nobu (Nobuyoshi Nakada) wrote:

RudySeidinger (Rudy Seidinger) wrote:

by using recursion, we avoid the overhead of having the store a temporary hash (tree) to fall-back to whenever the end of a sub-hash is reached.

rb_exec_recursive creates a temporary hash internally to track recursion.
There is no advantage on memory usage, but you don't have to take care of the internal hash by using the function, of course.

Sure. I just thought that the iterative approach would not justify not using recursion. The implementation would be way more complex with no clear advantage. Maintainability might be a more prioritised concern

#7 [ruby-core:86361] Updated by RudySeidinger (Rudy Seidinger) 9 months ago

nobu (Nobuyoshi Nakada) wrote:

RudySeidinger (Rudy Seidinger) wrote:

In this case, is the shortest path towards the first key containing the specified parameter, so would be

[:a]

It is one of the shortest paths, but not only.

An enumerator may be used here,

hash.path_to_key(:name) {|path| ...}

and

hash.path_to_key(:name).first

Hmmm, interesting. What would use the block for, in the first scenario? the yield of the block would be execute in which context? inside the enumerator?

Also available in: Atom PDF