Feature #17472

HashWithIndifferentAccess like Hash extension

Added by naruse (Yui NARUSE) 2 months ago. Updated about 1 month ago.

Status: Open
Priority: Normal
Assignee: -
Target version:

[ruby-core:101707]

Description

Rails has ActiveSupport::HashWithIndifferentAccess, which is widely used in Rails to handle Request, Session, ActionView's form construction, ActiveRecord's DB communication, and so on. It accepts a String or a Symbol as a key and normalizes it to fetch the value. But it is implemented in Ruby. If we provide a C implementation of it, Rails will gain a performance improvement.
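To make the normalization concrete, here is a minimal Ruby sketch of the idea: every Symbol key is converted to a String before the underlying Hash is touched, so both spellings reach the same entry. `IndifferentHash` and its methods are hypothetical names used only for illustration. This is not the Rails implementation (which subclasses Hash and overrides many more methods) and not a proposed core API.

```ruby
# Hypothetical sketch of Symbol/String-indifferent lookup.
# Not ActiveSupport's code: it wraps a plain Hash and covers only a few methods.
class IndifferentHash
  def initialize(pairs = {})
    @store = {}
    pairs.each { |k, v| self[k] = v }
  end

  # Every key is normalized before it reaches the underlying Hash.
  def [](key)
    @store[convert_key(key)]
  end

  def []=(key, value)
    @store[convert_key(key)] = value
  end

  def key?(key)
    @store.key?(convert_key(key))
  end

  def to_h
    @store.dup
  end

  private

  # Symbols become Strings; every other key is left untouched.
  def convert_key(key)
    key.is_a?(Symbol) ? key.to_s : key
  end
end

h = IndifferentHash.new(name: "Ruby")
h["name"]     # => "Ruby"
h[:name]      # => "Ruby"
h.to_h.keys   # => ["name"]
```

Every access pays for one extra method call and type check, which is the per-access overhead a C implementation would aim to reduce.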

summary of previous discussion: https://github.com/rails/rails/pull/40182#issuecomment-687607812

Updated by Eregon (Benoit Daloze) 2 months ago

Isn't a C extension in a gem enough?

Also what specifically would writing it in C instead of Ruby gain?
Intuitively I'd think there would be no significant gain to write it in C.

Is there any profile showing a significant amount of time is spent in HashWithIndifferentAccess?

Also, the semantics of HashWithIndifferentAccess IMHO don't fit the collection design of the core library.
And the name doesn't really fit with other class names in the core library.

Updated by mame (Yusuke Endoh) 2 months ago

+1, if Rails people really want it and if it brings a performance improvement. We need to experiment, but in principle it looks to me like a good idea to provide small but simple improvements that we can use immediately in Ruby 3.1. We mainly focused on big language improvements up to 3.0, but many of them still require some time to become practically useful.

Updated by Eregon (Benoit Daloze) 2 months ago

I don't think C code will be more efficient for things like https://github.com/rails/rails/blob/914caca2d31bd753f47f9168f2a375921d9e91cc/activesupport/lib/active_support/hash_with_indifferent_access.rb#L367.
And translating the rest of the logic to C would just make it harder to read, maintain, and not gain anything.

Are we going to rewrite Rails in C too? ;)

Honestly, I think it's a waste of time even to experiment; it's bound to yield insignificant gains for a large effort and maintenance cost.
IMHO, HashWithIndifferentAccess in core would be a mistake, as plain as a mistake can be.

Also, I see no valid reason to have this in core, if people really want to try this, they can make a C extension.

Updated by jeremyevans0 (Jeremy Evans) 2 months ago

I am against adding this in principle. One of the harder things for new Ruby programmers to understand is the difference between symbols and strings. This is even more difficult for programmers learning Ruby and Rails at the same time, due to the fact that Rails treats symbols and strings the same in most places.

I'm also against adding this for the reasons that Eregon (Benoit Daloze) mentioned. If a speed-up is desired, HashWithIndifferentAccess can easily be a C-extension gem, there is no pressing reason to have it in core or stdlib.

If we do decide to add this to Ruby, it should have a better name. HashWithIndifferentAccess is not indifferent with regard to type (1 and "1" are different). HashWithIndifferentAccess is not indifferent with regard to case ("a" and "A" are different). HashWithIndifferentAccess is not indifferent with regard to encoding, at least some of the time ("\u1234" and "\u1234".b are different). "WithIndifferentAccess" is too vague. HashConvertingSymbolKeysToStringKeys is more accurate, though quite long.
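Jeremy's examples can be checked directly. The sketch below uses a hypothetical `convert_key` lambda that mirrors the Symbol-to-String conversion discussed in this thread (it is not ActiveSupport's code) and shows which lookups still miss after conversion. The ASCII-only binary string illustrates the "at least some of the time" caveat about encodings.

```ruby
# Hypothetical helper mimicking a Symbol -> String key conversion.
convert_key = ->(key) { key.is_a?(Symbol) ? key.to_s : key }

store = {}
store[convert_key.(:a)]      = "symbol or string"
store[convert_key.(1)]       = "integer"
store[convert_key.("\u1234")] = "utf-8"

store[convert_key.("a")]        # => "symbol or string" (Symbol vs String: indifferent)
store[convert_key.("1")]        # => nil (1 and "1" are different keys)
store[convert_key.("A")]        # => nil (case still matters)
store[convert_key.("\u1234".b)] # => nil (non-ASCII UTF-8 vs binary differ)
store[convert_key.("a".b)]      # => "symbol or string" (ASCII-only strings still match)
```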

Updated by Dan0042 (Daniel DeLorme) 2 months ago

I'm also against this, but I think a more general-purpose version might be ok. In my code I use a NormalizedHash class which calls key = normalize(key) for every method with a key argument (as well as merge, etc.) I use this base class to define subclasses with specific normalize methods; HeaderHash converts keys to Http-Header-Camelcase, OptionsHash raises an error if the key is not a Symbol. Something that can be used as the basis of different implementations (including HashWithIndifferentAccess) might make sense in the stdlib. Might.
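A NormalizedHash base class along the lines Daniel describes could look roughly like the sketch below. The class names come from his comment, but every method body is an assumption, not his actual code. Only a handful of Hash methods are routed through `#normalize` here; a complete version would also need `dig`, `values_at`, `include?`, and many others.

```ruby
# Sketch: a Hash subclass that passes every key through #normalize.
# Method bodies are illustrative assumptions, not the original NormalizedHash.
class NormalizedHash < Hash
  def [](key)
    super(normalize(key))
  end

  def []=(key, value)
    super(normalize(key), value)
  end

  def fetch(key, *args, &block)
    super(normalize(key), *args, &block)
  end

  def key?(key)
    super(normalize(key))
  end

  def delete(key, &block)
    super(normalize(key), &block)
  end

  # Incoming keys from another hash are normalized one by one.
  def update(other)
    other.each { |k, v| self[k] = v }
    self
  end
  alias merge! update

  # dup keeps the subclass, so the result is normalized too.
  def merge(other)
    dup.update(other)
  end

  # Subclasses decide what "the same key" means; the base class is a no-op.
  def normalize(key)
    key
  end
end

# "content-type" / "CONTENT_TYPE" -> "Content-Type"
class HeaderHash < NormalizedHash
  def normalize(key)
    key.to_s.split(/[-_]/).map(&:capitalize).join("-")
  end
end

# Only Symbol keys are accepted.
class OptionsHash < NormalizedHash
  def normalize(key)
    raise ArgumentError, "option keys must be Symbols, got #{key.inspect}" unless key.is_a?(Symbol)
    key
  end
end

headers = HeaderHash.new
headers["content-type"] = "text/html"
headers["CONTENT_TYPE"]   # => "text/html"
headers.keys              # => ["Content-Type"]

opts = OptionsHash.new
opts[:verbose] = true
# opts["verbose"] = true  # would raise ArgumentError
```

In this design, HashWithIndifferentAccess would be just one more subclass whose `normalize` turns Symbols into Strings.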

Updated by marcandre (Marc-Andre Lafortune) 2 months ago

Count me in the "No" camp.

HashWithIndifferentAccess has horrible semantics. It may have had a reason to be when Ruby didn't support symbol garbage collection and Rails didn't require an explicit mapping of HTTP Request to params, but not today.


Updated by naruse (Yui NARUSE) 2 months ago

  • Description updated (diff)

Updated by naruse (Yui NARUSE) 2 months ago

My intention is

  • A feature for implementing Rails's ActiveSupport::HashWithIndifferentAccess, not HashWithIndifferentAccess itself.
  • The internal hash should use Symbol keys, even though ActiveSupport::HashWithIndifferentAccess uses String keys.
  • Rails doesn't depend on C extensions.

I added a link to kamipo's summary for more context:
https://github.com/rails/rails/pull/40182#issuecomment-687607812

Dan0042 (Daniel DeLorme) wrote in #note-6:

I'm also against this, but I think a more general-purpose version might be ok. In my code I use a NormalizedHash class which calls key = normalize(key) for every method with a key argument (as well as merge, etc.) I use this base class to define subclasses with specific normalize methods; HeaderHash converts keys to Http-Header-Camelcase, OptionsHash raises an error if the key is not a Symbol. Something that can be used as the basis of different implementations (including HashWithIndifferentAccess) might make sense in the stdlib. Might.

You can use Hash#compare_by_identity (https://ruby-doc.org/core-2.7.1/Hash.html#method-i-compare_by_identity).
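Hash#compare_by_identity is presumably relevant here as an existing precedent: it changes, per instance, how a Hash compares its keys. The snippet below only demonstrates that existing method. After the call, two keys match only if they are the very same object, not merely equal Strings.

```ruby
# Two Strings with the same contents but different object identity.
a = String.new("key")
b = String.new("key")

h = {}.compare_by_identity   # switches this one Hash to identity comparison
h[a] = 1
h[b] = 2

h.size                   # => 2 (a and b are distinct keys)
h.compare_by_identity?   # => true
h[a]                     # => 1
h[String.new("key")]     # => nil (equal contents, different object)
```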

Updated by byroot (Jean Boussier) 2 months ago

A feature to implement Rails's ActiveSupport::HashWithIndifferentAccess

Would it be possible to have a "hook" akin to convert_key?

e.g. something like:

hash = {}
hash.coerce_key = ->(key) { key.is_a?(Symbol) ? key.name : key }
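Hash#coerce_key= does not exist in Ruby; it is the hook being proposed. As a rough sketch of its semantics, here is a pure-Ruby emulation using a hypothetical `CoercingHash` subclass that stores the lambda per instance and applies it to reads and writes (only `[]` and `[]=` are covered). The lambda is the one from the comment above; `Symbol#name` requires Ruby 3.0 or later.

```ruby
# Hypothetical emulation of a per-instance key-coercion hook.
class CoercingHash < Hash
  attr_accessor :coerce_key   # a callable, or nil for plain Hash behavior

  def [](key)
    super(coerce(key))
  end

  def []=(key, value)
    super(coerce(key), value)
  end

  private

  def coerce(key)
    coerce_key ? coerce_key.call(key) : key
  end
end

hash = CoercingHash.new
hash.coerce_key = ->(key) { key.is_a?(Symbol) ? key.name : key }
hash[:id] = 42
hash["id"]   # => 42
hash.keys    # => ["id"]
```

Every access here calls a Ruby-level lambda. That per-call overhead is the likely source of the concern that such a hook would not be fast.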

Updated by naruse (Yui NARUSE) 2 months ago

byroot (Jean Boussier) wrote in #note-10:

A feature to implement Rails's ActiveSupport::HashWithIndifferentAccess

Would it be possible to have a "hook" akin to convert_key?

I think it will not be fast.

Updated by Eregon (Benoit Daloze) 2 months ago

naruse (Yui NARUSE) wrote in #note-11:

I think it will not be fast.

Before we start considering performance trade-offs,
do we even have a benchmark where time spent in HashWithIndifferentAccess is significant for a Rails app?

Updated by zverok (Victor Shepelev) 2 months ago

I believe that HashWithIndifferentAccess is one of the truly wrong ideas in Rails -- and that, as of today, this is more or less the common understanding in the community.

The distinction between Symbol (as a controlled internal name) and String (as input/output user data) is one of the most powerful concepts in Ruby, and "I don't want to think about whether it is a Symbol or a String, internal or external" explicitly goes against this distinction (as far as I can understand, the holders of this idea would actually prefer to have ONLY strings, but with a shorter syntax for string keys, like in JS, and that's how they perceive symbols).

Eventually, even Rails started to make a clearer distinction between internal and external (see StrongParams), but HWIA is so omnipresent there that they are unlikely to get rid of it anytime soon. But I don't believe that is a reason to introduce it into the core language (not to mention that the reasoning "Rails uses it, and it should be implemented in C, so let's add it to the language core" feels quite weird).

I am really surprised that Ruby core developers feel so positive about it.

Updated by nobu (Nobuyoshi Nakada) 2 months ago

I had a vague original idea for this proposal, which was to extend the Hash class generically.
First I thought about case-insensitive string hashes, which used to be possible with $= in the old days.
That special variable has been removed, but the use cases remain, e.g., HTTP headers, command completions, etc.

Glancing at st.c again this time, I confirmed that customizing key conversion per instance isn't possible while keeping backward compatibility.
I think case-insensitive (only for String) Hash, like hashes compared by identity, would be possible, though.
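Without changes to Hash itself, one way to get case-insensitive String keys today is to wrap each key in a small value object whose `#hash` and `#eql?` ignore case, since those two methods are what Hash uses to match keys. `CaseInsensitiveKey` is a hypothetical name, and this is a workaround, not how `$=` worked or how a built-in mode would be implemented.

```ruby
# Hypothetical key wrapper: Hash matches keys via #hash and #eql?,
# so folding case in both makes lookups case-insensitive.
class CaseInsensitiveKey
  attr_reader :original

  def initialize(string)
    @original = string
    @folded = string.downcase(:fold)   # Unicode case folding
  end

  def hash
    @folded.hash
  end

  def eql?(other)
    other.is_a?(CaseInsensitiveKey) && @folded == other.folded
  end

  protected

  attr_reader :folded
end

headers = {}
headers[CaseInsensitiveKey.new("Content-Type")] = "text/html"
headers[CaseInsensitiveKey.new("content-type")]   # => "text/html"
headers[CaseInsensitiveKey.new("CONTENT-TYPE")]   # => "text/html"
```

The drawback is that every caller has to wrap every key, which is why a per-Hash mode similar to compare_by_identity would be more convenient.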

Updated by hcatlin (Hampton Catlin) about 2 months ago

During my 15 years of Ruby programming, I can't remember a single time that the difference between string-and-symbol with regards to Hashes was used on purpose. Instead, it's the source of countless bugs, extra type checking code, and difficulty when you install a new library– ("hmmm, is this options hash going to be stringed or symboled?")

Would anyone here think the code below was acceptable?

options["host"] = "ruby-lang.org"
options[:port] = 3000

Not only would we likely flag that code, but chances are that it might not work! And this is why Rails has HashWithIndifferentAccess and most of us would prefer to use it in our libraries, and just don't want to have the awkwardness of having to type out that horribly long name every time.
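The reason the snippet might not work is that, in a plain Hash, "host" and :host are distinct keys: a String is never `eql?` to a Symbol. A quick check:

```ruby
options = {}
options["host"] = "ruby-lang.org"
options[:port] = 3000

options[:host]   # => nil ("host" was stored as a String)
options["port"]  # => nil (:port was stored as a Symbol)
options.keys     # => ["host", :port]
```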

In almost every use of Hash in my career, the keys have been either symbols or strings, and I can't think of production code that even uses the fact that you can use an object as a key as a feature. I think I've attempted it myself a couple times, but usually refactored the code after, berating myself for trying to be a little too clever.

Which brings me to #11882!

I opened this proposal 5 years ago; it would basically mean that Rails wouldn't need HashWithIndifferentAccess, as Ruby would have its own string-keyed, symbol-or-string agnostic Map implementation.

In the last year I've come back to full time Ruby programming, and this remains one of my biggest frustrations with the language. Hash was cute when I first learned the language and it was an early differentiator, but other languages have improved this paradigm and I think we should too!

A first-level Ruby syntax for easily creating Hash-like objects that you can use as a Dictionary/Map/etc without worrying about if it's a string or a symbol is my 2021 wish! :)

Updated by timcraft (Tim Craft) about 1 month ago

hcatlin (Hampton Catlin) wrote in #note-15:

Would anyone here think the code below was acceptable?

Mixing symbol keys and string keys together like that would no doubt be very confusing, but why would you choose to do that instead of just using symbol keys or string keys? With modern Ruby the use case handled by options hashes is typically better handled by keyword arguments, and sometimes an options/config object might be a more appropriate choice. Keyword argument hashes use symbol keys, and I would expect options/config objects to be implemented using symbol keys.

We now also have Hash#transform_keys, so transforming a hash with string keys from outside data sources to symbol keys is straightforward. That can easily handle use cases like transforming keys in config/database.yml to symbols. JSON has symbolize_names and so on. There doesn't need to be confusion if library authors make a clear decision on whether to use strings or symbols for a specific use case, instead of trying to support both.
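A small sketch of the approach described above: convert String keys to Symbols once, at the boundary where external data enters, and then pass the result on as keyword arguments. The JSON document and the `connect` method are made up for illustration; `Hash#transform_keys` and the `symbolize_names:` option of `JSON.parse` are existing APIs.

```ruby
require "json"

# External data arrives with String keys...
raw = JSON.parse('{"adapter": "postgresql", "pool": 5}')
raw.keys   # => ["adapter", "pool"]

# ...so convert once, at the boundary, and use Symbols from then on.
config = raw.transform_keys(&:to_sym)
config[:adapter]   # => "postgresql"

# Or ask the parser for Symbol keys directly.
JSON.parse('{"pool": 5}', symbolize_names: true)[:pool]   # => 5

# Symbol-keyed hashes feed straight into keyword arguments.
# connect is a hypothetical method used only for this example.
def connect(adapter:, pool: 5)
  "#{adapter} (pool=#{pool})"
end

connect(**config)   # => "postgresql (pool=5)"
```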

And this is why Rails has HashWithIndifferentAccess and most of us would prefer to use it in our libraries, and just don't want to have the awkwardness of having to type out that horribly long name every time.

Maybe I'm in the minority, but I would never use HashWithIndifferentAccess in my own libraries, and I try hard not to use it directly in Rails because I've seen it become the source of confusion and subtle bugs!

I think the critical use case for HashWithIndifferentAccess is params; where you want to be using symbol keys because it's cleaner syntax, but the keys are coming from untrusted input. But HashWithIndifferentAccess is just the historical solution that exists for that case. There's no reason there couldn't be an ActiveSupport::Params class which allowed for params.title instead of params[:title], making the choice between whether to use string keys or symbol keys disappear (in the calling code at least).
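Rails has no ActiveSupport::Params class. The following is only a hypothetical sketch of the idea. A `Params` wrapper keeps the untrusted keys as the Strings they arrived as, and exposes them as reader methods, so calling code never chooses between "title" and :title. No Symbols are created from untrusted input here; the method names come from the calling code.

```ruby
# Hypothetical Params wrapper (not a Rails API).
class Params
  def initialize(hash)
    # Keys stay Strings; nested hashes are wrapped recursively.
    @data = hash.to_h { |k, v| [k.to_s, v.is_a?(Hash) ? Params.new(v) : v] }
  end

  # params.title looks up "title"; unknown names fall through to NoMethodError.
  def method_missing(name, *args)
    key = name.to_s
    return @data[key] if args.empty? && @data.key?(key)
    super
  end

  def respond_to_missing?(name, include_private = false)
    @data.key?(name.to_s) || super
  end
end

params = Params.new("title" => "Hello", "author" => { "name" => "Matz" })
params.title         # => "Hello"
params.author.name   # => "Matz"
# params.body        # would raise NoMethodError
```

One caveat of the method_missing approach: keys that collide with existing Object methods (for example "class" or "hash") would be shadowed, so a real design would need an explicit accessor as well.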

In almost every use of Hash in my career, the keys have been either symbols or strings, and I can't think of production code that even uses the fact that you can use an object as a key as a feature. I think I've attempted it myself a couple times, but usually refactored the code after, berating myself for trying to be a little too clever.

I have a lot of code which does, plenty of it in production. Hashes that represent a set of attributes, hashes that represent a set of variables (e.g. for template locals), and keyword argument hashes are relatively frequent examples for using symbol keys. The syntax for hash literals with symbol keys is so much nicer to read and write than the hashrocket syntax with string keys.

Apart from symbols the other common use case I encounter is time series data. I frequently use date keys, integer keys for years, month keys, and quarter keys. Having this as an option instead of only being able to use strings is one of the reasons I prefer Ruby to other languages. And I don't consider that kind of code to be clever. Quite the opposite, it feels like "boring" code which just works.
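As an illustration of the kind of "boring" time series code described above, the sketch below uses Date objects, Integer years, and [year, quarter] arrays directly as Hash keys. They work because these classes implement `#hash` and `#eql?` by value. The sales figures are invented for the example.

```ruby
require "date"

# Invented example data keyed by Date.
sales = {
  Date.new(2020, 1, 15) => 120,
  Date.new(2020, 2, 3)  => 80,
  Date.new(2020, 4, 20) => 200,
  Date.new(2021, 1, 7)  => 150,
}

# Integer year keys.
by_year = sales.group_by { |date, _| date.year }
               .transform_values { |pairs| pairs.sum { |_, amount| amount } }
by_year[2020]   # => 400

# [year, quarter] array keys.
by_quarter = Hash.new(0)
sales.each { |date, amount| by_quarter[[date.year, (date.month - 1) / 3 + 1]] += amount }
by_quarter[[2020, 1]]   # => 200
by_quarter[[2020, 2]]   # => 200

# A fresh but equal Date finds the existing entry.
sales[Date.new(2020, 2, 3)]   # => 80
```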

In the last year I've come back to full time Ruby programming, and this remains one of my biggest frustrations with the language.

Welcome back to Ruby! :) Are you working on Rails apps? If so it seems a little unfair to get frustrated at the language for choices made by the web framework. If not can you share an example of a Ruby use case that would be significantly improved by HashWithIndifferentAccess?


FWIW I'm against this proposal for the reasons already mentioned:

  • The name is terrible. Apart from making it clumsy to use and not seeming very Ruby-ish, I think it demonstrates that the semantics of the class are a bit unclear. As Jeremy points out, it is not indifferent in many respects, only with regard to symbols and strings.

  • There isn't any profiling data which shows this is a significant bottleneck for a significant number of Rails projects. Optimizing performance for Date, Time, BigDecimal, JSON, URI, YAML etc would benefit Rails and a broader range of Ruby applications.

  • Re-implementing in C adds more work for maintainers of alternative Ruby implementations, for seemingly little benefit.

The original motivation here seems to be for improving performance of Rails applications. Why does it need to be in Ruby core/stdlib? Why can't it just be implemented as a C extension packaged in a gem?
