Bug #13857

frozen string literal: can freeze same string into two unique frozen strings

Added by chucke (Tiago Cardoso) over 1 year ago. Updated over 1 year ago.

Status:
Rejected
Priority:
Normal
Assignee:
-
Target version:
-
ruby -v:
2.3.4, 2.4.1
[ruby-core:82613]

Description

Running an interpreter with --enable-frozen-string-literal on, I get the following:

> "bang".object_id #=> 70303434940940  GOOD!
> "bang".object_id #=> 70303434940940  GOOD!
> "bang".object_id #=> 70303434940940  GOOD!
> c = "bang" 
> c.object_id #=> 70303434940940 GOOD!
> c.downcase #=> "bang"
> c.downcase.object_id #=> 70303430619560  SO SO!
> c.downcase.freeze.object_id #=> 70303430601780  BAD!

This ticket concerns the last two examples. In the case of performing an operation on the string, it makes sense to return a new string, even if the result is the same. However, I think that the last one could be done differently, in that the frozen result of the downcased value should be the original literal.
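The session above can be reproduced without the command-line flag. This is a minimal sketch, assuming that calling .freeze directly on a literal behaves like a literal compiled under --enable-frozen-string-literal; the method name observe_bang is made up for illustration.

```ruby
# Assumption: "bang".freeze on a literal stands in for a literal compiled
# under --enable-frozen-string-literal. observe_bang is a hypothetical name.
def observe_bang
  c = "bang".freeze
  downcased = c.downcase        # String#downcase allocates a new String
  refrozen  = downcased.freeze  # String#freeze returns its receiver

  {
    literal_reused:      c.equal?("bang".freeze), # same interned object
    downcase_is_new:     !downcased.equal?(c),    # equal content, new object
    refrozen_is_literal: refrozen.equal?(c),      # the case this ticket is about
    same_content:        refrozen == c
  }
end

p observe_bang
```

The third entry is the one the ticket asks to change: the frozen result has the same content as the literal but is a separate object.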

I haven't yet looked at how frozen string literals are implemented, so this may depend on the implementation, but I think the current behavior misses a few optimization opportunities. One notable example is the headers hash kept by an HTTP library: net/http stores the headers with downcased keys, only to capitalize them again on send. Something like this:

request["Content-Type"] = "text/html" #=> key stored in request will be "content-type"

will create more allocations than expected.


Related issues

Related to Ruby trunk - Feature #13725: [PATCH] Hash#[]= deduplicates string keys if (and only if) fstring exists (Closed)

History

#1

Updated by nobu (Nobuyoshi Nakada) over 1 year ago

  • Related to Feature #13725: [PATCH] Hash#[]= deduplicates string keys if (and only if) fstring exists added

Updated by nobu (Nobuyoshi Nakada) over 1 year ago

  • Status changed from Open to Rejected

chucke (Tiago Cardoso) wrote:

Running an interpreter with --enable-frozen-string-literal on, I get the following:

> c.downcase.object_id #=> 70303430619560  SO SO!
> c.downcase.freeze.object_id #=> 70303430601780  BAD!

These are not literals, so not subjects of frozen-string-literal.

BTW:

request["Content-Type"] = "text/html" #=> key stored in request will be "content-type"

will create more allocations than expected.

Hash key strings are deduped in the trunk already.

$ ruby -v --enable-frozen-string-literal -e 'request = {}; key = "Content-Type".downcase; request[key] = "text/html"; newkey = request.keys[0]; p key.equal?(newkey), newkey.equal?("content-type")'
ruby 2.5.0dev (2017-08-31 trunk 59695) [universal.x86_64-darwin15]
false
true
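The one-liner above can be unpacked into a small script. This is a sketch only: the method name store_downcased_key is made up, and the last entry depends on the interpreter deduplicating Hash string keys, as the trunk build shown above does.

```ruby
# Hypothetical helper that repeats the experiment from the one-liner.
def store_downcased_key
  request = {}
  key = "Content-Type".downcase   # freshly allocated, unfrozen
  request[key] = "text/html"      # Hash#[]= keeps its own frozen key
  stored = request.keys.first

  {
    stored_frozen: stored.frozen?,
    same_as_key:   stored.equal?(key),  # false: the Hash did not keep `key`
    same_content:  stored == key,
    # True only on builds that deduplicate Hash string keys:
    shared_with_literal: stored.equal?("content-type".freeze)
  }
end

p store_downcased_key
```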

Updated by chucke (Tiago Cardoso) over 1 year ago

These are not literals, so not subjects of frozen-string-literal.

I'd argue, that's an implementation detail. According to the principle of least surprise, I'd expect them to be the same.

Updated by nobu (Nobuyoshi Nakada) over 1 year ago

chucke (Tiago Cardoso) wrote:

These are not literals, so not subjects of frozen-string-literal.

I'd argue, that's an implementation detail.

It's not an implementation detail, but a language spec.

Updated by nobu (Nobuyoshi Nakada) over 1 year ago

chucke (Tiago Cardoso) wrote:

According to the principle of least surprise, I'd expect them to be the same.

And proposals based on "the principle of least surprise" are generally rejected.
It's not our "principle".

Updated by shyouhei (Shyouhei Urabe) over 1 year ago

chucke (Tiago Cardoso) wrote:

These are not literals, so not subjects of frozen-string-literal.

I'd argue, that's an implementation detail. According to the principle of least surprise, I'd expect them to be the same.

Literals are literals. Those c.downcase-generated strings definitely aren't. I don't see any implementation details here.

Updated by chucke (Tiago Cardoso) over 1 year ago

Please don't get me wrong, I'm not arguing that the spec for the feature is vague.

My understanding is that the feature was introduced to reduce memory consumption in template generation (like ERB templates), and to avoid those CONTENT_LENGTH = "Content-Length".freeze assignments seen all over, from Ruby web servers to rack. In most of these libs (rack, for example), there's a header hash abstraction which applies a downcase operation to the keys and then (optionally) freezes them.

Point being, at any given time, we might have two equal strings in memory (e.g. "Content-Length"), both frozen, only one of them a literal.

Updated by shyouhei (Shyouhei Urabe) over 1 year ago

OK, I see what you want. So maybe what is wanted is a String#downcase variant which "inherits" the frozenness of the source string. That way we can dedup them when necessary. I think there is room for such a feature, though I'm not yet sure how to achieve it.
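One way to picture such a variant, as a rough sketch and not an existing String method: the helper name downcase_keeping_frozenness is hypothetical, and it leans on String#-@ (Ruby 2.5 and later), which returns a frozen, possibly pre-existing copy of the string.

```ruby
# Hypothetical helper, not part of String's API.
# A frozen source yields a frozen, deduplicated result; an unfrozen source
# yields an ordinary mutable result, as String#downcase does today.
def downcase_keeping_frozenness(str)
  result = str.downcase
  str.frozen? ? -result : result  # String#-@ needs Ruby 2.5+
end

from_frozen  = downcase_keeping_frozenness("BANG".freeze)
from_mutable = downcase_keeping_frozenness(String.new("BANG"))

p from_frozen.frozen?   # frozenness carried over
p from_mutable.frozen?  # still mutable
```

Existing callers of String#downcase are unaffected in this sketch, because the behavior lives in a separate method.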

Updated by chucke (Tiago Cardoso) over 1 year ago

Or maybe a String#freeze which returns the existing frozen literal if one already exists. I think the current #downcase result makes sense; changing it would break existing code.

To sum it up, this is what I think could make sense:

c = "bang" #=> with frozen string literals enabled, this one is frozen
c.object_id #=> 70303434940940 GOOD!
c.downcase #=> "bang"
c.downcase.object_id #=> 70303430619560  NEW STRING!
c.downcase.freeze.object_id #=> 70303434940940 LITERAL GOOD!!!

I currently don't see a real-world use case where that would break code. I also can't evaluate the feasibility of the feature, as I don't know the source code that well. But if there's a hash table where literals are stored, calling #freeze could possibly do a lookup and return the frozen literal if one is found, or freeze the receiver otherwise.
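The lookup described here can be sketched in plain Ruby. This is an illustration under assumptions, not how the interpreter stores its literals: the FrozenPool class and its fetch method are made up, and an ordinary Hash plays the role of the table.

```ruby
# Hypothetical userland pool standing in for the interpreter's table.
class FrozenPool
  def initialize
    @table = {}
  end

  # Returns the pooled frozen string with the same content if one exists;
  # otherwise freezes the given string and registers it.
  def fetch(str)
    @table.fetch(str) do
      frozen = str.freeze       # freezes the receiver, as #freeze does
      @table[frozen] = frozen
    end
  end
end

pool    = FrozenPool.new
literal = pool.fetch("bang".freeze)   # registered first, like a literal
derived = pool.fetch("BANG".downcase) # same content, separate object

p derived.equal?(literal)  # true: the lookup handed back the pooled string
```

Note that the caller only benefits by using the value returned from fetch; the string passed in is left as a separate object.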

Updated by kernigh (George Koehler) over 1 year ago

I expect String#freeze to return the receiver. For example,

str = "Content-Length".downcase
str.freeze        # freezes and returns str
str.concat(": ")  # raises RuntimeError: can't modify frozen String

I know that str.freeze returns str, so I ignore the return value, and continue using str. Tiago Cardoso proposes that str.freeze would return a deduplicated string, perhaps not str. But my program would ignore the deduplicated string and use str, so the deduplication would be useless.
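The point can be checked with a short script. This is a sketch with a made-up method name, freeze_and_ignore_result; on Ruby 2.5 and later the error raised is FrozenError, which is a subclass of RuntimeError.

```ruby
# Hypothetical helper that mirrors the snippet above.
def freeze_and_ignore_result
  str = "Content-Length".downcase
  returned = str.freeze          # return value is usually ignored

  error = begin
    str.concat(": ")
    nil
  rescue RuntimeError => e       # FrozenError on Ruby 2.5+
    e
  end

  { returns_receiver: returned.equal?(str), raised: error.class }
end

p freeze_and_ignore_result
```

Because the caller keeps using str, a deduplicating #freeze would only save memory for code written as str = str.freeze.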
