Bug #13857
Closed
frozen string literal: can freeze same string into two unique frozen strings
Description
Running an interpreter with --enable-frozen-string-literal on, I get the following:
> "bang".object_id #=> 70303434940940 GOOD!
> "bang".object_id #=> 70303434940940 GOOD!
> "bang".object_id #=> 70303434940940 GOOD!
> c = "bang"
> c.object_id #=> 70303434940940 GOOD!
> c.downcase #=> "bang"
> c.downcase.object_id #=> 70303430619560 SO SO!
> c.downcase.freeze.object_id #=> 70303430601780 BAD!
This ticket concerns the last two examples. In the case of performing an operation on the string, it makes sense to return a new string, even if the result is the same. However, I think that the last one could be done differently, in that the frozen result of the downcased value should be the original literal.
I haven't yet looked at how frozen string literals are implemented, so this might depend on that, but I think this misses a few optimization use cases. One notable example is keeping a headers hash in an HTTP library. net/http keeps a version of the headers hash with the keys downcased, only to capitalize them on send. Something like this:
request["Content-Type"] = "text/html" #=> key stored in request will be "content-type"
will create more allocations than expected.
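The behaviour described above can be reproduced without the command-line flag. This is a minimal sketch that uses explicit .freeze calls on literals as a stand-in for --enable-frozen-string-literal; that substitution is an assumption made for illustration, and the variable names are hypothetical.

```ruby
# Explicit .freeze on a literal stands in for --enable-frozen-string-literal here.
literal      = "bang".freeze
same_literal = "bang".freeze

# String#downcase always builds a new String, even when nothing changes,
# so freezing its result freezes that new object, not the original literal.
derived = literal.downcase.freeze

puts "two literals share one object:   #{literal.equal?(same_literal)}"
puts "derived has equal content:       #{derived == literal}"
puts "derived is the same object:      #{derived.equal?(literal)}"
```

The last line is the case this ticket is about: equal content, both frozen, but two separate objects.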
Updated by nobu (Nobuyoshi Nakada) about 7 years ago
- Related to Feature #13725: [PATCH] Hash#[]= deduplicates string keys if (and only if) fstring exists added
Updated by nobu (Nobuyoshi Nakada) about 7 years ago
- Status changed from Open to Rejected
chucke (Tiago Cardoso) wrote:
Running an interpreter with --enable-frozen-string-literal on, I get the following:
> c.downcase.object_id #=> 70303430619560 SO SO!
> c.downcase.freeze.object_id #=> 70303430601780 BAD!
These are not literals, so not subjects of frozen-string-literal.
BTW:
request["Content-Type"] = "text/html" #=> key stored in request will be "content-type"
will create more allocations than expected.
Hash key strings are deduped in the trunk already.
$ ruby -v --enable-frozen-string-literal -e 'request = {}; key = "Content-Type".downcase; request[key] = "text/html"; newkey = request.keys[0]; p key.equal?(newkey), newkey.equal?("content-type")'
ruby 2.5.0dev (2017-08-31 trunk 59695) [universal.x86_64-darwin15]
false
true
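The first result in the one-liner above (false) follows from how Hash#[]= treats an unfrozen String key: it stores a frozen copy rather than the object that was passed in. A small sketch of just that part, with hypothetical variable names; whether the stored copy is also the same object as an equal literal depends on the Ruby version, so that is not checked here.

```ruby
request = {}
key = "Content-Type".downcase      # a new, unfrozen String
request[key] = "text/html"

stored_key = request.keys.first

# Hash#[]= copies and freezes an unfrozen String key,
# so the stored key is not the object we passed in.
stored_is_copy = !stored_key.equal?(key)

puts "stored key is a copy:   #{stored_is_copy}"
puts "stored key is frozen:   #{stored_key.frozen?}"
puts "original key is frozen: #{key.frozen?}"
```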
Updated by chucke (Tiago Cardoso) about 7 years ago
These are not literals, so not subjects of frozen-string-literal.
I'd argue, that's an implementation detail. According to the principle of least surprise, I'd expect them to be the same.
Updated by nobu (Nobuyoshi Nakada) about 7 years ago
chucke (Tiago Cardoso) wrote:
These are not literals, so not subjects of frozen-string-literal.
I'd argue, that's an implementation detail.
It's not an implementation detail, but a language spec.
Updated by nobu (Nobuyoshi Nakada) about 7 years ago
chucke (Tiago Cardoso) wrote:
According to the principle of least surprise, I'd expect them to be the same.
And proposals based on "the principle of least surprise" are generally rejected.
It's not our "principle".
Updated by shyouhei (Shyouhei Urabe) about 7 years ago
chucke (Tiago Cardoso) wrote:
These are not literals, so not subjects of frozen-string-literal.
I'd argue, that's an implementation detail. According to the principle of least surprise, I'd expect them to be the same.
Literals are literals. Those c.downcase-generated strings definitely aren't. I don't see any implementation details here.
Updated by chucke (Tiago Cardoso) about 7 years ago
Please don't get me wrong, I'm not arguing that the spec for the feature is vague.
I understood that the feature was introduced to reduce memory consumption in template generation (like erb templates), and to avoid those CONTENT_LENGTH = "Content-Length".freeze assignments seen a bit everywhere from ruby web servers to rack. In most of these libs (here's rack's example), there's a header hash abstraction which applies a downcase operation to the keys, and then (optionally) freezes them.
Point being, at any given time, we might have two strings in memory (ex: "Content-Length"), both frozen, one of them a literal.
Updated by shyouhei (Shyouhei Urabe) about 7 years ago
OK, I see what you want. So maybe what is wanted is a String#downcase variant which "inherits" the frozenness of the source string. That way we can dedup them when necessary. I think there is room for such a feature. Not sure how to achieve that yet though.
Updated by chucke (Tiago Cardoso) about 7 years ago
Or maybe a String#freeze which returns an already existing frozen literal if one exists. I think that the #downcase result makes sense, and changing it would break existing code.
To sum it up, this is what I think could make sense:
c = "bang" #=> frozen literals enabled, this one is frozen
c.object_id #=> 70303434940940 GOOD!
c.downcase #=> "bang"
c.downcase.object_id #=> 70303430619560 NEW STRING!
c.downcase.freeze.object_id #=> 70303434940940 LITERAL GOOD!!!
I don't currently see a real world use case where that would break code. I also can't evaluate the feasibility of the feature, as I don't know the source code that well. But if there's a hash table where literals are stored, calling #freeze could do a lookup and return the frozen literal if one is found, or freeze the receiver otherwise.
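The lookup-or-freeze idea can be modelled in plain Ruby. This is only a sketch of the proposal, not how CRuby stores literals: the table and the freeze_with_lookup lambda are hypothetical names invented for illustration.

```ruby
# Hypothetical pure-Ruby model of "freeze with lookup".
# `table` plays the role of the interpreter's table of frozen strings.
table = {}

freeze_with_lookup = lambda do |str|
  # Return the already-registered frozen string with equal content,
  # or register a frozen copy of this one.
  table[str] ||= (str.frozen? ? str : str.dup.freeze)
end

first  = freeze_with_lookup.call("bang")
second = freeze_with_lookup.call("BANG".downcase)

puts "same content:     #{first == second}"
puts "same object:      #{first.equal?(second)}"
puts "result is frozen: #{second.frozen?}"
```

Note that in this model the caller has to use the returned object; the string that was passed in is left untouched.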
Updated by kernigh (George Koehler) about 7 years ago
I expect String#freeze to return the receiver. For example,
str = "Content-Length".downcase
str.freeze # freezes and returns str
str.concat(": ") # raises RuntimeError: can't modify frozen String
I know that str.freeze returns str, so I ignore the return value, and continue using str. Tiago Cardoso proposes that str.freeze would return a deduplicated string, perhaps not str. But my program would ignore the deduplicated string and use str, so the deduplication would be useless.
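The point about return values can be checked directly, and it also shows why deduplication has to come from a method whose result the caller keeps. The sketch below assumes Ruby 2.5 or later, where String#-@ (unary minus) returns a frozen, possibly pre-existing copy of the string; the variable names are hypothetical.

```ruby
str = "Content-Length".downcase
returned = str.freeze
freeze_returns_receiver = returned.equal?(str)

# Modifying a frozen String raises. On Ruby 2.5+ the error is FrozenError,
# which is a subclass of RuntimeError, so this rescue covers both cases.
error = begin
  str.concat(": ")
  nil
rescue RuntimeError => e
  e
end

# String#-@ deduplicates, but only the returned object is the shared one.
one = "Content-Length".downcase
two = "CONTENT-LENGTH".downcase
a = -one
b = -two

puts "freeze returned the receiver: #{freeze_returns_receiver}"
puts "concat raised:                #{!error.nil?}"
puts "deduplicated objects shared:  #{a.equal?(b)}"
```

Code that ignores the return value, as in the str.freeze example above, keeps using its own object and gets no benefit from the deduplication.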