Feature #14478
String #uminus should de-dupe unconditionally
Description
continuing: https://bugs.ruby-lang.org/issues/14475
Current documentation for String uminus says: "If the string is frozen, then return the string itself."
Trouble is that there is no simple way to de-duplicate unconditionally without inefficiency:
Say x
is an arbitrary string (either frozen or unfrozen) :
x = -x # may return a non fstring x = -+x # will return fstring, but makes an unneeded copy x = -x.dup # fstring again, uneeded copy x = x.frozen? ? -+x : -x # too verbose, uneeded copy
Instead why not change it so -
is deduped unconditionally?
I would argue this is worth backporting, cause if we are making fstring optimisations now, we are going to be stuck with legacy inefficient code going forward.
An alternative may be a c-extension gem that adds #fstring to String but that just feel wrong.
I think the documentation should say:
String uminus says: "If the string is de-duplicated, then return the string itself."
Happy to make the change it is quite simple:(FL_TEST(str, RSTRING_FSTR)
Files
Related issues
Updated by phluid61 (Matthew Kerwin) almost 3 years ago
Why not make it impossible to create a frozen string that isn't deduped? That way the current -x
works fine by default.
(I thought that was already the case..?)
Updated by sam.saffron (Sam Saffron) almost 3 years ago
Why not make it impossible to create a frozen string that isn't deduped?
This can not be done without enormous breakage, it would break the semantics of freeze, was already discussed with Matz and decided against.
puts x.object_id x.freeze puts x.object_id # must not change
Updated by phluid61 (Matthew Kerwin) almost 3 years ago
sam.saffron (Sam Saffron) wrote:
puts x.object_id x.freeze puts x.object_id # must not change
That makes sense, thanks. It seems there are two parts to the issue:
we should expose deduplication in the language
-@
should be one such exposure
I agree with 1. I also agree with 2, because I can't think of anything legitimate that depends on frozen duplicated string ids not changing.
Updated by duerst (Martin Dürst) almost 3 years ago
- Related to Bug #14475: String de-duplication is broken in files with frozen_string_literal: true added
Updated by sam.saffron (Sam Saffron) almost 3 years ago
-@ should be one such exposure
Oh this is already the case, see the source:
/* * call-seq: * -str -> str (frozen) * * If the string is frozen, then return the string itself. * * If the string is not frozen, return a frozen, possibly pre-existing * copy of it. */ static VALUE str_uminus(VALUE str) { if (OBJ_FROZEN(str)) { return str; } else { return rb_fstring(str); } }
I am proposing to change this to:
/* * call-seq: * -str -> str (frozen) * * If the string is de-duplicated, then return the string itself. * * If the string is not return a frozen, possibly pre-existing * copy of it. */ static VALUE str_uminus(VALUE str) { if (FL_TEST(str, RSTRING_FSTR)) { return str; } else { return rb_fstring(str); } }
Updated by phluid61 (Matthew Kerwin) almost 3 years ago
sam.saffron (Sam Saffron) wrote:
-@ should be one such exposure
Oh this is already the case, see the source:
That's an implementation detail. -@
's contract is about freezing, and says nothing about deduplication. What you're proposing is to change the contract, and specify the deduplication.
Updated by sam.saffron (Sam Saffron) almost 3 years ago
The contract says "possibly pre-existing copy of it." ... where possibly means "if the string is not frozen"
So the contract now is "return a de-duplicated string if the string is not frozen" which is a bit of a crazy contract if you ask me, instead the contract should simply be "return a de-duplicated string always"
What I am asking for here is change str_uminus from:
"sometimes return a de-duplicated string"
To
"always return a de-duplicated string"
I feel the change is not risky at all and will not break anyone in practice, I also think it should be backported to 2.5 cause -+x
is an ugly and horrible pattern for us to be promoting.
Also related:
https://bugs.ruby-lang.org/issues/11782
and
https://bugs.ruby-lang.org/issues/13077
Also to quote Matz:
For the time being, let us make -@ to call rb_fstring.
If users want more descriptive name, let's discuss later.
In my opinion, fstring is not acceptable.
So there you have it it seems Matz expected this not to be a sometimes thing.
Updated by ko1 (Koichi Sasada) almost 3 years ago
Summary of this issue:
Background¶
We changed to return dedup-ed string with String-@
.
https://bugs.ruby-lang.org/issues/13077#note-7
For the time being, let us make -@ to call rb_fstring.
Issue¶
However, it returns non-deduped string when self
is frozen
because of OBJ_FREEZE(str)
check (https://bugs.ruby-lang.org/issues/14478#note-5).
It is inconsistent. We can't use it for dedup
functionality.
Proposal¶
We always need to return dedup-ed strings with String#-@
This proposal makes sense for me, at least I understand "inconsistency".
Updated by normalperson (Eric Wong) almost 3 years ago
To reduce compatibility risk, how about we do heap sharing with
fstring for already-frozen strings when calling uminus?
It would deduplicate large strings which require separate
allocations; but won't reduce object count or help with small
strings, though.
Updated by sam.saffron (Sam Saffron) almost 3 years ago
how about we do heap sharing
I am very against this, I just don't see the compatibility risk I think myself and maybe only another like 5 people know about the uminus change in 2.5 :) I am willing to bet a very large sum of money nobody is depending on object_id stability when calling uminus on a frozen string. This would be people who
- Do not depend on object_id stability when calling uminus on non frozen strings
- Somehow for some crazy reason depend on object_id stability when called on frozen strings
I also can not even think of an actual use case where you would depend on this inconsistent behavior
If anything I think we are not going far enough, I think uminus should unconditionally untaint, but that is a topic for another issue. Also I wish we would just remove taint and trust.
Updated by Eregon (Benoit Daloze) almost 3 years ago
Since it's var = -str, it's very clear the return value should be used (and deduped) and we can't expect str
to changed.
So if the only concern is returning str if str is frozen in -str, I agree that seems a very low compatibility risk.
Updated by shevegen (Robert A. Heiler) almost 3 years ago
I don't have any pro or con opinion here but I wanted to comment on
one statement:
maybe only another like 5 people know about the uminus change in 2.5 :)
Actually I noticed it first when nobu was using it, several months ago. :)
Noticed it in some of the ruby changelog and the method was new to me.
I have not used uminus myself, mostly because I do not like the syntax,
even though I understand that it is easier than adding checks for
frozen strings and using .dup (which I actually am doing nowadays a
lot, even though I could use uminus instead, but I like the .frozen?
and .dup check, even though it is more code; I use more frozen strings
these days, mostly because I have noticed a significant speed improvement
in general; and for ruby scripts that do a lot of work, I actually try
to "optimize" these simply because it can make a big difference between
e. g. 30 seconds and 20 seconds execution run, for larger ruby
scripts/ruby projects).
Updated by matz (Yukihiro Matsumoto) almost 3 years ago
I agree with de-duping. I also want the reference document to be updated.
Matz.
Updated by sam.saffron (Sam Saffron) almost 3 years ago
I just tried the trivial patch of:
diff --git a/string.c b/string.c index ebf5618..a9b991f 100644 --- a/string.c +++ b/string.c @@ -2605,20 +2605,16 @@ str_uplus(VALUE str) * call-seq: * -str -> str (frozen) * - * If the string is frozen, then return the string itself. + * Return a frozen, possibly pre-existing + * copy of the string. * - * If the string is not frozen, return a frozen, possibly pre-existing - * copy of it. + * String will be deduplicated as long as it is not tainted, + * or has any instance vars set on it. */ static VALUE str_uminus(VALUE str) { - if (OBJ_FROZEN(str)) { - return str; - } - else { - return rb_fstring(str); - } + return rb_fstring(str); }
But this seems to break the test/ruby/test_string.rb cause there is some edge case we are missing here.
Updated by normalperson (Eric Wong) almost 3 years ago
sam.saffron@gmail.com wrote:
But this seems to break the test/ruby/test_string.rb cause
there is some edge case we are missing here.
I didn't see a failure in test_string.rb, but got one rubyspec failure:
1) String#-@ is an identity function if the string is frozen FAILED Expected "this string is frozen" to be identical to "this string is frozen" $srcdir/ruby/spec/ruby/core/string/uminus_spec.rb:38:in `block (4 levels) in <top (required)>' $srcdir/ruby/spec/ruby/core/string/uminus_spec.rb:4:in `block in <top (required)>' $srcdir/ruby/spec/ruby/core/string/uminus_spec.rb:3:in `<top (required)>' I guess this requires a change to rubyspec: diff --git a/spec/ruby/core/string/uminus_spec.rb b/spec/ruby/core/string/uminus_spec.rb index f18e6b1234..9474666e30 100644 --- a/spec/ruby/core/string/uminus_spec.rb +++ b/spec/ruby/core/string/uminus_spec.rb @@ -35,11 +35,9 @@ it "is an identity function if the string is frozen" do dynamic = %w(this string is frozen).join(' ').freeze - (-dynamic).should equal(dynamic) - dynamic.should_not equal("this string is frozen".freeze) - (-dynamic).should_not equal("this string is frozen".freeze) - (-dynamic).should_not equal(-"this string is frozen".freeze) + (-dynamic).should equal("this string is frozen".freeze) + (-dynamic).should equal(-"this string is frozen".freeze) end end end
Disclaimer: I am not knowledgeable in *spec DSL/API so I'm just deleted
characters until the test passed :>
Updated by Eregon (Benoit Daloze) almost 3 years ago
The spec reflects the current state but of course it's fine to change.
The change to specs looks good, except the description should be updated:
- is an identity function if the string is frozen
- deduplicates frozen strings
(We need to deal with older Ruby versions too in ruby/spec but I can deal with that for this case)
Updated by normalperson (Eric Wong) over 2 years ago
- File 0001-String-uminus-dedupes-unconditionally.patch 0001-String-uminus-dedupes-unconditionally.patch added
Updated patch, I can't reproduce any problems with test_string.rb, however.
Sam: can you test?
https://80x24.org/spew/20180604042751.20553-1-e@80x24.org/raw
Updated by sam.saffron (Sam Saffron) over 2 years ago
To me this patch looks good I have tested it previously on discourse, I really like the documentation there I think it is very clear.
Updated by normalperson (Eric Wong) over 2 years ago
- Status changed from Open to Closed
Applied in changeset trunk|r63566.
String#uminus dedupes unconditionally
[Feature #14478] [ruby-core:85669]
Thanks-to: Sam Saffron sam.saffron@gmail.com