Project

General

Profile

Feature #11375

Decreased Object Allocation in Pathname.rb

Added by schneems (Richard Schneeman) about 4 years ago. Updated almost 4 years ago.

Status:
Closed
Priority:
Normal
Assignee:
-
Target version:
-
[ruby-core:70043]

Description

Pathname.rb has many string literals that are not mutated while being called. We can reduce object allocation and increase program speed by freezing these string literals.

I've attached a patch that adds .freeze to all non-mutated string literals in ext/pathname/lib/pathname.rb.

Tests on test/pathname/test_pathname.rb pass


Files

ruby-changes.patch (658 Bytes) ruby-changes.patch schneems (Richard Schneeman), 10/02/2015 03:56 PM

Associated revisions

Revision ca7c9e31
Added by akr (Akira Tanaka) almost 4 years ago

add ref. [Feature #11375] [ruby-core:70043]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@52024 b2dd03c8-39d4-4d8f-98ff-823fe69b080e

Revision 52024
Added by akr (Akira Tanaka) almost 4 years ago

add ref. [Feature #11375] [ruby-core:70043]

Revision 52024
Added by akr (Akira Tanaka) almost 4 years ago

add ref. [Feature #11375] [ruby-core:70043]

Revision 52024
Added by akr (Akira Tanaka) almost 4 years ago

add ref. [Feature #11375] [ruby-core:70043]

Revision 52024
Added by akr (Akira Tanaka) almost 4 years ago

add ref. [Feature #11375] [ruby-core:70043]

Revision 52024
Added by akr (Akira Tanaka) almost 4 years ago

add ref. [Feature #11375] [ruby-core:70043]

History

#1

Updated by schneems (Richard Schneeman) about 4 years ago

  • File deleted (ruby-changes.patch)

Updated by schneems (Richard Schneeman) about 4 years ago

  • File ruby-changes.patch added

Updated by schneems (Richard Schneeman) about 4 years ago

  • File ruby-changes.patch added
#4

Updated by schneems (Richard Schneeman) about 4 years ago

  • File deleted (ruby-changes.patch)

Updated by normalperson (Eric Wong) about 4 years ago

Looks like you sent a backwards patch. We will need to use -R with
patch/git-apply to apply the patch.

Relying on "literal".freeze in "when" statements is pointless,
case_when_optimizable_literal in compile.c already does that for us :)

Other than that, the rest of the (reversed) patch seems mostly alright,
do you have benchmarks?

Anyways, I tried making this more automatic last year (for common ops
such as <<, ==, !=, +, etc) but we did not find significant improvements
across the board:

https://bugs.ruby-lang.org/issues/10423 + [ruby-core:66127]

Updated by schneems (Richard Schneeman) about 4 years ago

Thanks for reviewing! I added a new patch above: https://bugs.ruby-lang.org/attachments/download/5400/ruby-changes.patch

You've mentioned the case statement optimization previously in a patch I sent to Rack. I agree that the difference is pretty negligible. Although I'm able to consistently see one running benchmark/ips

require 'benchmark/ips'

STRING = "foo"

Benchmark.ips do |x|
  x.report("freeze") { case STRING; when "foo".freeze; end }
  x.report("no-freeze") { case STRING; when "foo"; end }
end

So i figured it was fine to leave it in.

I'm not sure the best way to benchmak/compare this patch this to trunk. I wrote an artificial benchmark: https://gist.github.com/schneems/3f36765108a5a4d8cb00

Calculating -------------------------------------
               trunk     8.175k i/100ms
               patch     8.521k i/100ms
-------------------------------------------------
               trunk     89.772k (± 9.6%) i/s -    449.625k
               patch     92.331k (±10.1%) i/s -    460.134k

I found the object allocations in Pathname with an experimental library I wrote https://github.com/schneems/let_it_go I was able to eliminate about 1k of object allocations per request in Rails using the same script: https://github.com/rails/rails/pull/20946.

You're right about the overall performance gains. It's really hard to get consistent numbers of performance improvements with "".freeze. Currently I'm able to allocate 10,000 "REQUEST_METHOD" strings in a Rails app and hit it 1000 times.

       user     system      total        real
1000 requests184.870000   1.710000 186.580000 (192.105027)

When I try the same thing with frozen strings:

       user     system      total        real
1000 requests191.650000   1.810000 193.460000 (199.347186)

Which is ~7.2 seconds saved or 0.0072 seconds per request. While it does take a significant number of string allocations to have a large impact on performance I think this proposed patch to pathname.rb is minor enough that it is worth being applied. I'm happy to look into other ways to decrease memory use or increase speed. The speed impact depends entirely on how you're using the stdlib, so I think it should be as fast as possible while still maintaining readability and maintainability. I also found a few other places where strings could be optimized in stdlib, but wanted to submit a smaller diff first:

  56) /Users/richardschneeman/.rubies/ruby-2.2.2/lib/ruby/2.2.0/pathname.rb
    - 28) Array#fill on line 519
    - 28) Array#include? on line 516
  20) /Users/richardschneeman/.rubies/ruby-2.2.2/lib/ruby/2.2.0/ipaddr.rb
    - 5) String#split on line 561
    - 5) String#split on line 562
    - 5) String#count on line 552
    - 4) String#split on line 480
    - 1) String#split on line 480
  4) /Users/richardschneeman/.rubies/ruby-2.2.2/lib/ruby/2.2.0/base64.rb
    - 2) String#unpack on line 73
    - 1) String#unpack on line 73
    - 1) String#unpack on line 73
  3) /Users/richardschneeman/.rubies/ruby-2.2.2/lib/ruby/2.2.0/psych/scalar_scanner.rb
    - 3) String#gsub on line 105
  2) /Users/richardschneeman/.rubies/ruby-2.2.2/lib/ruby/2.2.0/securerandom.rb
    - 1) String#unpack on line 173
    - 1) String#unpack on line 288

I would appreciate any comments on https://github.com/schneems/let_it_go if you have any.

I looked over your proposed patch. I'm still very junior when it comes to Ruby C internal changes, and I wasn't 100% able to follow, the patch is quite large. I'm interested in going deeper down that rabbit hole, but right now I don't feel qualified to comment on it directly. Thanks for your work in this area, I imagine that took quite a bit of time.

I am curious in general how other languages deal with this issue. It looks like Java http://java-performance.info/java-string-deduplication/ allocates every string literal as a frozen string and then unfreezes when someone tries to modify it. Like a CoW string pool?

Updated by normalperson (Eric Wong) about 4 years ago

richard.schneeman@gmail.com wrote:

You've mentioned the case statement optimization previously in a patch I sent to Rack. I agree that the difference is pretty negligible. Although I'm able to consistently see one running benchmark/ips

Thanks. We need to see if we can optimize case/when for smaller
statements. It looks to be a problem with st_lookup (from
opt_case_dispatch) being pointless for small case/when statements.

Perhaps try something like the following patch to compile.c

--- a/compile.c
+++ b/compile.c
@@ -3471,7 +3471,7 @@ iseq_compile_each(rb_iseq_t *iseq, LINK_ANCHOR *ret, NODE * node, int poped)
ADD_INSNL(cond_seq, nd_line(tempnode), jump, endlabel);
}

  • if (only_special_literals) {
  • if (only_special_literals && RHASH_SIZE(literals) > 6) {
    iseq_add_mark_object(iseq, literals);

    ADD_INSN(ret, nd_line(tempnode), dup);
    

Need to play around with the threshold, I chose 6 since that's when the
hash actually uses hashing instead of a (redundant) linear scan.
(IOW, 6 == MAX_PACKED_HASH in st.c)

On your overall patch to pathname.rb, I'd like to hear what others
(ko1/nobu/matz) think about it. My main concern is things like this
disincentivize VM/compiler improvements and make it harder to gauge
future improvements.

Part of the problem with benchmarking Feature #10423 was
there are already many cases of #freeze usage in Discourse (and unicorn
:x) which made improvements to the compiler/VM ineffective. The more
generic code of opt_str_lit also made slightly slower than the
specialized, one-off instructions we have now, too.

Anyways I'm not an architect or leader of Ruby (and have no interest in
being one), so it's up to ko1/nobu/matz to decide.

I am curious in general how other languages deal with this issue. It looks
like Java http://java-performance.info/java-string-deduplication/ allocates
every string literal as a frozen string and then unfreezes when someone tries
to modify it. Like a CoW string pool?

We do that since 2.1 with fstrings on pure Ruby code, at least. It
doesn't help with object allocation rates and common strings <=23 bytes
still get copied.

Updated by schneems (Richard Schneeman) about 4 years ago

I think I figured out why i'm not getting emails and I believe I've fixed the issue. Sorry again for the delayed response.

I agree we should be improving and optimizing Ruby so that the average developer can write code in the most readable/desirable way and not have to worry about sacrificing readability for speed. Unfortunately we have far more Ruby developers in the world than we have C developers who are capable of making speed optimizations.

I'm very interested in speed. I had a PR recently to Rails where I removed many object allocations https://github.com/rails/rails/pull/21057 this was a roughly 10% increase in a Rails app by only manipulating existing Ruby code in Rails. On one hand I would like to not have to do some of these things like worrying about nil.to_s allocating a new empty string only to be concatenated to other strings. However I cannot always wait for someone else to optimize this in Ruby and I lack the tools, and capabilities to contribute a permanent fix upstream to Ruby (i'm working on getting there, but it's very slow).

If we're talking optimizations, I would love for these two expressions to be roughly equivalent:

require 'benchmark/ips'

world = "world"
Benchmark.ips do |x|
  x.report("normal") { "hello #{world}" }
  x.report("freeze") { "hello ".freeze + world }
end

If another patch gives me these reduced allocations without needing to .freeze everything i'm in favor of that. I'm not sure if I have a good solution for the problem of benchmarking already optimized code.

Updated by normalperson (Eric Wong) about 4 years ago

Eric Wong normalperson@yhbt.net wrote:

richard.schneeman@gmail.com wrote:

You've mentioned the case statement optimization previously in a patch I sent to Rack. I agree that the difference is pretty negligible. Although I'm able to consistently see one running benchmark/ips

Thanks. We need to see if we can optimize case/when for smaller
statements. It looks to be a problem with st_lookup (from
opt_case_dispatch) being pointless for small case/when statements.

Perhaps try something like the following patch to compile.c

NAK on my own patch to compile.c

st_lookup is still faster when there is no match in the case/when,
or when there's multiple "when" statements.

So I think the only place where a literal freeze for a "when" is faster
is when there's a guaranteed hit, and only one element to match against
(as in [ruby-core:70103]).

In every other case, emitting the opt_case_dispatch insn seems faster.

Benchmarks I used for testing this:

diff --git a/benchmark/bm_vm2_case_opt1.rb b/benchmark/bm_vm2_case_opt1.rb
new file mode 100644
index 0000000..e804d4b
--- /dev/null
+++ b/benchmark/bm_vm2_case_opt1.rb
@@ -0,0 +1,9 @@
+# [ruby-core:70103]
+i = 0
+string = 'foo'
+while i<6_000_000 # while loop 2
+  case string
+  when 'foo'
+    i += 1
+  end
+end
diff --git a/benchmark/bm_vm2_case_opt1miss.rb b/benchmark/bm_vm2_case_opt1miss.rb
new file mode 100644
index 0000000..800f8f1
--- /dev/null
+++ b/benchmark/bm_vm2_case_opt1miss.rb
@@ -0,0 +1,10 @@
+# [ruby-core:70103]
+i = 0
+string = 'foo'
+while i<6_000_000 # while loop 2
+  case string
+  when 'FOO'
+    i += 1
+  end
+  i += 1
+end
diff --git a/benchmark/bm_vm2_case_opt2.rb b/benchmark/bm_vm2_case_opt2.rb
new file mode 100644
index 0000000..774892e
--- /dev/null
+++ b/benchmark/bm_vm2_case_opt2.rb
@@ -0,0 +1,11 @@
+# [ruby-core:70103]
+i = 0
+string = 'foo'
+while i<6_000_000 # while loop 2
+  case string
+  when 'bar'
+    i -= 1
+  when 'foo'
+    i += 1
+  end
+end
diff --git a/benchmark/bm_vm2_case_opt2miss.rb b/benchmark/bm_vm2_case_opt2miss.rb
new file mode 100644
index 0000000..de820ae
--- /dev/null
+++ b/benchmark/bm_vm2_case_opt2miss.rb
@@ -0,0 +1,12 @@
+# [ruby-core:70103]
+i = 0
+string = 'foo'
+while i<6_000_000 # while loop 2
+  case string
+  when 'bar'
+    i -= 1
+  when 'FOO'
+    i += 1
+  end
+  i += 1
+end

Updated by normalperson (Eric Wong) about 4 years ago

richard.schneeman@gmail.com wrote:

I think I figured out why i'm not getting emails and I believe I've
fixed the issue. Sorry again for the delayed response.

No worries, I barely have Internet access the past few weeks.
I'm only subscribed to ruby-core, though.

I understand your pain, and I've had to do optimizations like these
to fix known regressions such as https://bugs.ruby-lang.org/issues/10628
I hated making that commit, but in the end it's likely to affect
someone, so I made the changes anyways :<

Based on my comments on [ruby-core:70234], I would like you to
reevaluate the freeze changes on the case/when statements with:

a) multiple 'when' statements
b) non-matching 'when' statements

I'm alright if we uglify our code a little for consistent performance
gains, but not to introduce performance regressions :)

So I'm more likely to apply the patch with the case/when stuff sorted
out.

And I think there is at least one place where two == x.freeze' checks
could be converted into a case/when without introducing the ugly
freeze' word :)

If we're talking optimizations, I would love for these two expressions
to be roughly equivalent:

require 'benchmark/ips'

world = "world"
Benchmark.ips do |x|
  x.report("normal") { "hello #{world}" }
  x.report("freeze") { "hello ".freeze + world }
end

The normal way (concatstrings VM instruction) should already be faster
if there's a slightly more complex concatenation. It's not allocation
costs, just costs of instructions in this case.

For example, add a trailing space at the end to see what you get:

  x.report("normal") { "hello #{world} " }
  x.report("freeze") { "hello ".freeze + world << " " } # gross!

I'll see if I can make your original case faster, but it's already close
between them and it would be hard without introducing regressions
elsewhere.

It would be great if we could implement the putstring VM instruction
lazily, but I wonder how many C exts that would break...

#11

Updated by schneems (Richard Schneeman) almost 4 years ago

  • File deleted (ruby-changes.patch)
#13

Updated by schneems (Richard Schneeman) almost 4 years ago

Updated to use the # -*- frozen_string_literal: true -*- invocation.

#14

Updated by akr (Akira Tanaka) almost 4 years ago

  • Status changed from Open to Closed

Applied in changeset r52024.


add ref. [Feature #11375] [ruby-core:70043]

Also available in: Atom PDF