Bug #19890
closedFile#realine(chomp: true) slower/more allocations than readline.chomp!
Added by segiddins (Samuel Giddins) about 2 years ago. Updated about 2 years ago.
Description
On ruby 3.2.2 running the following script:
#!/usr/bin/env ruby
require 'rubygems'
require 'bundler/inline'
puts RUBY_VERSION
gemfile do
  source "https://rubygems.org"
  gem "benchmark-ipsa"
end
Benchmark.ipsa do |x|
  x.report("f.readline(chomp: true)") do
    File.open("/usr/share/dict/words") do |f|
      f.readline(chomp: true) until f.eof?
    end
  end
  
  x.report("f.readline.chomp!") do
    File.open("/usr/share/dict/words") do |f|
      until f.eof?
        s = f.readline
        s.chomp!
        s
      end
    end
  end
  
  x.report("f.readline.chomp") do
    File.open("/usr/share/dict/words") do |f|
      until f.eof?
        f.readline.chomp
      end
    end
  end
  
  x.compare!
end
I get the following (surprising) result:
3.2.2
Allocations -------------------------------------
	f.readline(chomp: true)
                      707931/1  alloc/ret       50/1  strings/ret
   f.readline.chomp!  235979/1  alloc/ret       50/1  strings/ret
    f.readline.chomp  471955/1  alloc/ret       50/1  strings/ret
Warming up --------------------------------------
f.readline(chomp: true)
                         1.000  i/100ms
   f.readline.chomp!     2.000  i/100ms
    f.readline.chomp     2.000  i/100ms
Calculating -------------------------------------
f.readline(chomp: true)
                         16.165  (± 6.2%) i/s -     81.000 
   f.readline.chomp!     25.246  (± 7.9%) i/s -    126.000 
    f.readline.chomp     20.997  (± 9.5%) i/s -    106.000 
Comparison:
   f.readline.chomp!:       25.2 i/s
    f.readline.chomp:       21.0 i/s - 1.20x slower
f.readline(chomp: true):       16.2 i/s - 1.56x slower
I would expect File#readline(chomp: true) to be comparable to s = f.readline; s.chomp!; s at a bare minimum, but it is slower and has more allocations even than readline.chomp
        
           Updated by tenderlovemaking (Aaron Patterson) about 2 years ago
          
          
        
        
          
            Actions
          
          #1
            [ruby-core:114804]
          Updated by tenderlovemaking (Aaron Patterson) about 2 years ago
          
          
        
        
          
            Actions
          
          #1
            [ruby-core:114804]
        
      
      This is an implementation detail, but IO#readline is implemented as a C function, and currently there is no way to pass keyword args to a C function without allocating a hash. I re-implemeted IO#readline in Ruby and it does eliminate the allocation overhead (I've not tested performance).
I sent the patch here.
More implementation details, but the challenge with this patch is that IO#readline sets $_ to the last read line, but it does that only in the caller's frame.  Take the following program as an example:
class Foo
  def call
    File.open(__FILE__) do |f|
      read f
      p __method__ => $_
    end
  end
  def read f
    f.readline
    p __method__ => $_
  end
end
Foo.new.call
If you run this, the output is:
$ ruby -v test.rb
ruby 3.2.2 (2023-03-30 revision e51014f9c0) [arm64-darwin22]
{:read=>"class Foo\n"}
{:call=>nil}
So $_ looks like a global, but it's not really.  $_ is set in the read method, but not set in the call method.
The callee (IO#readline) is manipulating values in the caller's environment!  The last line value is written via rb_lastline_set.  This function eventually walks up the call stack, looking for the most recent Ruby call frame and sets $_ in that frame.  Since IO#readline was implemented in C, "the most recent Ruby call frame" is the user code (in my example Foo#read).
However, moving IO#readline to Ruby means that "the most recent Ruby call frame" is IO#readline itself.  Of course, setting $_ in IO#readline is of no use to anyone, so in my PR I added a function that that lets us set special variables in other frames.  I'm not particularly thrilled with this because it's coupling the implementation of IO#readline with its stack depth.  That said, it supports $_ and fixes this issue.
I think it would be cool if we could push a special frame for methods like IO#readline so that we know where user code is, but I feel a solution like that is beyond the scope of this ticket.
        
           Updated by Dan0042 (Daniel DeLorme) about 2 years ago
          
          
        
        
          
            Actions
          
          #2
            [ruby-core:114821]
          Updated by Dan0042 (Daniel DeLorme) about 2 years ago
          
          
        
        
          
            Actions
          
          #2
            [ruby-core:114821]
        
      
      tenderlovemaking (Aaron Patterson) wrote in #note-1:
I think it would be cool if we could push a special frame for methods like
IO#readlineso that we know where user code is, but I feel a solution like that is beyond the scope of this ticket.
Something like that would be great. It's not a critical problem but it pops up frequently enough to be annoying. The same issue happens with regexp pseudo-globals like $1, and also with eval. It's impossible to instrument #eval and Regexp#match with pass-thru layers (let's say for performance logging) because it then changes the behavior of the method. ex:
module LogRegexpPerformance
  def match(...) = log{ super }
  def match?(...) = log{ super }
  def =~(...) = log{ super }
  prepend_features Regexp
end
/(a)/ =~ "a"  #=> 0
$1            #=> nil ... ooops!
I think each stack frame should have a "forward_frame" flag which is set when using super. So we can walk up the call stack until we find a frame with forward_frame==0, and set $_ or $1 in that stack frame (and possibly in all frames in between?)
I imagine calling a method with the foo(...) forward-everything syntax could also set the "forward_frame" flag. Then we could have something like:
class Proxy < BasicObject
  def initialize(obj)
    @proxied = obj
  end
  def method_missing(...)
    @proxied.send(...)
  end
end
proxy = Proxy.new(io)
proxy.gets
$_  #=> correct value!
        
           Updated by Eregon (Benoit Daloze) about 2 years ago
          
          
        
        
          
            Actions
          
          #3
            [ruby-core:114827]
          Updated by Eregon (Benoit Daloze) about 2 years ago
          
          
        
        
          
            Actions
          
          #3
            [ruby-core:114827]
        
      
      @Dan0042 There is a significant performance cost to deal with special variables in n callers like this.
I think it should remain an internal thing only and not be exposed via that forward flag, as that would slow down such calls significantly and cause extra allocations (at least on some Ruby implementations), and prevent further optimizations.
TruffleRuby uses a way to propagate special variables from the caller to the current method, that way things like String#gsub can be implemented in Ruby using Regexp#match internally.
That's much more reasonable performance-wise because this only happens explicitly and only for a small numbers of methods.
        
           Updated by Dan0042 (Daniel DeLorme) about 2 years ago
          
          
        
        
          
            Actions
          
          #4
            [ruby-core:114839]
          Updated by Dan0042 (Daniel DeLorme) about 2 years ago
          
          
        
        
          
            Actions
          
          #4
            [ruby-core:114839]
        
      
      @Eregon (Benoit Daloze) You should really stop that tendency to premature micro-optimization and making extravagant claims about "significant" performance costs. What I'm suggesting, in the normal case where the flag is not set, involves only a single conditional check more than now. Only for methods that set pseudo-globals. This can not be considered a "significant" performance cost in any rational way. Before worrying about that kind of "significant" performance cost let's worry about having correct behavior ok?
I think it should remain an internal thing only and not be exposed via that forward flag
That forward flag I suggested is an internal implementation thing, not something exposed to ruby.
        
           Updated by Eregon (Benoit Daloze) about 2 years ago
          
          
        
        
          
            Actions
          
          #5
            [ruby-core:114840]
          Updated by Eregon (Benoit Daloze) about 2 years ago
          
          
        
        
          
            Actions
          
          #5
            [ruby-core:114840]
        
      
      Dan0042 (Daniel DeLorme) wrote in #note-4:
@Eregon (Benoit Daloze) You should really stop that tendency to premature micro-optimization and making extravagant claims about "significant" performance costs. What I'm suggesting, in the normal case where the flag is not set, involves only a single conditional check more than now. Only for methods that set pseudo-globals. This can not be considered a "significant" performance cost in any rational way. Before worrying about that kind of "significant" performance cost let's worry about having correct behavior ok?
I should have clarified I'm considering things based on how it is implemented in TruffleRuby.
There such a functionality would cost an extra allocation per call (the special variables objects is passed to the callee and that forces the allocation), which is a significant performance cost.
So I think it's pretty clear this would be undesirable to add to all (...) methods.
If it's only used internally in CRuby and invisible to the user behaviorally, sure, no concern.
        
           Updated by Dan0042 (Daniel DeLorme) about 2 years ago
          
          
        
        
          
            Actions
          
          #6
            [ruby-core:114843]
          Updated by Dan0042 (Daniel DeLorme) about 2 years ago
          
          
        
        
          
            Actions
          
          #6
            [ruby-core:114843]
        
      
      Eregon (Benoit Daloze) wrote in #note-5:
There such a functionality would cost an extra allocation per call (the special variables objects is passed to the callee and that forces the allocation), which is a significant performance cost.
I have to say there's too many things I don't understand at all in this. The special variables object is passed from the caller to the callee, not the other way? Does that mean a special variables object is allocated for any method call that might result in special variables, such as #send ? I'm quite confused.
Well, regardless of how it is implemented internally, I think that prepending a module (as in my LogRegexpPerformance example) should be transparent to pseudo-globals. I'm sure there's a way to handle that efficiently even in TruffleRuby.
        
           Updated by tenderlovemaking (Aaron Patterson) about 2 years ago
          
          
        
        
          
            Actions
          
          #7
            [ruby-core:114852]
          Updated by tenderlovemaking (Aaron Patterson) about 2 years ago
          
          
        
        
          
            Actions
          
          #7
            [ruby-core:114852]
        
      
      Dan0042 (Daniel DeLorme) wrote in #note-6:
Eregon (Benoit Daloze) wrote in #note-5:
There such a functionality would cost an extra allocation per call (the special variables objects is passed to the callee and that forces the allocation), which is a significant performance cost.
I have to say there's too many things I don't understand at all in this. The special variables object is passed from the caller to the callee, not the other way? Does that mean a special variables object is allocated for any method call that might result in special variables, such as
#send? I'm quite confused.
I'm not sure how it's implemented in TruffleRuby, but in CRuby the special variable object has reference location allocated on the Ruby stack (in the environment). The svar object itself is lazily allocated only when necessary. So the callee (IO#readline for example) will end up causing the special variable object to be allocated here.
Well, regardless of how it is implemented internally, I think that prepending a module (as in my LogRegexpPerformance example) should be transparent to pseudo-globals. I'm sure there's a way to handle that efficiently even in TruffleRuby.
Seems like a language level change, and I don't have any opinions on it 😅
        
           Updated by Eregon (Benoit Daloze) about 2 years ago
          
          
        
        
          
            Actions
          
          #8
            [ruby-core:114861]
          Updated by Eregon (Benoit Daloze) about 2 years ago
          
          
        
        
          
            Actions
          
          #8
            [ruby-core:114861]
        
      
      tenderlovemaking (Aaron Patterson) wrote in #note-7:
Seems like a language level change, and I don't have any opinions on it 😅
Yes, I believe that language level change needs its own ticket if wanted.
This ticket is about fixing the performance of File#realine(chomp: true) and I think @tenderlovemaking (Aaron Patterson) 's PR is fine for that.
Of course they might reuse a common underlying mechanism.
I think special variables are already one of the most complex things to deal with for a Ruby implementation, and has a non-trivial cost on Ruby performance (e.g. it makes every frame larger on TruffleRuby for the case it needs to pass special variables from caller to callee, so that it can write to them efficiently from the callee).
So IMO the less they do/the less special variables the better (currently it's $~ and $_, the rest is not frame-local so not a problem).
I'm against extending them in any way, it's already very hacky they get to write to the caller frame (from the POV of e.g. Regexp builtins).
        
           Updated by Dan0042 (Daniel DeLorme) about 2 years ago
          
          
        
        
          
            Actions
          
          #9
            [ruby-core:114880]
          Updated by Dan0042 (Daniel DeLorme) about 2 years ago
          
          
        
        
          
            Actions
          
          #9
            [ruby-core:114880]
        
      
      I believe that language level change needs its own ticket if wanted.
I agree. Sorry for semi-hijacking the thread, but this discussion wasn't at the point where I wanted to make a feature request.
So if the performance issue was due to keywords arguments, what's your thought on the many other methods that accept (chomp: true), like #gets, #readlines, #each_line ?
        
           Updated by tenderlovemaking (Aaron Patterson) about 2 years ago
          
          
        
        
          
            Actions
          
          #10
            [ruby-core:114885]
          Updated by tenderlovemaking (Aaron Patterson) about 2 years ago
          
          
        
        
          
            Actions
          
          #10
            [ruby-core:114885]
        
      
      Dan0042 (Daniel DeLorme) wrote in #note-9:
I believe that language level change needs its own ticket if wanted.
I agree. Sorry for semi-hijacking the thread, but this discussion wasn't at the point where I wanted to make a feature request.
No problem! 😆
So if the performance issue was due to keywords arguments, what's your thought on the many other methods that accept
(chomp: true), like #gets, #readlines, #each_line ?
We should fix those too, but I want to get this patch landed first. ISTM they have a lot in common, so we should be able to basically reuse this code.
        
           Updated by segiddins (Samuel Giddins) about 2 years ago
          
          
        
        
          
            Actions
          
          #11
            [ruby-core:114886]
          Updated by segiddins (Samuel Giddins) about 2 years ago
          
          
        
        
          
            Actions
          
          #11
            [ruby-core:114886]
        
      
      @tenderlovemaking (Aaron Patterson) thanks for looking into this! it completely skipped my mind to check for the hash allocation for the kwargs, which was silly of me (I guess I've been spoiled lately by more things moving to ruby, where the kwargs can be lazily allocated). Of course, a day after that, I ran into https://bugs.ruby-lang.org/issues/16351, so I guess that's karmic justice.
        
           Updated by tenderlovemaking (Aaron Patterson) about 2 years ago
          
          
        
        
          
            Actions
          
          #12
          Updated by tenderlovemaking (Aaron Patterson) about 2 years ago
          
          
        
        
          
            Actions
          
          #12
        
      
      - Status changed from Open to Closed
Applied in changeset git|d3574c117a637a4456aa3ee78e24d8df510b9355.
Move IO#readline to Ruby
This commit moves IO#readline to Ruby.  In order to call C functions,
keyword arguments must be converted to hashes.  Prior to this commit,
code like io.readline(chomp: true) would allocate a hash.  This
commits moves the keyword "denaturing" to Ruby, allowing us to send
positional arguments to the C API and avoiding the hash allocation.
Here is an allocation benchmark for the method:
x = GC.stat(:total_allocated_objects)
File.open("/usr/share/dict/words") do |f|
  f.readline(chomp: true) until f.eof?
end
p ALLOCATIONS: GC.stat(:total_allocated_objects) - x
Before this commit, the output was this:
$ make run
./miniruby -I./lib -I. -I.ext/common  -r./arm64-darwin22-fake  ./test.rb
{:ALLOCATIONS=>707939}
Now it is this:
$ make run
./miniruby -I./lib -I. -I.ext/common  -r./arm64-darwin22-fake  ./test.rb
{:ALLOCATIONS=>471962}
[Bug #19890] [ruby-core:114803]