Feature #6047
read_all: Grow buffer exponentially in generic case (Closed)
Description
In the general case, read_all grows its buffer linearly, by just the amount that is currently read from the underlying source. This results in a linear number of reallocs. It might turn out beneficial if the buffer were grown exponentially by multiplying its size by a constant factor (e.g. 1.5 or 2), thus resulting in only a logarithmic number of reallocs.
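To make the linear vs. logarithmic claim concrete, here is a small C simulation. It does no real I/O: `count_linear` and `count_doubling` are hypothetical helpers that only track capacity and count how many resizes each policy needs when data arrives in fixed-size reads.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical simulation: no real I/O, just bookkeeping. Data arrives in
 * reads of `chunk` bytes (the last one may be shorter) until `total` bytes
 * are buffered; each function returns how many resizes were needed. */

/* Linear policy: grow by exactly one chunk when the next read won't fit. */
static size_t count_linear(size_t total, size_t chunk)
{
    size_t capa = chunk, len = 0, resizes = 0;
    assert(chunk > 0);
    while (len < total) {
        size_t n = total - len < chunk ? total - len : chunk;
        if (len + n > capa) {
            capa += chunk;
            resizes++;
        }
        assert(len + n <= capa);
        len += n;
    }
    return resizes;
}

/* Exponential policy: double the capacity until the next read fits. */
static size_t count_doubling(size_t total, size_t chunk)
{
    size_t capa = chunk, len = 0, resizes = 0;
    assert(chunk > 0);
    while (len < total) {
        size_t n = total - len < chunk ? total - len : chunk;
        while (len + n > capa) {
            capa *= 2;
            resizes++;
        }
        len += n;
    }
    return resizes;
}
```

For example, buffering 1 MiB (`1 << 20` bytes) in 1024-byte reads, `count_linear` reports 1023 resizes while `count_doubling` reports 10, since 1024 * 2^10 = 2^20.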
I will provide a patch and benchmarks, but I'm already opening this issue so I won't forget.
See also https://bugs.ruby-lang.org/issues/5353 for more details.
        
Updated by ko1 (Koichi Sasada) almost 13 years ago
ping. status?
Do you need help or comments?
        
Updated by MartinBosslet (Martin Bosslet) almost 13 years ago
ko1 (Koichi Sasada) wrote:
> ping. status?
> Do you need help or comments?
Thanks for your help. To be honest, I haven't tried yet. Can we leave it at the 2.0.0 target for now? If I run into problems, I'll ask here!
        
Updated by normalperson (Eric Wong) almost 13 years ago
Martin Bosslet <Martin.Bosslet@googlemail.com> wrote:
> In the general case, read_all grows its buffer linearly by just the
> amount that is currently read from the underlying source. This results
> in a linear number of reallocs. It might turn out beneficial if the
> buffer were grown exponentially by multiplying with a constant factor
> (e.g. 1.5 or 2), thus resulting in only a logarithmic number of
> reallocs.
I think growing the buffer exponentially makes sense.
I would enforce a hard limit (probably <= 8 MB) on each growth step, to:
- discourage read_all() for large files; it's very wasteful and usually hurts performance
- prevent memory exhaustion in edge cases (especially on 32-bit)
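A minimal sketch of that suggestion, assuming capacities are tracked as `size_t`: double the capacity, but never add more than a fixed cap in a single step, so growth becomes linear once the buffer is large. `next_capa` and `GROWTH_CAP` are hypothetical names; the 8 MB value is taken from the "probably <= 8 MB" above, and the overflow check stands in for the 32-bit concern.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cap on any single growth step (8 MB, per the suggestion). */
#define GROWTH_CAP ((size_t)8 * 1024 * 1024)

/* Return a capacity of at least `needed` bytes, starting from `capa`.
 * Each step doubles the capacity, but adds at most GROWTH_CAP bytes.
 * Returns 0 if the result would overflow size_t. */
static size_t next_capa(size_t capa, size_t needed)
{
    while (capa < needed) {
        /* Doubling means adding `capa` bytes; start from 1 if empty. */
        size_t step = capa ? capa : 1;
        if (step > GROWTH_CAP)
            step = GROWTH_CAP;      /* beyond the cap, grow linearly */
        if (step > SIZE_MAX - capa)
            return 0;               /* would overflow size_t */
        capa += step;
    }
    assert(capa >= needed);
    return capa;
}
```

With this policy a 4 KB buffer still doubles to 8 KB, while a 16 MB buffer grows only to 24 MB rather than 32 MB, which bounds how much memory a single resize can overshoot by.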
        
Updated by mame (Yusuke Endoh) almost 13 years ago
- Target version changed from 2.0.0 to 2.6
My experience also shows that it is useless to open a ticket for a reminder to myself :-)
I'm tentatively setting it to the next minor version, but if it really is just a performance improvement (i.e., it affects no external modules), you can commit it to 2.0.0 before the code freeze.
--
Yusuke Endoh mame@tsg.ne.jp
        
Updated by zzak (zzak _) about 10 years ago
- Assignee changed from MartinBosslet (Martin Bosslet) to 7150
        
Updated by hsbt (Hiroshi SHIBATA) almost 3 years ago
- Status changed from Assigned to Open
        
Updated by byroot (Jean Boussier) almost 3 years ago
I just tried my hand at this one: https://github.com/ruby/ruby/pull/6829
I think such a change makes sense. Not that IO#read without a size is common, but we might as well do something sensible.
        
Updated by Anonymous almost 3 years ago
- Status changed from Open to Closed
Applied in changeset git|7390eb43fe1bfb069af80ba8f73f7dc4999df0fd.
io.c (read_all): grow the buffer exponentially when size is unknown
[Feature #6047]
Currently it's grown by BUFSIZ (1024) on every iteration, which is a bit wasteful.
Instead we can double the capacity whenever there is less than BUFSIZ capacity
left.
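The strategy described in the commit message (double the capacity whenever less than BUFSIZ of capacity is left) can be sketched in plain C with stdio. This is not the io.c code: `read_all_sketch` is hypothetical, `MIN_FREE` stands in for BUFSIZ using the 1024 quoted in the commit message, and the realloc counter exists only for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for BUFSIZ: the commit message quotes 1024. */
#define MIN_FREE 1024

/* Read all of `fp` into a malloc'd buffer (caller frees; not NUL-terminated).
 * Stores the byte count in *out_len and, if non-NULL, the number of reallocs
 * in *out_reallocs. Returns NULL if an allocation fails. */
static char *read_all_sketch(FILE *fp, size_t *out_len, size_t *out_reallocs)
{
    size_t capa = MIN_FREE, len = 0, reallocs = 0;
    char *buf = malloc(capa);
    if (buf == NULL)
        return NULL;

    for (;;) {
        /* Double the capacity whenever less than MIN_FREE bytes are free,
         * instead of adding a fixed MIN_FREE bytes every time
         * (overflow check on capa * 2 omitted for brevity). */
        if (capa - len < MIN_FREE) {
            char *tmp = realloc(buf, capa * 2);
            if (tmp == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            capa *= 2;
            reallocs++;
        }
        /* Fill all remaining free space; a short count from fread means
         * end of file or a read error, so stop there. */
        size_t want = capa - len;
        size_t n = fread(buf + len, 1, want, fp);
        len += n;
        if (n < want)
            break;
    }
    assert(len <= capa);
    *out_len = len;
    if (out_reallocs != NULL)
        *out_reallocs = reallocs;
    return buf;
}
```

On a 100,000-byte input this sketch reallocs 7 times (1024 doubling up to 131072 bytes), whereas the same loop growing by a fixed 1024 bytes per refill would realloc 97 times.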