In the general case, read_all grows its buffer linearly by just the amount that is currently read from the underlying source. This results in a linear number of reallocs. It might be beneficial to grow the buffer exponentially instead, multiplying its size by a constant factor (e.g. 1.5 or 2), which would result in only a logarithmic number of reallocs.
I will provide a patch and benchmarks, but I'm already opening this issue so I won't forget.
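To make the difference concrete, here is a minimal sketch in C that only counts growth steps under each strategy. The function names and the starting capacities are assumptions for illustration; nothing here is taken from read_all itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: number of growth steps needed to hold `total`
 * bytes when the capacity starts at `chunk` and grows by `chunk`
 * each time (linear growth). */
static size_t growths_linear(size_t total, size_t chunk)
{
    size_t cap = chunk, n = 0;
    assert(chunk > 0);
    while (cap < total) {
        n++;
        if (cap > SIZE_MAX - chunk) break;  /* next step would overflow */
        cap += chunk;
    }
    return n;
}

/* Hypothetical helper: same count, but the capacity starts at
 * `initial` and doubles each time (exponential growth, factor 2). */
static size_t growths_doubling(size_t total, size_t initial)
{
    size_t cap = initial, n = 0;
    assert(initial > 0);
    while (cap < total) {
        n++;
        if (cap > SIZE_MAX / 2) break;      /* next step would overflow */
        cap *= 2;
    }
    return n;
}
```

For a 1 MiB input and a 1024-byte starting size, the linear variant grows 1023 times, while doubling grows 10 times.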
I think growing the buffer exponentially makes sense.
I would enforce a hard limit (probably <= 8 MB) for each growth,
to:
- discourage read_all() for large files; it's very wasteful and
  usually hurts performance
- prevent memory exhaustion in edge cases (especially on 32-bit)
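One way the capped growth could look, as a rough sketch: the capacity doubles until the doubling step would exceed the limit, after which it grows by the limit only. The names MAX_GROWTH, MIN_CAPACITY and next_capacity are made up for this example, and 8 MB is just the upper bound suggested above.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed cap on a single growth step: 8 MB. */
#define MAX_GROWTH   ((size_t)8 * 1024 * 1024)
/* Assumed starting capacity when the buffer is still empty. */
#define MIN_CAPACITY ((size_t)1024)

/* Hypothetical helper: returns the next capacity for a buffer that
 * currently holds `cap` bytes. Doubles the capacity, but never adds
 * more than MAX_GROWTH in one step. */
static size_t next_capacity(size_t cap)
{
    size_t step;

    if (cap == 0)
        return MIN_CAPACITY;

    step = cap < MAX_GROWTH ? cap : MAX_GROWTH;

    /* Saturate instead of wrapping around on overflow. */
    if (cap > SIZE_MAX - step)
        return SIZE_MAX;

    return cap + step;
}
```

With this rule, growth is exponential up to 8 MB and linear in 8 MB steps beyond that, so a single realloc never asks for more than 8 MB of extra memory.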
My experience also shows that it is useless to open a ticket for a reminder to myself :-)
I'm setting to next minor tentatively, but if it is really just a performance improvement (i.e., it affects no external modules), you can commit it to 2.0.0 before code freeze.
Currently the buffer is grown by BUFSIZ (1024) on every iteration, which is a bit wasteful.
Instead we can double the capacity whenever there is less than BUFSIZ capacity
left.
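A standalone sketch of that rule, written against plain stdio rather than the actual read_all code: the function name slurp and its signature are assumptions for illustration only.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads everything from `fp` into a malloc'd
 * buffer and stores the byte count in `*out_len`. Returns NULL on
 * allocation failure. The caller frees the result and can check
 * ferror(fp) to tell a read error from end of file. */
static char *slurp(FILE *fp, size_t *out_len)
{
    size_t cap = BUFSIZ, len = 0;
    char *buf = malloc(cap);

    if (buf == NULL)
        return NULL;

    for (;;) {
        size_t want, n;

        /* Double the capacity once fewer than BUFSIZ bytes are free. */
        if (cap - len < BUFSIZ) {
            size_t new_cap = cap * 2;
            char *tmp;

            if (new_cap < cap) {            /* size_t overflow */
                free(buf);
                return NULL;
            }
            tmp = realloc(buf, new_cap);
            if (tmp == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap = new_cap;
        }

        want = cap - len;
        n = fread(buf + len, 1, want, fp);
        len += n;
        if (n < want)                       /* short read: EOF or error */
            break;
    }

    *out_len = len;
    return buf;
}
```

Because the capacity doubles, the number of realloc calls grows with the logarithm of the input size instead of with the number of BUFSIZ-sized reads.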