New ISeq serialize binary format
I wrote a new binary format for RubyVM::InstructionSequence (ISeq) objects, together with a serializer and de-serializer for it.
Matz has approved introducing this feature in Ruby 2.3 as an experimental feature,
so I'll commit it.
There are two methods to serialize and de-serialize.
- RubyVM::InstructionSequence#to_binary_format returns the binary format data as a String object.
- RubyVM::InstructionSequence.from_binary_format(data) de-serializes it.
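A minimal round-trip sketch of the two methods. Note the assumption: it uses `to_binary` and `load_from_binary`, which are the method names found in released Ruby versions (2.3 and later), not the `to_binary_format` / `from_binary_format` names used in this proposal text.

```ruby
# Sketch: serialize an ISeq to a String and restore it again.
# Assumption: to_binary / load_from_binary are the released names of the
# two methods described above.
iseq = RubyVM::InstructionSequence.compile("1 + 2")

binary = iseq.to_binary          # binary format data as a String
restored = RubyVM::InstructionSequence.load_from_binary(binary)

result = restored.eval           # run the restored ISeq
puts "#{binary.bytesize} bytes, result = #{result}"
```

The binary is only expected to load on the same Ruby build that produced it, which matches the "machine dependent" goal described below.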
The goal of this project is to provide a "machine dependent" binary file in order to achieve:
- fast bootstrap time for big applications
- reduced memory consumption through several techniques
"Machine dependent" means that you can't migrate compiled binaries to other machines.
The following are not goals of this project:
- packing scripts into one package
- migrating obfuscated binaries to other nodes to hide source code
To achieve such goals, we would need to consider compatibility issues such as
DATA, and so on.
This proposal doesn't cover "how to store compiled binaries".
For example, Rubinius makes *.rbc files automatically.
However, Matz does not like such automatic compilation.
So my proposal only shows an interface for a user-defined storage class.
People can try making their own ISeq binary storage, for example:
- making compiled binary files automatically in the same directory as the script files, like Rubinius
- storing compiled binaries in some DB
- designing your own storage data structure
I wrote several samples:
- dbm: uses dbm.
- fs: [default] uses the file system and places the compiled file in the same directory as the script file, like Rubinius. foo.rb.yarb will be created for foo.rb.
- fs2: uses the file system and places the compiled file in a specified directory.
- nothing: does nothing.
You can see my sample implementation:
The key interface is
When MRI tries to load a script named fname, it calls this method with fname, if the method is defined.
If the return value is an ISeq object, MRI uses that ISeq object instead of parsing and compiling the script.
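The hook described above can be sketched as follows. This is not the author's sample implementation. It assumes the hook is the singleton method `load_iseq` on `RubyVM::InstructionSequence`, as in released Ruby versions, and it uses the released method names `to_binary` / `load_from_binary`. The `IseqFileCache` module, its cache directory, and its file naming scheme are hypothetical.

```ruby
require "tmpdir"

# Hypothetical storage: keeps compiled binaries in one cache directory,
# roughly in the spirit of the "fs2" sample.
module IseqFileCache
  CACHE_DIR = Dir.mktmpdir("iseq-cache")

  # Map a script path to a cache file name (simplistic, for illustration).
  def self.cache_path(fname)
    File.join(CACHE_DIR, fname.gsub(/[^A-Za-z0-9]/, "_") + ".yarb")
  end

  # Return an ISeq for fname, using the cached binary when it is fresh.
  def self.fetch(fname)
    path = cache_path(fname)
    if File.exist?(path) && File.mtime(path) >= File.mtime(fname)
      RubyVM::InstructionSequence.load_from_binary(File.binread(path))
    else
      iseq = RubyVM::InstructionSequence.compile_file(fname)
      File.binwrite(path, iseq.to_binary)
      iseq
    end
  end
end

class RubyVM::InstructionSequence
  # Called by MRI with the script name when a script is loaded.
  # Returning nil makes MRI fall back to normal parsing and compiling.
  def self.load_iseq(fname)
    IseqFileCache.fetch(fname)
  rescue StandardError
    nil
  end
end
```

Rescuing and returning nil keeps loading robust when a cached binary is stale or was produced by a different Ruby build.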
Note that this proposal is "experimental".
These interfaces are only for experiments.
For example, if we want to use several binary storages at once,
this interface cannot support it (a lack of extensibility).
The current implementation is not mature: the binary size is very large because every pointer-sized value takes 32 or 64 bits.
This is easy to reduce, but I have left this weak point as it is for now.
Also, one goal, "reduction of memory consumption", is not yet achieved because no techniques for sharing or unloading have been introduced.
This is future work.
Several evaluation results:
Load resolv.rb 1,000 times (removing the Resolv class each time).
              user     system      total        real
compile  12.360000   0.310000  12.670000 ( 13.413011)
compile  12.120000   0.300000  12.420000 ( 13.195313)
compile  12.230000   0.270000  12.500000 ( 13.242140)
eager load
load      3.750000   0.180000   3.930000 (  3.918169)
load      4.000000   0.170000   4.170000 (  4.178442)
load      4.120000   0.200000   4.320000 (  4.320233)
lazy load
load      2.410000   0.090000   2.500000 (  2.609716)
load      2.280000   0.210000   2.490000 (  2.518892)
load      2.310000   0.110000   2.420000 (  2.419687)
Eager loading is about 3.25 times faster than normal compilation.
With the lazy loading technique, it is about 5.2 times faster.
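A comparison of this kind can be approximated with a small script. This is only a sketch, not the benchmark used for the numbers above: it times compilation against binary loading without evaluating the result, uses a generated source string as a hypothetical stand-in for resolv.rb, and relies on the released method names `to_binary` / `load_from_binary`.

```ruby
# Hypothetical workload: 200 small method definitions.
source = (1..200).map { |i| "def bench_method_#{i}(x); x * #{i} + 1; end" }.join("\n")
iterations = 100

binary = RubyVM::InstructionSequence.compile(source).to_binary

# Measure the wall-clock time of a block with a monotonic clock.
measure = lambda do |&block|
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  block.call
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
end

compile_time = measure.call do
  iterations.times { RubyVM::InstructionSequence.compile(source) }
end

load_time = measure.call do
  iterations.times { RubyVM::InstructionSequence.load_from_binary(binary) }
end

puts format("compile: %.4fs  load: %.4fs", compile_time, load_time)
```

The ratio of the two times depends heavily on the script and on the machine, so the output is not expected to reproduce the figures reported here.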
A similar experiment to the resolv.rb one.
              user     system      total        real
compile   8.540000   0.130000   8.670000 (  8.703615)
compile   8.540000   0.150000   8.690000 (  8.693870)
compile   8.430000   0.120000   8.550000 (  8.547480)
eager load
load      4.470000   0.150000   4.620000 (  4.659934)
load      4.500000   0.140000   4.640000 (  4.640365)
load      4.610000   0.100000   4.710000 (  4.708825)
lazy load
load      3.510000   0.140000   3.650000 (  3.694146)
load      3.470000   0.130000   3.600000 (  3.609040)
load      3.550000   0.150000   3.700000 (  3.831015)
Only 1.8 times faster (eager) and 2.4 times faster (lazy).
This is because the initialization of the FileUtils class takes a long time.
It uses module_eval(str) to add methods, and those strings still have to be compiled at run time.
Simple Rails application
Running time rails r '' on a simple Rails application (https://github.com/ko1/tracer_demo_rails_app, with tracers disabled).
compile:
real 0m2.049s
user 0m1.601s
sys  0m0.402s
eager:
real 0m1.544s
user 0m1.094s
sys  0m0.422s
lazy:
real 0m1.536s
user 0m1.112s
sys  0m0.388s
Not such an impressive result. It seems there is a lot of initialization code.