On 02/05/2018 01:10 PM, lars@greiz-reinsdorf.de wrote:
Issue #14445 has been updated by larskanis (Lars Kanis).
I have to thank you for pushing the whole MJIT train forward!
Unfortunately, on Windows the startup times of GCC are so crazily slow that MJIT is rarely useful. One idea is to bundle several iseq units from the queue into one compiler run. In a very simple test I was able to combine more than 10 iseqs without significantly increasing the compile time. Is this something you could imagine?
Actually, the first variant of MJIT had this feature. It had batches,
each containing a few iseqs, and the batch was the unit of compilation.
The batch had its own drawbacks. An iseq can be compiled several times
(e.g. because of different levels of speculation). You can remove an
iseq's old code only when all other iseqs in the same batch have become
obsolete, because the code of all the batch's iseqs lives in the same
shared object and we can only remove the shared object as a whole. This
results in keeping several variants of code for one iseq in memory. The
more iseqs a batch contains, the more memory can be wasted on obsolete
code. So I removed batches from MJIT.
This code can be found at https://github.com/vnmakarov/ruby, in a
variant from before Aug 2.
The memory waste might not be a big deal, as we have no aggressive
speculation and iseq re-compilations are rare. It is probably worth
restoring the code. Linux and macOS can have batches containing only one
iseq; for Windows, a batch can have more than one iseq. I could try it,
but unfortunately not before April, because I am quite busy with the
GCC 8 release these days.
I only have experience with Cygwin, where GCC is very slow. I guess
MinGW has the same problem. I suspect acceptable compilation speed can
be achieved only by using the native Visual C compiler.
Another strategic way to solve the problem could be implementing a
simple tier-1 JIT compiler. In this case, the current MJIT with
GCC/Clang would become a typical tier-2 JIT compiler, which generates
highly optimized code but takes much more compilation time. Zing VM is
such an example: it replaces only the JVM tier-2 (server) compiler with
LLVM, not touching the tier-1 compiler, which is fast but generates less
optimized code. I have thought about this approach. By my estimate, such
a JIT could be 4-5K lines of C, achieving 70% of GCC -O2 performance on
x86-64 but being at least 10 times faster in compilation speed. On
Windows, tier 1 could be used more frequently than on Linux/macOS. But
there are still a lot of open questions for me with this approach.