As some of you may have guessed, virtual machine people regularly ask me questions such as:
- Why don’t you use LLVM instead of reimplementing it with Sista?
- Why don’t you use LLVM for low-level optimizations?
- Why don’t you rewrite Cog using LLVM?
I think today is the day I need to explain how LLVM and Sista could or could not interact, and why, so that I stop getting these questions all the time in the future.
Question 1: Why don’t you use LLVM instead of reimplementing it?
In short, I am *not* reimplementing LLVM: Sista and LLVM could work together, but you could not use LLVM instead of Sista. To see why, let’s look at the tiers of Webkit’s JavaScriptCore VM, which uses LLVM in its top tier:
- Tier 1: the interpreter (LLInt)
- Tier 2: the Baseline JIT
- Tier 3: the DFG (Data Flow Graph) JIT
- Tier 4: the FTL (Fourth Tier LLVM) JIT
The general idea is that tier N + 1 needs more compile time than tier N to generate the native code, but the generated code is much faster. Therefore, the more frequently a portion of code is used, the higher the tier the VM uses to generate its native code. Typically, a method is interpreted for its first 6 executions, then the Baseline JIT generates basic native code that is used for the next 60 executions, then the DFG JIT generates more evolved native code that is used for the next 600 executions, and lastly the FTL JIT generates very efficient native code that is used from then on.
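The tier-up policy above can be sketched as a simple threshold check on a per-method execution counter. This is an illustration of the idea only, not JavaScriptCore’s actual mechanism; the thresholds are the approximate ones quoted above and the function names are invented:

```python
# Hypothetical sketch of tiered compilation: pick a tier from a
# method's execution count. Thresholds (6 / 60 / 600) are the rough
# figures given in the text; names are illustrative, not JSC's API.

TIER_THRESHOLDS = [
    (6, "LLInt interpreter"),   # tier 1: first ~6 executions
    (60, "Baseline JIT"),       # tier 2: next ~60 executions
    (600, "DFG JIT"),           # tier 3: next ~600 executions
]

def tier_for(execution_count):
    """Return which tier's code would run a method at this count."""
    threshold_sum = 0
    for threshold, name in TIER_THRESHOLDS:
        threshold_sum += threshold
        if execution_count <= threshold_sum:
            return name
    return "FTL JIT"            # tier 4: the hottest methods
```

The trade-off encoded here is the one the paragraph describes: a hotter method justifies spending more compile time for faster native code.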
The problem is even worse in Smalltalk, where every operation is a message send, including addition between SmallIntegers. Between message sends, Smalltalk code has basically two things: inline cache checks and jumps. It happens that the current JIT specifically optimizes these two cases, generating very efficient instructions with very good register allocation. Therefore the current JIT generates code as efficient as LLVM’s, but with a much lower compilation time (the current JIT compiler is very basic but compiles very fast).
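To make the inline cache check concrete, here is a minimal sketch of a monomorphic inline cache in Python. This is not Cog’s implementation (Cog emits this check as a couple of machine instructions at each send site); it only shows the shape of the check: compare the receiver’s class against the class cached at the send site, and on a hit jump straight to the cached method:

```python
# Illustrative monomorphic inline cache, NOT Cog's actual code.
# Each send site remembers the last receiver class and the method
# that the lookup resolved for it.

class SendSite:
    def __init__(self):
        self.cached_class = None
        self.cached_method = None

    def send(self, receiver, selector, lookup):
        if type(receiver) is self.cached_class:   # the inline cache check
            return self.cached_method(receiver)   # fast path: cache hit
        method = lookup(type(receiver), selector) # slow path: full lookup
        self.cached_class = type(receiver)        # refill the cache
        self.cached_method = method
        return method(receiver)
```

As long as a send site keeps seeing receivers of the same class, every send after the first is just the class comparison plus a jump, which is why the JIT can make this case so cheap.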
To be able to use LLVM, one first needs to remove the message sends by inlining or eliminating them, up to the point where you end up with lots of primitive operations such as “+” on int32, assignments to temporaries, or jumps. Webkit’s DFG JIT and Sista do exactly that: inlining message sends, removing bounds checks for array accesses, and unboxing numbers to int32 or double. After these steps, LLVM has some code to optimize and may be worth using.
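The transformation can be pictured as follows. Before optimization, “+” is a full message send; after, it is a guarded unboxed int32 addition that falls back (deoptimizes) when the speculation fails. This is a hand-written illustration of the idea, not Sista’s output, and all names are invented:

```python
# Hypothetical before/after picture of Sista-style speculation.
# Before: every '+' goes through the generic message-send machinery.
# After: a type guard plus a raw int32 add, with a deopt fallback.

INT32_MIN, INT32_MAX = -2**31, 2**31 - 1

def optimized_add(a, b, deoptimize):
    # Guard: both operands must be unboxed integers...
    if isinstance(a, int) and isinstance(b, int):
        result = a + b
        # ...and the result must still fit in an int32.
        if INT32_MIN <= result <= INT32_MAX:
            return result                  # fast, send-free path
    return deoptimize(a, b)                # guard failed: fall back
```

Only once the code looks like this (primitive arithmetic, guards, jumps) does LLVM have anything meaningful to optimize.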
So Sista is *not* competing against LLVM to generate the fastest native code; it is one of the steps required in an optimizer that may (or may not) use LLVM.
Question 2: Why don’t you use LLVM for low-level optimizations?
Now that I have explained how LLVM could interact with Cog and Sista, I guess that you, readers, may understand why this question is very relevant. In theory, we could use LLVM to generate efficient native code from the optimized compiled method that Sista has produced. Let’s compare the pros and cons.
If I use LLVM, the unique pro is that the native code LLVM generates will be faster than the code I produce manually, as there are dozens of developers improving and maintaining LLVM. This is because LLVM maintains several platform-dependent optimizations that I will not implement, as well as exotic optimizations such as automatic parallelization.
If you talk with the Webkit guys, you may notice that around 9 months ago they added many new optimization passes to LLVM that they now use for their JIT. This means that integrating LLVM is far from working “out of the box”: you need to integrate new optimization passes into LLVM, which in our case would spread the VM implementation from Slang and C to Slang, C, and C++ in LLVM.
Now if you look at the cons of using LLVM:
- The memory footprint of Cog (around 300 KB, plus the native code cache zone that is typically 1 MB to 2 MB) will increase by 3.5 MB, which is the memory footprint of LLVM.
- Cog will rely on LLVM, so if LLVM is no longer maintained, we are screwed. (Notice that Smalltalk has been running since 1980 and that most common libraries from 1980 are no longer maintained, so the philosophy of writing simple but working libraries in Smalltalk has paid off.)
- Cog relies on a very different stack management scheme than common projects, in order to efficiently support the stack-frame-to-context mapping (including on-the-fly stack editing from the language). You would need to write many new passes over the LLVM IR to recover stack-frame-to-context mapping information. The cost of this implementation is massive.
- Cog relies on a garbage collector. Integrating our garbage collector with LLVM may not be simple. Note also that Smalltalk supports exchanging two objects in memory (#become:). I am not sure this kind of feature would work out of the box with LLVM-generated code.
- The Smalltalk runtime relies heavily on non-local returns (in Smalltalk, non-local returns are a common case). It happens that the only way to represent non-local returns in LLVM is to use exceptions, and LLVM’s “zero cost” exception model is very slow in the case where the exception is actually raised, which is common in Smalltalk. Therefore, one would also need to write a lot of code and optimization passes in LLVM to handle non-local returns efficiently.
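To see why non-local returns map onto exceptions, here is a sketch in Python of the pattern the last point describes: a block’s `^value` must unwind out of every intervening activation back to its home method. This models the semantics only; it says nothing about how fast LLVM’s unwinding would be, and the names are invented:

```python
# Modeling a Smalltalk non-local return with an exception, as the
# text says an LLVM-based backend would have to. Illustrative only.

class NonLocalReturn(Exception):
    def __init__(self, home, value):
        self.home = home    # the method activation to return from
        self.value = value

def detect(collection, predicate):
    # A #detect:-like method: the block's '^item' is a non-local
    # return that unwinds out of the enclosing iteration.
    home = object()         # token identifying this activation
    def block(item):
        if predicate(item):
            raise NonLocalReturn(home, item)   # '^item' in the block
    try:
        for each in collection:
            block(each)
    except NonLocalReturn as nlr:
        if nlr.home is home:
            return nlr.value
        raise               # belongs to an outer activation
    return None             # nothing matched
```

Because Smalltalk code takes this path routinely, an exception model that is only cheap when nothing is raised is a poor fit without extra work.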
Am I going to try using LLVM for Cog + Sista?
Once the Sista optimizer is in production, the next step will be to improve the quality of the generated native code by improving the bytecode-to-native-code translation. One solution is to use LLVM for this purpose.
Let’s look at the Cog JIT tiers:
- tier 1: Stack interpreter
- tier 2: Cogit (baseline JIT)
- tier 3: Sista, with on-the-fly stack replacement or with some optimizations postponed to the background process
In our model, it may be that in tier 3, once we detect that a method has reached its maximum optimization potential, generating the native code of the optimized method with LLVM would be worthwhile. However, a roughly 40% performance gain, compared to the huge amount of work that using LLVM implies (adding and maintaining LLVM passes for non-local returns, for the integration with the garbage collector, and for the stack-frame-to-context mapping), simply does not work for us. Therefore, if someone is interested in plugging an LLVM backend into Sista’s optimized methods, I will be really happy to help them, to see the experimental results and, if they are good, to help integrate it into the production VM. But I will not do it myself: only a few people work on the Pharo VM nowadays, and our limited resources cannot support an LLVM back-end for only a 40% performance boost. You guys can still try to convince me in the blog post comments (for example, if you tell me that you will add and maintain LLVM passes for non-local returns, for the integration with the Cog garbage collector, and for the Cog stack-frame-to-context mapping, and that you will do it within a year, that will be very convincing).
Now, if I were building a low-level compiler, such as a C compiler, I would definitely use LLVM.
Question 3: Why don’t you rewrite Cog using LLVM?
I hope you guys enjoyed the post 🙂