This article is out-of-date! Please look at this article, or articles written after June 2016.
To speed up Smalltalk methods, I would like to perform some optimizations such as inlining, dead code elimination and copy propagation. The problem is that this cannot be done easily on the byte code level (or you would need to have multiple parser and datas aside but this would be crazy). As the byte code is not a good intermediate representation (IR) for optimizations, the optimizer needs to convert the method’s byte codes into an intermediate representation that is suitable for optimizations. Based on a prototype and recommendations, I chose to convert the byte code into a single static assignment control flow graph. This intermediate representation allows one to optimize a method both with data and control flow analysis easily.
An intermediate representation for optimizations
Here is the current optimization infrastructure:
The method’s byte code is firstly decompiled into a control flow graph. The hard part here is the conversion between a stack based representation to a three address code (TAC) one. Therefore it is needed to add some extra temps, the so-called spilled temps, to correctly convert the method.
Secondly, the SSA (single static assignment) property is added to the control flow graph. This is done by analyzing all definitions and uses of each variable. The current implementation works in different steps:
- calcul of each basic block depth
- finding the dominators of each basic block with the Lengauer-Tarjan algorithm (if you don’t know what’s a dominator, open any compiler book)
- collecting all def sites of each variable
- adding empty phi functions
- filling phi functions and setting for each variable its def and uses in the graph
Then some optimization passes are performed. The first passes I will implement are different inlining strategies.
Next, the graph is deSSA, which means the phi nodes are removed and different variables are merged into one when possible. This is done in different steps. Basically, the optimizer analyses the liveness of each variable, then look for interferences between them and lastly remove the phi nodes.
Lastly, the codeGenerator spits out byte code from the graph. This code generator is currently based on Opal compiler’s IR. The byte code generated is part of an extended byte code set, with specific byte codes added to tip the JIT on how to produce better machine code.
Visualizing the IR
Having a nice IR is cool. However, one needs to debug it. Especially, one needs to know for each optimization step if everything was done correctly. At hand, I had the byte code pretty printer and the source code edition tools. For example, here is one of my test method with simple double branches:
Source code representation
Byte code representation
My first strategy was to implement a nice pretty printer. This made it easier to debug the IR:
Note: For each variable, the number [number] permits to identify its definition. In single static assignment, a variable with different definition is in fact multiple variables.
This pretty printer increased a lot my productivity and simplified a lot my work. However, although the pretty printer is good to debug single basic block, it was still hard to debug the control flow. Especially, after inlining a method with multiple returns of after a dead branch elimination optimization pass, the control flow is quite different. A guy, Usman Bhatti, showed me how to quickly implement a visualization with Roassal, a graphic engine, and Mondrian, a DSL to easily build visualization. I was not able to implement the same visualization I saw in compiler books, but the result is much better than the pretty printer and very helpful. Here is an example (click on the pic to see it bigger):
- Squares with large borders are entrance or exits in the method
- Squares in red are frequently used basic blocks (based on counters added by Cog’s JIT in the native code)
- Squares in blue are almost never used basic blocks (based on counters added by Cog’s JIT in the native code)
- Squares in black are basic blocks that are used about half of the time (based on counters added by Cog’s JIT in the native code)
- Back jumps are not displayed (this method has no back jump anyway). This was too complex to include on the visualization and I believe that displaying back jumps makes big control flows hard to read (based on java visualization I saw)
In addition, the roassal visualization provides a few interactions. For example, when you right click on a basic block the contextual menu lets you inspect it.
That’s all folks.
I guess the next sista chronicle will start getting a bit interesting …