This article is out-of-date! Please look at this article, or articles written after June 2016.
Today I’m going to explain how I inline message sends in the sista (image-side). This post will really be about how to inline a message send, you’ll have to wait to the next post to know when I inline message sends. In my opinion, this post starts to get really interesting compared to the previous sista chronicles.
I’d like to thank Ronie Salgado, the Roassal 3D guy, that helped me implementing this optimization.
For the first example, we’ll inline a method activation with no control flow interactions. I put the method in the class ManifestASTCore but it does not really matter I’ve just picked a random class.
We will inline bar into example. Let’s look at
MyClass>>example sista representation.
As we can see, a spilled temporary has been assigned to each message send return value. They are not used because these message send return values are not used. However, the spilled temps exist because I wanted to keep the intermediate representation (IR) uniform to have some complex optimizations passes simpler (as inlining). These spilled temps will be removed at code generation if they have no users anyway. Now let’s inline the message.
The inliner has split the original basic block in 2 parts. One part with
self foo, which is the code before the message send, and the other part with
self baz which is the code after the message send. In-between, the inliner has added the basic blocks of the called method,
ManifestASTCore>>bar. However, the inlined code has no statements but a return statement, and the inliner takes specifically cares about the return statement. It puts the return value in a return phi node to keep the SSA property. Here, the phi node has only 1 statement, so it is useless. But again, it is still created to have a common case between methods with multiple returns and method with single return. Moreover, before the message send, the inliner put a trap. This trap will trigger the deoptimization process when the method will be run if the object
self has another class than
As you can see, the graph is in a poor state after inlining: there’s 3 basic blocks but as the control flow is linear, only 1 basic block is needed, and in addition there’s a useless phi node. But this is how the graph is after inlining. I decided to put in a different pass, named
SimplifyCFG, all the control flow graph and phi nodes simplifications, because after most optimizations you need to run it and in most optimizations you cannot optimize the control flow graph (CFG) and simplify it in a single pass. Therefore this allows me to simplify the CFG when I want, probably once after several optimizations instead of once after each optimization and this allows me to have the control flow graph simplification code in a single place and optimized. The control flow simplification can be done in 1 pass but I currently do it in 2 passes (a quick one on basic blocks only then a real one). This is a trick I took from the LLVM implementation (I saw they had a simplify the CFG pass).
So let’s run the CFG simplification pass.
The control flow graph is now much cleaner. The method has been inlined. Let’s look at the bytecode:
26 send: foo
29 send: bar
32 send: baz
26 send: foo
29 trapOn: ManifestASTCore
31 send: baz
An additional byte code was added for traps. This way, traps can be compiled very efficiently by the JIT to native code. The current Sista to byte code generator has issues right now, I need to improve it and fix it (this is my next step). This example works and we’re lucky. But on the other examples I will not show you the resulting byte code because it is not stable enough.
Control flow example
This time let’s inline
^ arg isPlague
ifTrue: [ self foo: arg isCharacter ]
ifFalse: [ self bar ]
bool ifFalse: [ ^ nil ].
Let’s see how
ManifestASTCore>>example2: looks like for the sista:
We have a basic control flow with a phi node to merge the variable value (We are in SSA mode, remember ?).
Let’s inline the message:
Here we can note that as the method had 2 returns, while inlining it is transformed from #ifTrue: to #ifTrue:ifFalse:. In this case, we understand why the phi node was needed in case of inlining: the method has 2 different return values, therefore the phi node allows us to merge this two values in one temp.
Let’s simplify the CFG.
This is the control flow graph we end up with. Having bigger control flow graph is quite useful, because next time the counter will trip the optimizer will have information on branches usage from the JIT, and some branches may be eligible for dead or unused branch elimination (I will show how this optimization in another post, and this will help understanding why a separate pass to simplify the CFG is useful because we also need to run this pass after dead branch elimination).
Last but not least, a simple non local return example 🙂
Let’s take this example:
self evaluate: [ ^ 42 ].
Here we will inline
ManifestASTCore>>exampleNLR and then inline the block evaluation.
Let’s look at the
As we can see, the closure is assigned to a spilled temp instead of directly being an argument of the message send. I decided to do it like that for now because I may have issues if I inlined a method that evaluate multiple times the closure (then I may end up in massive code duplication).
Ok now we are in a state where we can inline the block with the non local return. Let’s do it:
So let’s look at what we end up with. One basic block, I name basic block 2, is not reachable any more. That’s correct, when you inlined a non local return on the main branch you cannot reach further code.
A useless trap (check if a closure is a closure) was added to make the inliner logic similar to the method evaluation inlining logic, however another pass removes later traps on statically typed nodes such as closures, constants (literals) or Array. Let’s run a CFG simplification pass:
Wow ! The code shrank a lot ! Let’s run another pass that removes useless traps (this other pass also patches some non inlinable non local returns and analyses the phi nodes to prepare deSSA).
In the end this method is just these few nodes. See how the code shrank ? Let me tell you that with closure inlining AND dead branch elimination, the code is really really shrinking a lot.
I didn’t talk about non local return inlined in another continuation that the current one, but basically this is the same except that you have 2 phi nodes, one for local return and one for non local returns. I tried to explain it with more details here.
I hope you enjoyed this post.