This article is out-of-date! 
Please look at this article, 
or articles written after June 2016.

Hello guys,

Today I’m going to explain how I inline message sends in the sista (image-side). This post will really be about how to inline a message send, you’ll have to wait to the next post to know when I inline message sends. In my opinion, this post starts to get really interesting compared to the previous sista chronicles.

I’d like to thank Ronie Salgado, the Roassal 3D guy, that helped me implementing this optimization.

Basic example

For the first example, we’ll inline a method activation with no control flow interactions. I put the method in the class ManifestASTCore but it does not really matter I’ve just picked a random class.

ManifestASTCore>>example
self foo.
self bar.
self baz.

ManifestASTCore>>bar
^ 42

We will inline bar into example. Let’s look at MyClass>>example sista representation.

inliningEx1

As we can see, a spilled temporary has been assigned to each message send return value. They are not used because these message send return values are not used. However, the spilled temps exist because I wanted to keep the intermediate representation (IR) uniform to have some complex optimizations passes simpler (as inlining). These spilled temps will be removed at code generation if they have no users anyway. Now let’s inline the message.

inlining11bis

The inliner has split the original basic block in 2 parts. One part with self foo, which is the code before the message send, and the other part with self baz which is the code after the message send. In-between, the inliner has added the basic blocks of the called method, ManifestASTCore>>bar. However, the inlined code has no statements but a return statement, and the inliner takes specifically cares about the return statement. It puts the return value in a return phi node to keep the SSA property. Here, the phi node has only 1 statement, so it is useless. But again, it is still created to have a common case between methods with multiple returns and method with single return. Moreover, before the message send, the inliner put a trap. This trap will trigger the deoptimization process when the method will be run if the object self has another class than ManifestASTCore.

As you can see, the graph is in a poor state after inlining: there’s 3 basic blocks but as the control flow is linear, only 1 basic block is needed, and in addition there’s a useless phi node. But this is how the graph is after inlining. I decided to put in a different pass, named SimplifyCFG, all the control flow graph and phi nodes simplifications, because after most optimizations you need to run it and in most optimizations you cannot optimize the control flow graph (CFG) and simplify it in a single pass. Therefore this allows me to simplify the CFG when I want, probably once after several optimizations instead of once after each optimization and this allows me to have the control flow graph simplification code in a single place and optimized. The control flow simplification can be done in 1 pass but I currently do it in 2 passes (a quick one on basic blocks only then a real one). This is a trick I took from the LLVM implementation (I saw they had a simplify the CFG pass).

So let’s run the CFG simplification pass.

inlining12bis

The control flow graph is now much cleaner. The method has been inlined. Let’s look at the bytecode:

old method:
25 self
26 send: foo
27 pop
28 self
29 send: bar
30 pop
31 self
32 send: baz
33 pop
34 returnSelf

new method:
25 self
26 send: foo
27 pop
28 self
29 trapOn: ManifestASTCore
30 self
31 send: baz
32 pop
33 returnSelf

An additional byte code was added for traps. This way, traps can be compiled very efficiently by the JIT to native code. The current Sista to byte code generator has issues right now, I need to improve it and fix it (this is my next step). This example works and we’re lucky. But on the other examples I will not show you the resulting byte code because it is not stable enough.

Control flow example

This time let’s inline ManifestASTCore>>foo: in ManifestASTCore>>example2:.

ManifestASTCore>>example2: arg
^ arg isPlague
ifTrue: [ self foo: arg isCharacter ]
ifFalse: [ self bar ]

ManifestASTCore>>foo: bool
bool ifFalse: [ ^ nil ].
self baz.
^ 42

Let’s see how ManifestASTCore>>example2: looks like for the sista:

Inlining21

We have a basic control flow with a phi node to merge the variable value (We are in SSA mode, remember ?).

Let’s inline the message:

inlining22

Here we can note that as the method had 2 returns, while inlining it is transformed from #ifTrue: to #ifTrue:ifFalse:. In this case, we understand why the phi node was needed in case of inlining: the method has 2 different return values, therefore the phi node allows us to merge this two values in one temp.

Let’s simplify the CFG.

inlining23

This is the control flow graph we end up with. Having bigger control flow graph is quite useful, because next time the counter will trip the optimizer will have information on branches usage from the JIT, and some branches may be eligible for dead or unused branch elimination (I will show how this optimization in another post, and this will help understanding why a separate pass to simplify the CFG is useful because we also need to run this pass after dead branch elimination).

Last but not least, a simple non local return example 🙂

Let’s take this example:

ManifestASTCore>>exampleNLR
self baz.
self evaluate: [ ^ 42 ].
self bar.

ManifestASTCore>>evaluate: aBlock
aBlock value.

Here we will inline ManifestASTCore>>evaluate: into ManifestASTCore>>exampleNLR and then inline the block evaluation.

Let’s look at the ManifestASTCore>>exampleNLR representation:

Inlining31

As we can see, the closure is assigned to a spilled temp instead of directly being an argument of the message send. I decided to do it like that for now because I may have issues if I inlined a method that evaluate multiple times the closure (then I may end up in massive code duplication).

Let’s inline ManifestASTCore>>evaluate::

inlining32

CFG simplification:

inlining33

Constant propagation:

inlining34

Ok now we are in a state where we can inline the block with the non local return. Let’s do it:

inlining 35

So let’s look at what we end up with. One basic block, I name basic block 2, is not reachable any more. That’s correct, when you inlined a non local return on the main branch you cannot reach further code.

A useless trap (check if a closure is a closure) was added to make the inliner logic similar to the method evaluation inlining logic, however another pass removes later traps on statically typed nodes such as closures, constants (literals) or Array. Let’s run a CFG simplification pass:

inlinng36

Wow ! The code shrank a lot ! Let’s run another pass that removes useless traps (this other pass also patches some non inlinable non local returns and analyses the phi nodes to prepare deSSA).

inlining38

In the end this method is just these few nodes. See how the code shrank ? Let me tell you that with closure inlining AND dead branch elimination, the code is really really shrinking a lot.

I didn’t talk about non local return inlined in another continuation that the current one, but basically this is the same except that you have 2 phi nodes, one for local return and one for non local returns. I tried to explain it with more details here.

I hope you enjoyed this post.

Advertisements