Hi folks,

Today I’m going to talk about the byte code to source code mapping. I spent the last two weeks integrating the new Opal compiler into Pharo with Marcus Denker, and we encountered several issues with this mapping.

Quick reminder: temporary variable implementation in Pharo

As I talk about temporary variable mapping in the next sections of this post, it seems worth briefly recalling how temporaries work in Pharo (implementation-wise, of course).

In Pharo, methods are represented by two kinds of objects:

  • a CompiledMethod: created at compile time, it contains the byte code and the references to the literals of the method.
  • several MethodContexts: at runtime, a MethodContext instance is created each time you execute the method.

A MethodContext contains, among other things:

  • the receiver: to know what object corresponds to self / super
  • the CompiledMethod: to know which byte code to execute
  • the pc: the offset of the currently executed byte code instruction in the CompiledMethod (which lets you know, for example, which byte code instruction to execute next)
  • the sender MethodContext: to know where the execution should continue when the method returns a value or is fully executed.
  • the temporary variables: to know their values in the current execution.

The MethodContext is a variable-sized object so it can store the temporary variables. To keep things simple, let’s say the temporary variables are stored in an array in the context. An array means a mapping from an integer index to the value of the temporary variable. That means the byte code and the context don’t know anything about the names of the temporaries; they only know their indexes. For example, in this method:

| temp |
temp := #bar.
^ temp

At the byte code level, you will not get things like:

17 pushConstant: #bar
18 popIntoTemp: #temp

but :

17 pushConstant: #bar
18 popIntoTemp: 0

0 being the index of temp in the array of temporary variables in the context.
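You can see this index-based access directly from a context. A minimal sketch, evaluated in a workspace (note that ContextPart>>#tempAt: uses 1-based indexing at the image level, while the byte code operand shown above is 0-based):

```smalltalk
"Access a temporary by its slot index: the context only knows
the slot, not the name 'temp'."
| temp |
temp := #bar.
thisContext tempAt: 1   "answers #bar"
```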

In a way, this is really cool because arrays are much faster than dictionaries, so the execution will be faster. But it also means that if you want to debug a context, you will get this:

[Screenshot: the context inspector showing temporaries by index]

instead of :

[Screenshot: the context inspector showing temporaries by name]

And, as an end user, you don’t want to debug a context using the indexes of the temporaries, but using their names. And for that, you need a mapping.

Wait. You need a mapping to map an index of an array to the names of the temporaries? Are you kidding me? It is easy as hell; when I taught C++ to my grandmother she was able to implement that in 5 minutes.

In fact it is not that easy. The mapping would have been easy if Smalltalk had only methods. But there are also blocks. At runtime a block corresponds to two objects: a BlockClosure instance and a context. The context is like a method context, plus a reference to the BlockClosure (stored in the variable … closureOrNil of course! Because this variable is nil for method contexts and contains the closure for block contexts).

What is problematic in blocks (implementation-wise) is that they can access temporary variables of their enclosing method. And of course, as this is slow, it is optimized. Basically there are two main optimizations:

  • Case of a method temporary variable that is only read in the block: the optimization consists in pushing the temporary variable on top of the stack when creating the block, so the block can access it. So in the block context, this temporary is in fact a reference to a temporary of an enclosing context.
  • Case of a method temporary variable written and / or read in the block: the optimization consists in storing those temporaries in an external vector known by both the method context and the block context. This is a little bit tricky to manage in the mapping, because you have a new set of indexes inside the vectors, plus the index of the vector itself in the method and block contexts …

Now imagine that in Smalltalk you can have several nested blocks in a method, and in each block you can have some read-only or read and / or written temporaries. Then you need to manage which temporaries are in vectors, which are references to another context’s temporaries, … This mapping is not as easy as it looks.
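To make the two cases concrete, here is a sketch of two methods (hypothetical example methods, not from the Pharo image):

```smalltalk
"Case 1: the block only READS t, so its value is copied into the
closure when the block is created."
readOnlyCase
	| t |
	t := 1.
	^ [ t + 1 ]

"Case 2: the block WRITES t, so t is hoisted into a remote temp
vector shared by the method context and the block context."
writtenCase
	| t |
	t := 1.
	[ t := t + 1 ] value.
	^ t
```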

Now let’s get deep into the subject.

What is the source code to byte code mapping used for?

Previously, all this mapping was stored in the class DebuggerMethodMap. The name lets us guess the mapping is mostly used in the debugger. And why does the Smalltalk debugger need a mapping between byte code and source code? Well, for two main reasons:

  • It needs to highlight the correct range of characters in the source code while stepping through the byte code of the methods.
  • As the current byte code model does not include the temporary variable names, it needs to find out the name of a variable from the byte code (see the reminder section above). This way it can show the temporaries in the context inspector and evaluate them.

Here we can guess there is another use case of the byte code to source code mapping. When you use the methods:

  • MethodContext>>#tempNamed:
  • MethodContext>>#tempNamed:put:
  • MethodContext>>#tempNames

And these methods complicate the mapping, because in the debugger you don’t really need a lot of speed, whereas to run these methods you do.
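As a quick illustration of this API, evaluated on the current context in a workspace (the exact answers depend on your image):

```smalltalk
"Read and write temporaries by name through the context."
| ctx |
ctx := thisContext.
ctx tempNames.                 "the names of the temporaries, e.g. #('ctx')"
ctx tempNamed: 'ctx'.          "read a temporary by name"
ctx tempNamed: 'ctx' put: nil  "write a temporary by name"
```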

How does the old mapping, currently in Pharo 2.0, work?

It works really well. But it is very hard to understand and very complicated, so I will explain it only briefly.

For source code highlighting, the old mapping creates two dictionaries: one that maps the byte code of the method to abstract byte code, and a second one that maps the abstract byte code to ranges in the source code. To highlight a method, you need to scan, parse and partially compile the method to create those two dictionaries, then use them to get your highlight source range.

For temporaries, the mapping creates some kind of array of pairs. A pair can be name -> integer index in the context for a regular temp, name -> some other data for temporaries in remote vectors, or yet another kind of data for temporaries from enclosing contexts. This is just impossible to understand and fix, because the debugger shows you just arrays of arrays of symbols and integers, but it has the great advantage of being really fast.
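To give an idea of the shape, such a structure could look like this (a purely illustrative literal, not the actual Pharo 2.0 encoding):

```smalltalk
"Hypothetical illustration of the old 'array of pairs' temp map:
each entry maps a temp name to whatever data is needed to reach its value."
{
	#x -> 1.            "regular temp: slot 1 in the context"
	#y -> #(2 1).       "remote temp: slot 2 holds the vector, index 1 inside it"
	#z -> #(outer 1)    "temp from an enclosing context: some marker plus an index"
}
```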

How does the new mapping, which will be deployed in Pharo 3.0, work?

Just to remind you, the compilation chain of Opal is :
Source code -> AST (Abstract syntax tree) -> IR (Intermediate representation) -> byte code

Source code highlighting

For the highlighting, we (Marcus and I) noticed something very interesting: the AST nodes know their source intervals (RBProgramNode>>#sourceInterval). Then, each IR instruction knows the AST node that generated it. Lastly, from each byte code you can get the IR instruction quite easily, as each IR instruction knows the offset of the byte code it generated, so you just need a quick scan (see below).

IRMethod>>#instructionForPC: aPC
	0 to: -3 by: -1 do: [ :off |
		(self firstInstructionMatching: [ :ir | ir bytecodeOffset = (aPC - off) ])
			ifNotNil: [ :it | ^ it ] ]

So we decided to do this mapping at the AST level. The plan was to generate the AST and IR from the compiled method, then to find from the byte code the corresponding IR instruction, from the IR instruction the corresponding AST node, and lastly to return the source interval of that node. And it worked. So the mapping is in fact not a mapping, because we don’t have dictionaries or any collections at all.
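Put together, the lookup can be sketched like this (the selector highlightIntervalForContext: and the irMethod accessor are assumptions for illustration, not the actual Opal API):

```smalltalk
"Sketch: byte code pc -> IR instruction -> AST node -> source interval."
highlightIntervalForContext: aContext
	| irMethod instruction node |
	irMethod := aContext method irMethod.                  "rebuild the IR from the compiled method (assumed accessor)"
	instruction := irMethod instructionForPC: aContext pc. "scan the IR for the pc"
	node := instruction sourceNode.                        "each IR instruction knows its AST node"
	^ node sourceInterval                                  "RBProgramNode>>#sourceInterval"
```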

Of course there are some interesting details. The most interesting one is the following: in a debugger, the context on top of the stack does not highlight the same way the other contexts do. In fact, the top context highlights the instruction in the source code that *will* be executed in the next step. On the contrary, all the other contexts highlight the instruction in the source code that *has been* executed previously. That means you need to take care of that in the mapping to highlight the correct range.

For now we had to keep the old mapping API, so there is a method named rangeForPC:contextIsActiveContext: that takes a boolean as second argument. With this boolean you know whether you need to look for the range of the current pc (the first argument) or of the pc before it. The old mapping needed the exact pc for its pc to abstract pc mapping, so it needed a method previousPCFor: to handle multi-byte byte code instructions. But we don’t care, as the pc will only be used in the scan of instructionForPC:, so we just need to do pc - 1.

The result is :

DebuggerMethodMapOpal>>#rangeForPC: aPC contextIsActiveContext: contextIsActive
	"return the debug highlight for aPC"
	| pc |
	pc := contextIsActive ifTrue: [ aPC ] ifFalse: [ aPC - 1 ].
	^ (methodNode sourceNodeForPC: pc) debugHighlightRange

RBMethodNode>>#sourceNodeForPC: anInteger
	^ (self ir instructionForPC: anInteger) sourceNode

Temporary names mapping

Then, we implemented the temporary names mapping. Basically we wanted to go back to the AST level again instead of creating huge arrays. This was not that hard. In fact, AST BlockNodes and MethodNodes know their corresponding scope. We hacked it a little to be able to get the enclosing block node / method node of every node, and from there the associated scope:

RBProgramNode>>#methodOrBlockNode
	^ parent isNil ifTrue: [ nil ] ifFalse: [ parent methodOrBlockNode ]

RBMethodNode>>#methodOrBlockNode
	^ self

RBBlockNode>>#methodOrBlockNode
	^ self

RBProgramNode>>#scope
	^ self methodOrBlockNode scope

Then, we added the method MethodContext>>#sourceNode (see below), which permits getting the BlockNode or MethodNode corresponding to the context:

MethodContext>>#sourceNode
	| activePC |
	activePC := pc - 1.
	activePC <= self startpc ifTrue: [ activePC := pc ].
	^ self method sourceNodeForPC: activePC

CompiledMethod>>#sourceNodeForPC: aPC
	^ self sourceNode sourceNodeForPC: aPC

RBMethodNode>>#sourceNodeForPC: anInteger
	^ (self ir instructionForPC: anInteger) sourceNode

Lastly, the MethodContext>>#tempNames method is trivial. We need to keep a call to the mapping for backward compatibility with the old compiler, but it would not even have been needed:

MethodContext>>#tempNames
	^ self debuggerMap tempNamesForContext: self

DebuggerMethodMapOpal>>#tempNamesForContext: aContext
	^ aContext sourceNode scope allTempNames

Here we come to the main problem. If the mapping is called through MethodContext>>#tempNamed: or MethodContext>>#tempNamed:put:, we find the tempNames based only on the context and the temp index. But wait, we also said just before that if the context is at the top of the stack, we have to shift the byte code offset differently. Here’s the trick: in fact, to find the tempNames we just care about being in the correct block node or method node. So the computation

activePC := pc - 1.
activePC <= self startpc ifTrue: [ activePC := pc ].

is enough to get the proper tempNames, because even if we look for the tempNames from an incorrect AST node, it will be the one just before the correct one, and it will have the same scope. In the case of the first pc, which we need to check, we don’t shift the pc anyway for other reasons, so we have no problem.

So the problem is solved. Now we can note two interesting points :

  • with MethodContext>>#sourceNode, the decompilation of blocks becomes really trivial too.

    OCFakeDecompiler>>#decompileBlock: aBlock
    ^ aBlock asContext sourceNode

    You cannot even imagine how complicated this used to be.

  • the tempNames mapping, now working at the AST level, works the same way for optimized blocks (to:do:, ifTrue:ifFalse:, ifNil:ifNotNil:, and:, or:) and non-optimized blocks

There’s just one little drawback for now. The debugger refreshes the context inspector only when you change contexts. The problem is that, in the case of optimized blocks, we stay in the same context while executing them. Previously, the debugger showed the temporary variables of optimized blocks in the whole method, even out of their scope, so it didn’t matter. But now we show them only in their scope, so we would need to refresh the context inspector when stepping into an optimized block. This will be easy with the new debugger from Andrei, but hard with the current one from the eighties.

That’s all for today 🙂