Today I would like to talk about an interesting testing process I used recently on two projects I was working on: the Opal compiler and an AST interpreter for Pharo. Both will be included by default in Pharo 3.0.
Working at RMOD, I already use some helpful practices such as a continuous integration server and test-driven development. In the case of Opal and the AST interpreter, this meant that when the main development was finished, there were basic tests checking that simple code could be compiled or interpreted.
In addition to these, there were trickier tests covering code known to be harder to compile or interpret. For instance, Opal optimizes some loops and conditions, so there were specific tests for code using these optimizations. For the interpreter, the hardest things to interpret were exception handling and non-local returns in BlockClosures, so many more tests were added for these aspects. Lastly, the continuous integration server triggered all the tests on each commit, preventing the team from breaking anything with a code change.
The next step, as the main development was finished, would have been to deploy the two projects in Pharo (as a research team, we don’t have people to do loads of functional tests). But it was not enough: even with the test suites (450 tests for Opal and 50 tests for the AST interpreter), we were not confident enough to deploy the projects. We needed to raise our confidence level in the two projects.
Opal compiler:
On the Opal compiler, I was working with Marcus Denker. He had a great idea: modifying the continuous integration build not only to install the compiler into a fresh image and run all the tests, but also to recompile the whole image and then run all the Pharo tests with the freshly recompiled methods. In practice, this new build took too long to run, so we split it into two builds: a quick one for rapid feedback and a longer one to raise our confidence level in the compiler. These new tests allowed us to discover new bugs, for example when we ran the previous compiler's tests with our compiler, or on some nested BlockClosures in loops.
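The two-tier split can be sketched in a few lines. Everything below (the `FreshImage` class, the stub compiler, the toy test lists) is invented for illustration; the real builds are Jenkins jobs on the RMOD continuous integration server, and the real recompilation happens inside a Pharo image:

```python
# Illustrative sketch only: a quick build that just runs the compiler's own
# tests, and a regression build that first recompiles every method.

class FreshImage:
    """Stand-in for a freshly built Pharo image (toy model)."""
    def __init__(self):
        self.methods = ["Object>>printOn:", "Collection>>do:"]  # toy method list

    def recompile_all(self, compiler):
        # Recompile every method in the image with the new compiler.
        self.methods = [compiler(m) for m in self.methods]

    def run_tests(self, tests):
        # Run each named test against this image; return the names that fail.
        return [name for name, test in tests if not test(self)]

def opal_stub(method):
    # Toy "compiler": the identity function stands in for recompilation.
    return method

quick_tests = [("compiles simple code", lambda img: len(img.methods) > 0)]
full_tests = quick_tests + [
    ("all methods survive recompilation",
     lambda img: all(isinstance(m, str) for m in img.methods)),
]

# Quick build: fresh image, run only the fast tests.
image = FreshImage()
print("quick build failures:", image.run_tests(quick_tests))

# Regression build: recompile the whole image first, then run everything.
image = FreshImage()
image.recompile_all(opal_stub)
print("regression build failures:", image.run_tests(full_tests))
```

The design point is that both builds share the same installation step; only the regression build pays the cost of recompiling the whole image before running the full suite.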
Now that we have fixed those bugs, we have a very high confidence level in our new compiler: we can guarantee that Opal can recompile the whole Pharo image, and that the newly recompiled image can run all the Pharo tests without failures. Of course, we cannot guarantee that there are no bugs left, but as the Pharo test suite covers a large majority of the code, we are very confident in Opal.
AST interpreter:
When I finished the AST interpreter, I wanted to do the same. I encountered two problems while setting up the second continuous integration build.
Firstly, the AST interpreter was around 250 times slower at running the tests than the current Pharo VM, so tests that used to take 20 seconds in Pharo would have taken around an hour and a half. To reduce the build time, I limited the test suite to the Pharo Kernel and put a timeout on tests that ran for more than 10 seconds. This way, around 98% of the Pharo Kernel tests ran through my AST interpreter.
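The per-test timeout policy can be illustrated with a small runner. This is not the Pharo code from the build, just a sketch of the idea: a test that exceeds its budget is recorded as timed out instead of blocking the whole build (the budget is shortened here so the example runs quickly):

```python
# Sketch of a per-test timeout, standing in for the real 10-second limit.
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

TIMEOUT = 0.1  # stands in for the real 10-second budget

def run_with_timeout(name, test, timeout=TIMEOUT):
    # Run one test in a worker thread; give up waiting after `timeout` seconds.
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(test)
        try:
            return name, "pass" if future.result(timeout=timeout) else "fail"
        except FutureTimeout:
            return name, "timed out"

tests = [
    ("fast test", lambda: True),
    ("slow test", lambda: time.sleep(1) or True),  # exceeds the budget
]

results = [run_with_timeout(name, test) for name, test in tests]
print(results)  # the slow test is reported as timed out, not as a failure
```

Note that counting a timed-out test separately from a failing one matters here: a timeout only means the interpreter was too slow to finish, not that it interpreted the code incorrectly.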
Secondly, it was not as easy as for Opal to run all the tests, as the AST interpreter can only interpret ASTs. So I took the collection of Pharo Kernel packages and looped over them, interpreting the AST of every test present in each package. The problem was that the output of the tests was not easy to understand: for example, all you got was ‘There was a failing test in the package KernelTests-Methods’. That was not enough. I have no magical solution for that, so I printed each test and its result to the console of the continuous integration build as it ran; that way, when I saw a failing test, I knew exactly which one had failed. This continuous integration build allowed me to discover some more bugs in non-local returns (which are fixed by now).
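The package loop with per-test console output looks roughly like this. The package and test names below are invented; the point is that printing one line per test turns an unhelpful "some test in this package failed" into an exact test name on the CI console:

```python
# Sketch of the package loop: print each test's result as it runs, and
# collect the failures so they can be pinpointed afterwards.
packages = {
    "KernelTests-Methods": {
        "testCompiledMethod": lambda: True,
        "testNonLocalReturn": lambda: False,  # toy failing test
    },
    "KernelTests-Numbers": {
        "testIntegerArithmetic": lambda: True,
    },
}

failures = []
for package, tests in sorted(packages.items()):
    for name, test in sorted(tests.items()):
        result = "PASS" if test() else "FAIL"
        print(f"{package}>>{name}: {result}")  # one line per test on the console
        if result == "FAIL":
            failures.append((package, name))

print("failing tests:", failures)
```

With this output, a red build immediately names the failing test rather than only its package, which is what made the remaining non-local-return bugs findable.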
In the end, I can now say that my AST interpreter can run all the Pharo Kernel tests (more precisely, 98% of the kernel tests completely, and the first 10 seconds of the other 2% without failing), which again greatly raised my confidence level in the software and allowed me to fix some more bugs.
That’s all folks! You can see the result of this work on the RMOD continuous integration server. The builds I talked about are Opal, OpalRegression, ast-interpreter and ast-interpreter-regression. I hope you can draw inspiration from this process to raise the confidence level of your own software 🙂