Last week I touched on source file organization of generated code. Today I’ll share some thoughts on the testability of code when part of it is generated.
Code generation can be costly. Sometimes it is not easy to set up – although you should at least try to automate the setup. Sometimes the generator itself is too costly to run on a local machine, either in terms of license costs or in terms of resources. Or generating the code may simply take ages.
If we have to work on the code that is generated, i.e. on the sources it is generated from, we do not have much of a choice – we will have to deal with whatever impediments the code generation process presents. But even then, we will want the generated part of our application to be as independent as possible from the manually maintained part, and vice versa.
Testing and using the generated code
Ideally, the code generator is well-tested, including its output. If the complexity of the generation is not too high, a sufficiently comprehensive test sample and accompanying tests can give us high confidence in the generated code. In that case, integration tests might suffice to assure the quality of the generated code in our application, and the test sample will only need maintenance when features are added to the code generator.
For more complex code generators we still might want to write unit tests for the generated code in our application. The generator should support that as much as possible, e.g. by generating test stubs. We will also want to have a clean API to the generated code, to make it not only easily testable but also usable from our handwritten code. After all, test cases should cover and demonstrate actual use cases.
That clean API may include some parts that never change, i.e. that are fixed and do not depend on the sources of the code generation. Those fixed parts should either be emitted by the generation process as well and reside together with the actual generated code, or they should be provided as a support library by the code generator.
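As a sketch of what such a fixed part might look like (all names here, such as `MessageCodec` and `GeneratedOrderCodec`, are invented for illustration, not taken from any particular generator), the stable interface could ship in a support library while the generator emits implementations of it:

```python
import json
from abc import ABC, abstractmethod

class MessageCodec(ABC):
    """The fixed part: a stable contract that never changes between
    generator runs, shipped as (or with) a support library."""

    @abstractmethod
    def encode(self, payload: dict) -> bytes: ...

    @abstractmethod
    def decode(self, data: bytes) -> dict: ...


class GeneratedOrderCodec(MessageCodec):
    """Stand-in for a class the generator would emit from a schema;
    here it simply round-trips through JSON."""

    def encode(self, payload: dict) -> bytes:
        return json.dumps(payload).encode("utf-8")

    def decode(self, data: bytes) -> dict:
        return json.loads(data.decode("utf-8"))
```

Because the interface is fixed, the same contract can be reused to test the generator's output itself: every emitted codec can be checked against the round-trip behaviour the interface promises.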
This is not only because that code belongs with the actual generated code, but also because it will be needed to test the code generation itself. It would be a waste of time to duplicate, in different places, code that is always needed and always the same.
Testing non-generated code
Of course, we want to test as much as possible (that means: all) of our handwritten code. To do so, it should depend on the generated parts as little as possible. In general, it is infeasible to use techniques like dependency inversion to make the generated code depend on the manually written code, as that would mean modifying the code generation whenever the manual code it depends on changes.
However, the dependency can be made minimal. In the best case, the manually written code can depend on a fixed API that does not change and fully encapsulates the generated code. As discussed, that API belongs to the code generator, but it should be deliverable independently in order to decouple manual code from generated code.
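A minimal sketch of that arrangement, with invented names: the handwritten service depends only on the fixed API and receives a concrete implementation by injection, so it never names a generated class.

```python
import json

# Assumed fixed API: a minimal stand-in for the stable interface a code
# generator might ship; the names are illustrative, not from a real tool.
class MessageCodec:
    def encode(self, payload: dict) -> bytes:
        raise NotImplementedError
    def decode(self, data: bytes) -> dict:
        raise NotImplementedError

# A concrete codec; in practice this class would be the generated one.
class JsonCodec(MessageCodec):
    def encode(self, payload: dict) -> bytes:
        return json.dumps(payload).encode("utf-8")
    def decode(self, data: bytes) -> dict:
        return json.loads(data.decode("utf-8"))

# Handwritten code depends only on the fixed MessageCodec API, so it can
# be written and unit tested without the generated class being present.
class OrderService:
    def __init__(self, codec: MessageCodec) -> None:
        self._codec = codec  # injected: any implementation will do

    def submit(self, order: dict) -> bytes:
        return self._codec.encode({**order, "status": "submitted"})
```

The design choice here is plain constructor injection: the manual code's compile-time dependency is the fixed interface alone, and the wiring to generated classes happens in one place at the edge of the application.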
If, on the other hand, the dependency on the actual generated code cannot be eliminated entirely, it should be as small as possible. In addition, as few manually written units as possible should depend on it, so that fewer unit tests depend on generated code.
Especially if code generation is costly, we can explore the option to have a separate, less costly generation of test stubs. That way, the code depending on the generated API can be unit tested without depending on the full code generation.
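Continuing the invented example from above, such a stub can even be hand-rolled when no stub generation is available; it only needs to be shaped like the fixed API:

```python
# A hand-rolled stub for the assumed fixed API (a cheap stub generator
# could emit something similar); it records calls and returns canned bytes.
class StubCodec:
    def __init__(self) -> None:
        self.encoded: list[dict] = []

    def encode(self, payload: dict) -> bytes:
        self.encoded.append(payload)  # remember what was passed in
        return b"<stub>"

    def decode(self, data: bytes) -> dict:
        return {"stubbed": True}

# A handwritten unit under test; it only needs something codec-shaped,
# so its unit tests run without the full code generation.
def publish(order: dict, codec) -> bytes:
    return codec.encode(order)
```

In a test, the stub both replaces the generated codec and lets us assert what the unit under test handed to it.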
Conclusion
The thoughts above will not apply to all cases of code generation. Jumping through hoops to reduce dependencies on a fast, lightweight, easy-to-install code generation process may not pay off. But for larger code generation processes, decoupling them from the manually written code as much as possible certainly can pay off.
What are your thoughts and experiences on the matter? Please leave a comment!