Clean Code Generation

Many projects involve a certain amount of code generation. The generated code is often treated as a second-class citizen, with the only measure of its quality being whether it compiles and passes the tests. We can, and should, do better.

Why generated code matters, too

The usual reason for not taking much care of generated code is that it is not checked in. The code generator was often written some time ago, “just works”, and produces compilable output that passes the tests. It may be legacy code that looks patched together, but nobody ever needs to touch it, so that’s OK.

Until the day comes when we need to debug a little deeper than usual through the layers of our project. Suddenly, we land in the output of our code generation. Maybe we dug up a bug that has slumbered down there for years. Maybe we just have to find out what we did wrong when using that code. We end up in a jumble of code without indentation and with cryptic names, readable to no one but the compiler.

Another reason for taking a little more care of our code generation is that at some point we will need additional functionality from the generated code. The “traditional” approach is to write the additions by hand, again and again, and later change them whenever the source of the code generation changes. After all, the generated code is not too maintainable, so we won’t dare to touch the code generation to add those features the right way.

Fix the code generation scripts first

The first step towards making the code generation, and with it the generated code, approachable is to have maintainable, cleanly written scripts for it. It does not matter in which language they are written, although arguably C++ might not be the right tool. Code generators mostly manipulate strings and do some file I/O. There are lots of languages that make those tasks really easy.

In order to get there, the scripts have to be in a format that is approachable to any developer on the team. Don’t make it necessary to use fancy corners of the language for the code generation itself. It can be beneficial to set up a framework or a small DSL to make it easy for anyone with a minimal amount of knowledge of the language to modify the scripts.
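To make the idea of a small framework a bit more concrete, here is a minimal sketch in Python. The `CodeWriter` class and the `generate_to_string` function are hypothetical names invented for this illustration, and the generated C++ is only an example target. The point is that indentation and braces are handled in one place, so the script that uses the helper needs nothing beyond `with` blocks and a loop.

```python
from contextlib import contextmanager


class CodeWriter:
    """Hypothetical helper: collects output lines and tracks indentation."""

    def __init__(self, indent_unit="    "):
        self._indent_unit = indent_unit
        self._level = 0
        self._lines = []

    def line(self, text=""):
        # Empty lines stay empty, so the output carries no trailing whitespace.
        if text:
            self._lines.append(self._indent_unit * self._level + text)
        else:
            self._lines.append("")

    @contextmanager
    def block(self, header):
        # Emit "header {", indent everything written inside, then close the brace.
        self.line(header + " {")
        self._level += 1
        try:
            yield
        finally:
            self._level -= 1
            self.line("}")

    def render(self):
        return "\n".join(self._lines) + "\n"


def generate_to_string(enum_name, values):
    """Generate a C++ function that maps enum values to their names."""
    out = CodeWriter()
    with out.block(f"const char* toString({enum_name} value)"):
        with out.block("switch (value)"):
            for value in values:
                out.line(f'case {enum_name}::{value}: return "{value}";')
        out.line('return "<unknown>";')
    return out.render()


if __name__ == "__main__":
    print(generate_to_string("Color", ["Red", "Green"]), end="")
```

The nesting of the `with` blocks in the script mirrors the nesting of the braces in the output, which is what makes such a helper easy to pick up for someone who rarely touches the generator.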

Of course, the scripts themselves merit the same care as production code. Use indentation and good variable names, keep functions short, and so on. In some sense, maintaining those scripts is even more important than maintaining the production code itself: the code generation is one of the tools our whole project builds upon.

Generate readable code

The code we generate should, of course, be readable, too. Remember that we might have to debug it, or just inspect it to figure out where we have to apply changes to implement that new feature. That includes all the principles we also apply to the code we write manually.

Usually, we do not start out writing generated code. Instead, we write some code manually and then implement the code generation that produces equivalent code. Equivalent mostly means that the generated code has the same effects as the manually written prototype. However, we can very well do a little better. We can generate code that looks exactly the same as the original, including indentation, empty lines, you name it.
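One way to hold ourselves to that standard is to keep the manually written prototype around and compare the generator output against it, character by character. The following is a sketch under that assumption; `diff_against_prototype` and the two sample strings are made up for illustration and use only Python's standard `difflib` module.

```python
import difflib


def diff_against_prototype(prototype_text, generated_text):
    """Return a unified diff; an empty string means the texts are identical."""
    diff = difflib.unified_diff(
        prototype_text.splitlines(keepends=True),
        generated_text.splitlines(keepends=True),
        fromfile="prototype",
        tofile="generated",
    )
    return "".join(diff)


# Hand-written prototype and a generator output that lost its indentation.
PROTOTYPE = "int answer() {\n    return 42;\n}\n"
SLOPPY = "int answer() {\nreturn 42;\n}\n"

if __name__ == "__main__":
    # Both versions compile to the same thing, but the diff is not empty.
    print(diff_against_prototype(PROTOTYPE, SLOPPY), end="")
```

A check like this in the test suite treats a missing indent or a dropped empty line as a failure, just like a functional difference.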

Make the outcome apparent from the code generation

To keep the generated code maintainable, we should not need to actually run the code generation scripts to know how the result will look. That means that the content and structure of the output should be obvious from the script code. This can be achieved when we use the framework or DSL I mentioned above if they provide the right functionality.
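Templates are one common way to get there, because the template text in the script already looks like the output. Here is a minimal sketch using `string.Template` from Python's standard library; the template, the `generate_getter` function, and the trailing-underscore member naming are assumptions chosen for this example.

```python
from string import Template

# The template reads like the C++ code it produces: same line breaks,
# same indentation, only the variable parts are placeholders.
GETTER_TEMPLATE = Template("""\
$type $class_name::get$capitalized() const {
    return ${name}_;
}
""")


def generate_getter(class_name, type_name, name):
    """Generate a C++ getter definition for one member variable."""
    return GETTER_TEMPLATE.substitute(
        type=type_name,
        class_name=class_name,
        capitalized=name[0].upper() + name[1:],
        name=name,
    )


if __name__ == "__main__":
    print(generate_getter("Point", "int", "x"), end="")
```

Anyone reading the script can tell what a generated getter will look like without running anything, which is the property we are after.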

Conclusion

Take care of your code generators and the output they produce. It can be rewarding, even fun, to produce lots of functionality that just works from a concise source without losing readability.

3 Comments


  1. Also nice if you take the time to make sure the generated code is also indented and styled using an appropriate convention for that target language. Mostly it’s easier to read the output code than having to read the template/script to work out any issues. It’s also nice when the generated output helps you find the right line numbers in the source.

  2. I’ve written code generators and making clean output is absolutely critical. I’ve seen way too many code generators implement the attitude that no one will ever look at, or horror upon horrors, modify the output.
