When you accept weakly typed data as input or produce it as output, still convert it to strongly typed objects inside your application.
Recently I was fiddling with my toy project Fix again. I was adding new functionality when my initial sloppy design came back to bite me.
The problem
Working with JSON
Fix plans to be a simple issue tracker when it’s grown up. Issues are created by sending them in JSON format to an HTTP server and are then stored as JSON in files after adding bookkeeping data (i.e. an ID for now).
JSON in, JSON out made the initial design decision dead simple: Instead of creating an issue class, I just pass the JSON through the logic between the I/O layers as it is. After all, I am developing the logic with TDD, so there can be no surprises, right?
I knew that at some point I would probably switch to proper classes, once the content of the JSON became more complex. But for now, an issue consists only of a summary, a description text and, after storage in the file system, an ID.
{
  "summary": "Some issue",
  "description": "A text that describes in more detail what actually is the issue",
  "ID": 1
}
Assumptions
I ran into problems as early as the second basic feature, listing all issues. The list would contain the IDs and summaries of all issues, i.e. the code had to strip the description from each stored JSON object and put the results into a list.
{
  "issues": [{
    "summary": "Some issue",
    "ID": 1
  }, {
    "summary": "The second issue",
    "ID": 2
  }]
}
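As a rough illustration of that pass-through design, the listing step could look like the following Python sketch. Python is used here only for brevity, and the function name `list_issues` and the use of plain dictionaries are assumptions for the example, not Fix's actual code.

```python
import json


def list_issues(stored_issues):
    """Build the listing by dropping the description from each issue.

    Hypothetical helper: it assumes every stored issue is a flat dict
    with "summary", "description" and "ID" keys.
    """
    stripped = []
    for issue in stored_issues:
        entry = dict(issue)       # copy, so the stored object stays untouched
        del entry["description"]  # raises KeyError if the assumption is wrong
        stripped.append(entry)
    return {"issues": stripped}


stored = [
    {"summary": "Some issue", "description": "A longer text", "ID": 1},
    {"summary": "The second issue", "description": "More text", "ID": 2},
]
print(json.dumps(list_issues(stored), indent=2))
```

Nothing in this function states what an issue looks like. The shape exists only as an assumption in the author's head.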
I am developing Fix relatively slowly. In this case, I had not dealt with the file storage for weeks when I wrote the acceptance and unit tests for the listing functionality. This has the effect that I can see whether the code and design are more or less self-explanatory, at least to someone who thinks alike.
It was not. I knew that the files stored the issues in JSON format, so I just parsed the content into JSON objects and then stripped the "description" element. It turned out that I had not stored the plain issue data but the whole JSON object that was sent to the server when the issue was created:
{
  "data": {
    "summary": "...",
    "description": "...",
    "ID": ...
  }
}
There was no top-level "description" element, so the stripping failed and the acceptance tests gave me errors that I had to debug. The unit tests I had written during test-driven development used mock objects that returned JSON data in the form I had expected, not in the form that the real storage object actually returned.
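The mismatch can be reproduced in a few lines. This is only a sketch in Python: `strip_description` and the two sample objects are made up for the illustration, with the shapes taken from the JSON samples above.

```python
def strip_description(issue):
    """Hypothetical logic step: remove the description from one issue."""
    entry = dict(issue)
    del entry["description"]
    return entry


# Shape returned by the mock storage in the unit tests (what I expected).
mocked = {"summary": "Some issue", "description": "Details", "ID": 1}

# Shape returned by the real storage: the issue is wrapped in "data".
real = {"data": {"summary": "Some issue", "description": "Details", "ID": 1}}

# The unit test passes, because the mock matches the assumption.
assert strip_description(mocked) == {"summary": "Some issue", "ID": 1}

# The real data fails deep inside the logic, far from where it was read.
try:
    strip_description(real)
    failed = False
except KeyError:
    failed = True  # there is no top-level "description" key
```

Both objects are valid JSON, so nothing complains until the logic touches the missing key.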
Lesson learned
JSON content is weakly typed data, so I had to rely on assumptions that turned out to be wrong. The error then manifested itself somewhere inside the application logic. Had I used proper classes instead of JSON for the objects that get passed around in the logic, this would not have happened. Wrong assumptions about the JSON content would then surface in one place only, namely during parsing.
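A minimal sketch of such a class and its parsing step might look as follows. It is written in Python for brevity, and `Issue`, `InvalidIssue`, `parse_issue` and `summary_entry` are names invented for this example. The sketch assumes the flat issue shape shown at the top of the article.

```python
import json
from dataclasses import dataclass


class InvalidIssue(ValueError):
    """Raised when JSON content does not have the expected issue shape."""


@dataclass(frozen=True)
class Issue:
    summary: str
    description: str
    id: int


def parse_issue(text):
    """Turn JSON text into an Issue, or fail right here at the boundary."""
    raw = json.loads(text)
    if not isinstance(raw, dict):
        raise InvalidIssue("an issue must be a JSON object")
    missing = [key for key in ("summary", "description", "ID") if key not in raw]
    if missing:
        raise InvalidIssue(f"missing elements: {missing}")
    if not isinstance(raw["summary"], str) or not isinstance(raw["description"], str):
        raise InvalidIssue("summary and description must be strings")
    if not isinstance(raw["ID"], int):
        raise InvalidIssue("ID must be an integer")
    return Issue(raw["summary"], raw["description"], raw["ID"])


def summary_entry(issue):
    """Logic behind the boundary uses attributes, not dictionary keys."""
    return {"summary": issue.summary, "ID": issue.id}


issue = parse_issue('{"summary": "Some issue", "description": "Details", "ID": 1}')
print(summary_entry(issue))
```

With this in place, content wrapped in a "data" element would be rejected by `parse_issue` with an `InvalidIssue` error instead of failing later in the listing logic.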
It turns out that I have to do a lot of what a parser or converter would do anyway: the business logic currently has to check whether JSON data sent to the server has the correct elements. Putting these checks into a converter would have the benefit that they also apply at the other I/O boundary, where data is read from storage.
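One way to share the checks between both boundaries is sketched below in Python. All names are hypothetical. The sketch assumes that both the create request and the stored file wrap the issue in a "data" element, and that only stored issues carry an ID.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Issue:
    summary: str
    description: str
    id: Optional[int] = None  # assigned only once the issue is stored


def issue_from_json(payload, require_id):
    """Convert a parsed JSON object of the form {"data": {...}} to an Issue."""
    data = payload.get("data") if isinstance(payload, dict) else None
    if not isinstance(data, dict):
        raise ValueError('expected an object with a "data" element')
    for key in ("summary", "description"):
        if not isinstance(data.get(key), str):
            raise ValueError(f'"{key}" must be a string')
    issue_id = None
    if require_id:
        issue_id = data.get("ID")
        if not isinstance(issue_id, int):
            raise ValueError('"ID" must be an integer')
    return Issue(data["summary"], data["description"], issue_id)


def issue_from_request(payload):
    """Boundary 1: data sent to the server, which has no ID yet."""
    return issue_from_json(payload, require_id=False)


def issue_from_storage(payload):
    """Boundary 2: data read back from a file, which must have an ID."""
    return issue_from_json(payload, require_id=True)
```

The business logic between the two boundaries then only ever sees `Issue` objects and no longer needs its own element checks.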
This problem is of course not restricted to JSON but applies to any information that is handled without objects that structure it and restrict it to the allowed set of values. Probably the most often misused form of data in that regard is the plain string. We simply assume a given string to contain one of a handful of given character sequences, instead of using enumerations. Or we assume it has been sanitized for the database instead of using different types for sanitized and unsanitized strings.
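Both ideas can be sketched briefly in Python. Everything here is hypothetical: Fix's issues have no status field, and the sanitizing rule is a placeholder, since real code would rely on the database driver's escaping or parameter binding.

```python
from enum import Enum


class Status(Enum):
    """Hypothetical issue states, used instead of free-form strings."""
    OPEN = "open"
    CLOSED = "closed"


def parse_status(text):
    """Convert a raw string once, at the boundary."""
    try:
        return Status(text)
    except ValueError:
        raise ValueError(f"unknown status: {text!r}") from None


class SanitizedText:
    """Wrapper marking text that went through sanitize()."""

    def __init__(self, value):
        self.value = value


def sanitize(raw):
    # Placeholder rule for illustration only: double single quotes.
    return SanitizedText(raw.replace("'", "''"))


def store_summary(summary):
    """Stand-in for the database layer; it refuses plain strings."""
    if not isinstance(summary, SanitizedText):
        raise TypeError("expected SanitizedText, got unsanitized data")
    return summary.value
```

A misspelled status such as "opne" is rejected by `parse_status` at the point of conversion, and a plain string handed to `store_summary` is rejected before it reaches the database. In a statically typed language the second mistake would not even compile.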
This is another example of why we should use the good support that the language provides through static typing. It can eradicate a whole category of errors, or in this case at least confine them to a single place: the conversion from weakly typed data to objects.