Working with legacy code
I’d like to continue my thoughts on working with legacy code, triggered by Arjan’s video. I have already highlighted some points on that in my Telegram. I can’t stop thinking about it, so I decided to write it down.
At the end of his video Arjan shows the following five steps:
- Analyse (project, its code, etc.)
- Goal setting (what do you want to achieve)
- Create safety net (cover legacy code with tests)
- Refactor step-by-step
- Measure success
I think these are great steps to start with. Yet I’d swap the first two, because it doesn’t make any sense to work on something without a clear goal. In the video the goal was “make it easier to add new updaters”.
Goal
The goal in the video is vague, because what is easy for one person might not be easy for another. Moreover, simply adding a new updater would probably be easier than doing the whole refactoring and then adding a new updater.
And it is always like that with refactoring, because the business doesn’t need it. For the business it is a waste of time and money. Justifying refactoring is an art: I know that it will be easier to work with refactored code, but I don’t know how to measure it. The benefits usually lie in the future: it will be easier to add new features, to debug issues, to modify code according to new requirements or a new environment, etc.
Yet it is hard to convert a gut feeling into a number, especially one about the future: the whole project might be canceled, so why bother at all?
Analyse
The example was clear, but what do I do with a legacy project with thousands of lines of code? I see no way I can comprehend such a project and reach any reasonable level of understanding of it.
One option I see here is to use source control to highlight the parts of the code that change often compared to the parts that do not. Another approach would be to monitor how the project is deployed and used, based on its telemetry.
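As a rough illustration of the source control idea, here is a minimal sketch that counts how many commits touched each file. It assumes the project lives in a Git repository and that `git` is on the PATH; the function names `count_changes` and `git_churn` are made up for this example.

```python
import subprocess
from collections import Counter


def count_changes(log_output: str) -> Counter:
    """Count how many times each file path appears in the log output.

    Expects the output of `git log --name-only --pretty=format:`,
    which is a list of changed file paths, one per line, with blank
    lines between commits.
    """
    paths = (line.strip() for line in log_output.splitlines())
    return Counter(path for path in paths if path)


def git_churn(repo_path: str = ".") -> Counter:
    """Run git in repo_path and return per-file change counts."""
    result = subprocess.run(
        ["git", "log", "--name-only", "--pretty=format:"],
        cwd=repo_path,
        capture_output=True,
        text=True,
        check=True,
    )
    return count_changes(result.stdout)
```

Calling `git_churn().most_common(10)` inside a repository would list the ten most frequently changed files. The count is crude: it ignores renames and the size of each change, so it is only a starting point for deciding where to look first.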
Safety net
This is my favourite, I love tests! Even in the video I wasn’t sure whether the number of tests Arjan wrote was enough. I’d use a coverage tool for such an example. But again, it was relatively easy in a small example, so let’s get back to real life. I have seen main() functions a thousand lines long, with quite a high number of if-else conditions, some of them repetitive. And those conditions depended on input files that were generated by a third-party tool based on a dynamic network environment. I can’t think of anything that would give good coverage for such code.
There is a good question: “How many tests are enough?”. And there is also an answer I like: “When you feel safe making changes, it is enough”. Adding more tests would likely just make the system more complex, while having fewer tests would not feel safe. The only issue here is the definition of “feel”. It is hard to make an objective metric out of it.
Having some data from the previous step would help to see which parts of the code should be covered with tests first. It wouldn’t really help with my 1K-line function, though, as there was no structure at all in that code.
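One way to combine that data with coverage numbers is sketched below. It assumes you already have per-file change counts from source control and a per-file coverage fraction from a coverage report. The function `rank_for_testing` and the scoring formula (changes multiplied by the uncovered fraction) are my own assumptions, not an established metric.

```python
def rank_for_testing(
    churn: dict[str, int],
    coverage: dict[str, float],
) -> list[tuple[str, float]]:
    """Rank files so that often-changed, poorly covered ones come first.

    churn maps a file path to the number of commits that touched it.
    coverage maps a file path to the covered fraction (0.0 to 1.0);
    files missing from it are treated as not covered at all.
    """
    scores = {
        path: changes * (1.0 - coverage.get(path, 0.0))
        for path, changes in churn.items()
    }
    # Highest score first: these files are the riskiest to change blindly.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

A file that changes often but is already fully covered ends up at the bottom of the list, and a file that never changes scores zero no matter how poorly it is covered.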
Other steps
Refactoring itself is more technical, and it was nicely covered by Martin Fowler in his book. It has been lying on my table for years; one day I’ll read it.
As for measuring success, that would depend on the goals. Ideally, a goal should contain at least one hint on how to measure it.