What is tech debt?
The term “tech debt” has been around for a while. I remember being introduced to it by my tech lead at the time, Tom McMillen, probably back in 2006 or 2007. The concept itself is older still.
Since learning about it, I have made a point of understanding what it really is, how it affects the products we create, and how it can be used in our favour. Unfortunately, having worked with many companies, I find the term is frequently misused. When that happens, it loses its ability to describe succinctly what we mean, and we lose our power to use it.
So what exactly is it?
Tech debt is all the shortcuts your engineers have taken to get something out on time, all the bad decisions they have made, whether knowingly or unwittingly, and all the systems (or parts of systems) that have been left behind. It typically manifests itself as systems that are hard to maintain and build upon, and that are far more costly to run than they should be.
What is it not?
So is tech debt just everything that engineers have screwed up or haven’t had time to do yet? Certainly not! That would be carte blanche. On the contrary, tech debt is quite specific. It does not include:
- Bugs, whether found before release and deliberately left unfixed, or found after release.
- Features that were cut from scope in order to deliver on time.
- Feature requests that come in after the software is released.
- Misunderstandings about scope between different parties in the engineering lifecycle.
Systems with tech debt work. They serve the needs that they were built for. But they are difficult to evolve to meet new and emerging needs, or they could be run far more cost-effectively using up-to-date technologies, or both.
How does tech debt come about?
Below is an image adapted from Martin Fowler’s blog post entitled Technical Debt Quadrant. This describes four quadrants of tech debt based on whether it was created deliberately or accidentally, and whether the decision was prudent or reckless.
In the Deliberate quadrants, the team knows it is taking on tech debt and is making a conscious decision to do so.
Deliberate and Prudent is where the team makes a conscious decision to take on a piece of debt. They know it’s not the neatest solution, but the cost of delay outweighs the cost of doing it right at this point in time. A really prudent team will plan the remedial work immediately, to be done at the earliest opportunity.
Deliberate and Reckless carries far more risk. This is where an entire best practice is dropped. This is an approach you see teams taking when a release’s timescales are completely unrealistic. The cost of retrofitting the best practice is often prohibitive, and the damage can be severe.
In the Accidental quadrants are situations where the team wasn’t aware that it wasn’t following best practices and finds out after the fact, often because it encounters issues or finds that making changes later is a lot more difficult than expected.
Accidental and Prudent is where the team is in uncharted territory, and later finds that it didn’t do things in the optimal way. For example, the team fails to optimise its use of cloud resources because it doesn’t yet understand the billing rules or how to architect to minimise spend.
Accidental and Reckless is where the team is completely unaware of a best practice. A modern-day example would be implementing an architecture on premises and acquiring additional hardware to run it, when it could run for a fraction of the cost in the cloud.
Do I have a tech debt problem?
Do you have highly skilled engineers, yet they struggle to get things done? Is your cloud bill unusually high for what you are running there? Are there areas of code or entire systems that most of your engineers don’t want to go near? Chances are, you have a tech debt problem.
How do I identify tech debt?
There are a few big categories you can look at initially:
- Legacy systems – even if these were state-of-the-art when they were built, they are now cumbersome to maintain, expensive to run, and stop you from delivering the changes you really want.
- Left behind parts of the system – newer techniques and architectures have been implemented in other parts of the same system, but these parts haven’t been updated to follow suit. Similar to legacy systems, these cost more to run and maintain.
- Inconsistent approaches – every engineer has their own preferred way to do something. Look for multiple approaches to the same problem.
Better to be consistently wrong, than to be inconsistent.
If you’re consistent, every engineer may get confused by your approach exactly once. If you’re inconsistent they’ll be confused every time they encounter yet another approach to the same problem.
- Lack of industry best practices – to make a state-of-the-art system, you need experienced technical leadership who know what best practices look like. Without this, the team can create a lot of Reckless Accidental debt.
- Ask your teams to identify the debt for you – but be aware that they may not spot Reckless Accidental debt!
- Get an external technology strategy review by a third party acquainted with best practice to highlight areas where your current systems and approaches fall short and can be improved.
Is tech debt bad?
Yes!
Agile software development is about being able to respond quickly to market movements. You also want to achieve Continuous Delivery, releasing multiple times per day, getting each ticket into production as it is completed. Tech debt slows down engineering flow and needs to be avoided at all costs so that we can achieve this nirvana of engineering productivity.
No!
True as that might be, Lean Product Development tells us that the primary goal is to get something into the hands of customers rapidly so that we can start to get feedback, learn, and adapt our offering to what the customer actually needs. If we spend forever tweaking the code, we are just delaying finding out whether it’s the right product in the first place. Besides, with the constant emergence of new technologies, frameworks and new SaaS solutions, the lifespan of a typical system is getting shorter. If it meets our needs now, who cares if it’s hard to maintain if we’re rewriting it in the JavaScript flavour of the month again in 2 years anyway?
Well, maybe
Too much debt is a problem, whether the technical or the monetary kind. But then it’s also a necessary part of achieving our goals quickly, while maintaining our technical agility.
Think about buying a house…
Few people can save up to buy a house outright; instead, most people get a mortgage. This means they’re in debt for a very long time, but if they can afford the interest and make regular repayments to pay it down over time, then it’s an acceptable and relatively low-risk way to reach the end goal of home ownership faster.
The same goes for technical debt
Taking on some tech debt is fine if it allows you to get a product out there sooner so you can start to learn from real users. But ideally you only take on Deliberate Prudent debt, and have a plan to pay it down once your release goes out.
But take on too much, or allow older systems to fall too far behind, and your engineering will slow down, your costs will go up, and you’ll have dark areas where no engineer dares go for fear of breaking a system nobody knows how to fix.
How do I solve my tech debt problem?
Paying down tech debt
- Educate your teams and stakeholders on what tech debt is, and what it is not. Many people have a very poor understanding of what it is, and getting everyone on the same page is crucial in managing tech debt. Make sure you can explain what you are doing and why to your board in clear business terms.
- Measure the tech debt; after all, you can’t manage what you don’t measure. Make a list, create a board, put it somewhere everyone can see. Get everyone to contribute their suggestions for tech debt.
- Check that everything on the list is actually tech debt. If you have educated everyone well, it should be, but people often still mistake bugs and feature requests for tech debt. Don’t keep these on the list: your next steps are about improving engineering velocity and reducing cost, not about improving the product from a user’s perspective.
- Quantify the tech debt, both the effort to fix it and the cost of not fixing it.
  - For tech debt affecting opex, make a guess as to the annual cost you could save. Don’t be too specific; a ballpark is enough for prioritisation.
  - For tech debt affecting the effort to change an area of code, guess the extra amount of time each change takes and multiply by the number of times this code changes in a year. Include the full lifecycle: design, development, testing, release, etc. Again, there is no need to be precise; a ballpark will do.
  - Get an estimate of the effort to fix each item of tech debt.
- Prioritise based on effort and the cost of not fixing it. Typically, a picture will emerge of a few significant pieces that can be resolved fairly quickly, a number of legacy systems that would take significant effort to fix up or replace, and a list of items that aren’t so bad and are probably not worth resolving unless their combined weight is holding you back.
- Plan an appropriate response. You must balance other business drivers against the drive for tech efficiency. Sometimes it may be appropriate to carry on as normal and pay off little bits of debt as and when possible. Other times the impact is so significant that investing heavily now will pay dividends before the end of a project or programme of work. The correct answer is entirely situational and will require your unique perspective on your business and your tech.
- Negotiate with stakeholders about the measures you want to take to pay down the debt. With the figures arrived at in the previous steps, you can make rational choices about which debt should be paid down and when, and support this as a business case with any stakeholders.
- Monitor the level of tech debt, removing items that are resolved, and adding new ones as they are identified. Adjust your response to remain appropriate for the level of tech debt and balanced with the pressures of other business objectives.
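The quantify and prioritise steps above boil down to some simple arithmetic, which can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `DebtItem` fields, the `DAY_RATE` figure and the example backlog are all made up, and ranking by payback period is just one reasonable way to combine effort with the cost of not fixing.

```python
from dataclasses import dataclass


@dataclass
class DebtItem:
    name: str
    fix_effort_days: float        # ballpark effort to fix the item
    annual_opex_saving: float     # ballpark yearly running cost saved once fixed
    extra_days_per_change: float  # extra time per change, across the full lifecycle
    changes_per_year: int         # how often this area changes in a year


# Assumed cost of one engineering day; replace with your own figure.
DAY_RATE = 500.0


def annual_cost_of_not_fixing(item: DebtItem) -> float:
    """Yearly cost of carrying the debt: wasted effort plus avoidable opex."""
    wasted_days = item.extra_days_per_change * item.changes_per_year
    return wasted_days * DAY_RATE + item.annual_opex_saving


def payback_years(item: DebtItem) -> float:
    """Years until the fix has paid for itself; infinite if it never does."""
    yearly_cost = annual_cost_of_not_fixing(item)
    if yearly_cost <= 0:
        return float("inf")
    return (item.fix_effort_days * DAY_RATE) / yearly_cost


def prioritise(items: list[DebtItem]) -> list[DebtItem]:
    """Order the backlog so the shortest payback comes first."""
    return sorted(items, key=payback_years)


# Hypothetical backlog with ballpark numbers only.
backlog = [
    DebtItem("legacy billing system", 120, 20_000, 2.0, 10),
    DebtItem("oversized cloud instances", 5, 12_000, 0.0, 0),
    DebtItem("inconsistent logging", 10, 0, 0.5, 40),
]

for item in prioritise(backlog):
    print(f"{item.name}: pays back in about {payback_years(item):.1f} years")
```

With these made-up numbers the quick wins float to the top and the large legacy replacement lands at the bottom, which is the kind of picture to bring to the stakeholder negotiation. Treat the ranking as a conversation starter, not a verdict.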
Techniques for paying down debt
- Team Time: a specific time set aside where the whole team fixes tech debt, e.g. Team Time Tuesdays. This works well when there’s a manageable quantity of tech debt and you want to stay on top of it or slowly whittle it down.
- Percentage of Velocity: the team spends an agreed percentage of its velocity each sprint on tech debt. This is similar to Team Time but without a specified time, and instead of everyone doing it, specific people in the squad could pick up the debt. A risk is that these tickets aren’t prioritised and the team doesn’t get to them.
- Debt Sprint: an entire sprint where all engineering squads do nothing but fix tech debt based on the prioritised list. This can be useful when tech debt is rampant and a significant reduction is required urgently.
- Finders Fixers: engineers get the mandate to fix any tech debt as and when they come across it. This is unpredictable and difficult to sell to stakeholders, but with a team that is sufficiently mature at identifying and refactoring code smells, it is very effective at keeping tech debt low.
- Be creative! Consider how much time is needed to stay on top of or drive down your tech debt and how your teams can carve out their time to do this while still meeting stakeholders’ needs.
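To get a feel for how these techniques compare, here is a rough Python sketch of how long it takes to clear a backlog at a given share of velocity. It assumes the debt has been estimated in the same points as the team’s velocity, and the function name, parameters and numbers are hypothetical.

```python
import math


def sprints_to_clear(debt_points: float, velocity: float, debt_percent: int,
                     new_debt_per_sprint: float = 0.0) -> float:
    """Sprints needed to clear debt_points when debt_percent of velocity goes to debt.

    Returns math.inf if new debt arrives at least as fast as it is paid down.
    """
    if not 0 <= debt_percent <= 100:
        raise ValueError("debt_percent must be between 0 and 100")
    net_paydown = velocity * debt_percent / 100 - new_debt_per_sprint
    if net_paydown <= 0:
        return math.inf
    return math.ceil(debt_points / net_paydown)


# Hypothetical team: 120 points of debt, velocity of 40 points per sprint.
print(sprints_to_clear(120, 40, 20))                         # 20% of velocity
print(sprints_to_clear(120, 40, 20, new_debt_per_sprint=3))  # while new debt keeps arriving
print(sprints_to_clear(120, 40, 100))                        # back-to-back Debt Sprints
```

The second call shows why avoiding new debt matters as much as paying down the old: a small trickle of new debt stretches the timeline considerably.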
Avoid taking on more debt
Once you have educated your teams and stakeholders on what tech debt is, when it can be useful, and what the risks are, you’re well on your way to keeping Deliberate tech debt in check.
To avoid any relapse, continue to make tech debt visible. Keep the board or wall or list alive, keep adding items, quantifying them and re-prioritising. If you have an ongoing system in place to spend time fixing debt, then all you need to do now is continuously monitor the debt and fine-tune the response.
Keep an eye on legacy systems or legacy parts of systems: do not let them become problematic for your pace of change.
Preventing “Accidental” tech debt is a lot harder as these are things the teams aren’t aware of when they happen. Keep training and educating your people; look at certifications, technical product demos, best practice sessions, etc. You can also use external help, getting people in at team and senior level to review, validate and help guide the architecture, implementation and practices in use.
The key is to be “Prudent” about the debt. Make explicit agreements when taking on debt. That is to say, “we will take this shortcut now, but once we release it we will spend the time to put it right.”