Wednesday, October 5, 2011

Hotfix Hell

I am a firm believer in many agile processes and tools, including iterative development, where you work with many deliveries. This allows for early, frequent feedback, and allows you to find errors early, before the project turns into an error-driven development project.

Unfortunately, this is not always possible.

A type of project I often work at, is the large project which spans several years, where the customer involvement is minimal, except at the start and at the end. This sort of project is pretty much doomed to go over time, go over cost, have many errors etc. They are exactly the reason why agile development has become so popular. Even if the project is developed agile, the lack of customer involvement, will mean that the project could in the wrong direction, without anyone finding out, before at the end.

Typical projects where this happens, are public projects subject to tenders. Here the scope of the functionality etc. is determined before the contractors bid on the contract (though often with a clarification phase at the start), and after the bid has been accepted (plus initial clarification phase has passed), the scope, deadline, and price are fixed. The customer will then often not be involved until the final acceptance test phase, where the solution is often found to be lacking (to put it mildly).

As this approach has obvious flaws, there has been an attempt to fix it by introducing sub-deliveries, which each has to pass the acceptance tests. In my experience, there are typically two sub-deliveries before the final delivery at the end.

This approach might seem somewhat agile, and since it gives earlier feedback, you’d think that it would help. Unfortunately, in my experience, it actually worsens the problem.

The problem is in the acceptance test part.

Picture the typical two year project with three deliveries (two sub deliveries and one final). Given the fact that there is a lot of scaffolding etc. at the start, the first sub delivery will fall after one year, with a sub delivery half a year later, and the final deliver half a year after the second sub delivery (i.e. one year after the first sub delivery).

This is, in itself, not an unreasonable schedule.

Unfortunately, the programmers involved will often have to acquire domain knowledge, while doing the scaffolding and early development of functionality, increasing the likelihood of wrong decisions and/or errors in the implementation. Some of this will become apparent as the first deadline approaches, and might be changed in time - unfortunately some it won’t be possible to change it all, and some misunderstandings will only become clear during the acceptance testing.

Since the project is on a tight schedule, the work on sub delivery two starts - often starting by changing the flaws found before the deadline, which they didn’t have time to fix, but also start on new functionality etc.

Unfortunately, the errors will still be in the code submitted to the acceptance test.

Since the errors are found, the acceptance test fails, and the customer rejects the sub delivery as it stands.

What happens then? Well, this is where the hotfix hell starts.

Given the fact that the code submitted as sub delivery one has failed the acceptance test, the developers have to fix the errors in the submitted code, which is now out of date, compared to the code base. This is done by making a patch or hotfix to the code.

The patched/hotfixed code is then re-submitted to acceptance testing.

If this passes, then all is well. Unfortunately that’s rarely the case. Instead, new errors will be found (perhaps introduced by the fix), which will need to be fixed, re-submitted, tested etc. This will take up considerable amounts of time, calender wise, but also resource wise - meaning that programmers, testers, customer testers etc. will use a lot of time fixing problems in the code, which they could have spent on other things. Other things, such as sub delivery two.

Just because sub delivery one has failed the acceptance test, doesn’t mean that work on sub delivery two has stopped - it is, after all, to be delivered six months down the line.

Unfortunately the plan didn’t take into account the hours need to work on sub delivery one after the delivery and/or date for expected acceptance test. This means that sub delivery two is in problems, since the developers won’t have time to do all the work required for it to pass acceptance test.

Meanwhile, sub delivery one and sub delivery two move more and more apart, resulting in developers having to fix problems in obsolete code - this is frustrating for the programmers, and introduces the risk of errors only being fixed in the old delivery instead of being fixed in both, since porting the fixes is difficult.

At some stage, the sub delivery will pass the acceptance test, or (more commonly in my experience) it will be dropped, as the next delivery either is about to be delivered or has been delivered.

Due to the work on the sub delivery one, the second sub delivery is unfortunately either late or in such a mess that it cannot pass acceptance test (or, more likely, both).

This means that when sub delivery two is handed in, hotfix hell starts all over again.

So, how can this be fixed?

Well, one way is to do iterations the agile way. Unfortunately, that’s not particularly likely to happen.

Another way is to base deliveries on the date when the acceptance test of the earlier sub delivery has passed. So in the above example, the second delivery will be handed in six months after the first delivery has passed the acceptance test.
Given the nature of the projects using sub deliveries, this is also unlikely to happen. Often the last deadline is defined by a new law or regulation, and is firm (until it becomes completely apparent that it cannot be done).

A more likely solution would be to take the overhead to hotfixes into account when planning. This would mean that the time spent on hotfixes on sub delivery one wouldn't take time set aside to sub delivery two. The problem with this approach would be that this would make the price higher when bidding, since more people would be needed to finish the work on time, than if one assumes that not hotfixes are necessary. On top of that, it is also hard to estimate just how much time is needed for this (in my experience, everybody vastly underestimates this).

My suggestion would be something more simple. Timebox the acceptance test - and call it something else.

Before the project starts, the customer and the contractor decide how much time will be used for testing the sub delivery in order to make sure that the fundamentals are fine, but without resulting in the developers having to fix obsolete problems.

When the timebox is done, the customer will either accept or reject - a rejection would mean that the customer think the code is so fundamentally flawed that it cannot be used. If that’s the case, the customer and the contractor will need to sit down together and figure out how to get on from there. Perhaps the final deadline will have to be moved, the contractor will have to add more people to the project in order to get it back on track, or the customer will have to become more involved in the development process (e.g. by providing people who can help testing during the development of sub delivery two).

I am well aware that my suggestion breaks with the concept of sub deliveries, but I would claim that the concept of sub deliveries is fundamentally flawed, and instead of helping a problem, it actually makes it worse. Since this is the case, I think we have to re-think how they are used, if at all.

No comments:

Post a Comment

If the post is more than 14 days old, your comments will go into moderation. Sorry, but otherwise it will be filled up with spam.