Tuesday, June 30, 2009

Ending error-driven development, part 3 - evaluating the team

As I wrote in my original post about error-driven development, there are usually multiple underlying problems which have lead to the project becoming an EDD project. These problems can be technical in nature, but they might also be people-or role-related. While my earlier posts in this series focused on the technical problems, it’s important not to ignore the person- or role-related problems as well, be they internal or external to the team.

When working on an error-driven development project, there is a tendency for the people in the team to focus outwards when trying to finding this sort of problems. This is of course a natural response when a team comes under pressure (aka. “Circling the wagons”), but it’s not really a good response, since it won’t help solve the problems.

In my opinion there are two types of problems which can be related to the people in the team.
1) Qualifications, competences, and inter-personal skills of the team members
2) Roles of the team members

Of course, as always, both sorts of problems might be in play at the same time.
This post will only focus on the first type of problems, while my next post will focus on the roles in the team.

Qualifications, competences, and inter-personal skills of the team members

When a project starts up, a great deal of attention is usually given to the people who become part of the team, since these people will to a large degree create the fundaments of the system, and help shape the future development in the project.
This allows for a false sense of security, since everybody assumes that the people who were picked had the qualifications and competences required to do their job. This is unfortunately not always the case.

An example, which I’m always presented for as a consultant, is the case where there is a key position that’s filled with an external ”expert” (read: consultant with documented experience). Given the fact that this expert obviously have the qualifications (documented experiences in the position), it’s assumed that the expert also has the competences for the position. This is not always the case. Sometimes they have been lucky in the past, and not run into problems; at other times they might have embellished their résumé (putting too much emphasis on one aspect on their work, without making clear that this was only a minor aspect).

Since the external expert was hired exactly because there were no-one available who qualified, there will be no one around to ask questions about problematic decisions.
At some stage, it becomes clear that the decisions might be sub-optimal, to put it mildly, and people will work hard to either compensate for the problems they cause (the typical scenario) or actually have to re-do the work (less typical, but sometimes the better scenario).

I should add that while it might sound like I am trying to insinuate that people like external expert are lying about their qualifications or competences, they might genuinely believe that they are good at what they are doing (see my post on the Dunning-Kruger effect on my other blog for more details on why this could be).

All what I wrote about the hypothetical external expert can go as well for internal team members who misjudge their own qualifications.

Later, when the project has run into problems, more people will be added to the project, without as much focus on their skills, which might cause similar problems.

Here are some steps to avoid these problems.

Peer review

Peer review is of course well-known from scientific publications, but in reality, there is absolutely nothing wrong in doing the same with systems architecture, software architecture, and code. What it simply means, is that something is reviewed, by someone on a similar level (a peer), to find obvious flaws, before it’s accepted. It doesn’t mean that there won’t be any errors, but it means that at least one other person have gone through the product, and tried to understand it, before it’s accepted.

The more fundamental to the project the product is (e.g. the overall layering of the software architecture) the more important it is to get it peer reviewed. Any flaws in the product will cause a lot of grief later and the earlier they are removed, the better.

In case that there is not anyone inside the project who are able to do the peer review, try to look outside the project. Is there anyone else in the same company who can do it? Or can you hire someone for that particular job? The latter option will cost money, but it might save a lot in the long run.

Ensure that the team members have the necessary domain knowledge

Unless you work for a company which only makes a certain type of systems, it will always be necessary to have some kind of domain knowledge when developing a system. It’s not necessary for the developers to know every detail about the domain, but they should understand the fundamentals, and they certainly should understand the details about what they are implementing.

How can you realize that what you’re doing is not making sense, if you understand neither the problem you’re trying to solve, nor the solution to the problem?
In my first blog post in this series, on testing, I made one recommendation on how to ensure that people had the necessary domain knowledge.

With regards to the test cases, my general suggestion is that they are not written by technical people, but rather by people with an understanding of the business domain. Optimally they should be written at the same time as the requirements or at least before the coding really start, and only be in general terms. Before the developer starts developing, he or she should read the relevant test cases, making sure that he or she understands the requirements as stated in general terms. If there are some business concepts that appear unclear, it’s possible for the developer to acquire the necessary domain knowledge before starting on the development.

This is one way, but there are of course others. How you do it, depends on the project and the people in the team. It might require a course, or it might be solved by having one of the end users close at hand, ready to answer any questions which might pop up (something which I would recommend anyway).

Only take on new people if you have time to integrate them into the team

Time after time, projects run into trouble, and management starts throwing more people at the project, in the hope that it will make it possible to finish on time.

That’s not going to work – it just won’t happen.

Adding a new developer to a team is not cost free, and adding several developers at the same time, can be extremely expensive. A team which has worked together for a while, knows each other, and often runs smoothly, even if there are problems with the project as a whole. Adding new team members, without given thought to when and how that’s done, will cause problems.

In 1965, Bruce Tuckman proposed the Forming – Storming – Norming – Performing model for teams, where teams move through different phases (see Wikipedia article). Every time new group members are added, the team will get thrown backwards through the phases, and it will take a time for them to re-group.

Of course, if the team is not running well, adding new team members might be less of a problem, but then, there are probably other issues which should be addressed as well.

Don’t be afraid to get rid of people from the team

This is probably the most controversial of my suggestions, but I know that many people will agree on this.

People in teams depend upon each other, and if a team member can’t get along with the others, or underperforms, it can ruin the entire team. If it appears that’s the case in your project, try to talk with the team member about the problem(s), and see if there is something which can be done, but if there isn’t (or if the problems continue) then don’t be afraid to get rid of the team member.

It might cost you a member of the team, but you’ll find that a good team will perform a lot better than a team which doesn’t work properly. This difference will more than compensate for the loss of man-hours.

Don’t consider mentoring a cost, but an investment

Unless you’re very lucky, not everyone in your team will be experienced and excellent developers when they start on the project. This means that they will have to learn while working on the project.

Learning while working on a project means two things:
1) They will make mistakes
2) They will draw upon the other team members to help them solve problems.

When the going is rough, it’s easy to resent the time the experienced team members have to use to help the less experienced team members. Don’t do that. Consider the time spent on helping the inexperienced team members an investment, helping them reach a level where they can work on equal level with the rest.

If it gets out of hand, boundaries needs to be set, but as long as the experienced team members’ performance doesn’t suffer too much (you’ll have to decide what this constitutes), I recommend letting the other team members draw upon them whenever is necessary.

Create an “ideal” code example

This is something I saw working with success at a customer. They had some general architecture and coding guidelines for the whole company, but realizing that each project would have differences, they also used a concept called “the good code example”. This was a code example in the projects, showing how a typical code implementation would be done correctly in that particular project. The example had to be picked with care, to be representative, and it had to be updated every time changes were made to the coding standards in that particular project, but it served as a great references.

Every time a new member got added to the team, they were given the code example, and told to try to understand it. This worked great for giving people a basic understanding of the software architecture they were going to work in.

Monday, June 29, 2009

Ending error-driven development, part 2 - adapting agile practices

This post is not about whether agile methods are better than the waterfall model; rather it’s about adapting to the situation, and using the most useful tools for solving the problem. Error-driven development is often plagued by a number of problems, which might be solved, or at least reduced, by using some of the practices from the agile methods, such as scrum or eXtreme programming.

If you are already using an agile method, it might be worth evaluating whether it’s the correct method for the situation.

Anyway, here are some agile practices it might be worth adapting to your project.

Daily stand-up meetings

A lot of the problems in EDD projects are related to breakdowns in communications - not only between the team and the customer, but also between the team members and the team leader or between different team members.

The purpose of daily stand-up meetings is to spread knowledge of what each person has done since last time, what they are doing now, and what problems they have encountered. This allows team members to either help solving the problems or to plan their work accordingly – there is no need to start working on something which will run into the same problem before the problem gets fixed.

In the book Manage It, Johanna Rothman distinguish between status meetings, where everyone tells everyone else what their status is (something she consider a waste of everybody’s time) and daily stand-up meetings, where people tell the others about a) what they have just finished, b) what they are going to do now, c) what problems they encountered (something she consider very valuable). In other words, daily stand-ups can be said to give a status of the current progress for each team member.

While I certainly agree with Rothman that daily status meetings are not as good as daily stand-ups, I would hesitate to make the claim that they are a waste of time. In a project with a real breakdown in communications, status meetings can help get everyone up to speed on the project as a whole. Of course, such status meetings probably shouldn’t be daily, but rather weekly, and they should be discontinued (or at least held less frequently) when the project is back on track.

Work in short iterations

Agile methods focus on working in short iterations (usually 2-4 weeks long) where there is a finished product at the end, which can be tested. By finished, I mean a product with fully integrated functionality which can be tested through the entire system (e.g. from the GUI all the way down to the database).

This allows for continuous testing, and will give early warnings about problems in the requirements, architecture, or technology. On top of that, it has the benefit of demonstrating progress to people inside and outside the team – the psychological value of this cannot be overestimated, in a project where everybody feels that they have been working hard without showing any progress.

This approach also works in systems which are fully implemented, but which are full of bugs. Here the functionality should be fixed so they are bug-free.

No matter whether it is new functionality or existing functionality, they should be prioritized accordingly to how important they are for the customer, and where they are in the workflow of the user. If you have a workflow where the functionalities are used in the order A->B->C->D then you should implement them in that order, even if the customer feels that B is more important than A. The exception of course being if there are alternative workflows which will take the user either directly to B or to B through some other functionality – then it might make sense to implement B before A.

If the project is just one of several intra-dependent projects (e.g. if the service provider and the service consumer is implemented at the same time), it’s important to coordinate the iterations, so any dependencies between the projects are taken into consideration when planning iterations.

Implement functionality rather than architecture

This pretty much follows from the last point, but it’s important to keep in mind anyway. When developing a system, there are a lot of frameworks that need to put in place (caching, data access layers etc.), but once that’s done, the team should stop thinking in terms of architecture. Instead the team should focus on functionality.
An example of developers thinking of architecture instead of functionality is the case where new requirements are implemented layer-wise. E.g. first all the changes are done to the database, then to the ORM etc. This means that none of the new requirements (or changes to the old ones) is done before they are all done. Not a good way to make visible progress, and not something which can be easily tested before the very end.

Consider pair programming for complex problems

I must admit that I am not particularly hooked on the concept of pair programming, as I am not sure that the costs are in proportion with the benefits. If you have two programmers on equal level, then pair programming can make sense, since it can create a synergy effect, but if the programmers are on different levels, then it will quickly turn into a mentoring process. While I find mentoring processes valuable, they have their time and place, and it’s not necessarily in the everyday programming in a project that has hit problems.

Still, if there are complex problems which need to be solved in the system, then pair programming might very well be a very good idea. The benefits of pair programming should in most cases easily be worth the reduced productivity for a time (given most people don’t pair program most of the time, there will be a cost in productivity when doing this). The benefits of having two people doing the code is that they will work together to solve the problems, making it more likely that it’s done correctly, and there will be two people who understand both the problems and the solutions to them (as they were implemented).

Use continuous integration

To my mind, there is nothing that beats instant feedback when there are problems. Continuous integration is a powerful tool in helping giving developers instant feedback about problems with their code, allowing them to fix those problems as soon as they occur.

Continuous integration is, simply put, the practice of continuously committing your code to the code base, where it’s build and have tests run on it (unit tests and smoke tests) to see whether it works as it should.

This doesn’t absolve the developers of running unit tests etc. before they check in. Instead it’s a safeguard against any issues that might have occurred during check-in (forgotten to check in a file etc.).

For more on continuous integration, go read Martin Fowler’s article on the subject.

Monday, June 15, 2009

Ending error-driven development, part 1 - testing

Some time ago, I wrote about something I call ”error-driven development”, which is a type of software development I come across all too often. You can find the original post here.

I’ve found out that many software developers and consultants can relate to the post, and I’ve discussed with several what one can do about error-driven development (EDD).

Well, there is no perfect answer to this question, since the root cause of EDD is different in every EDD project. I have, however, been on a number of EDD projects through the years, so I have some suggestion on some general measures one can do to either turn EDD into something else, or to limit the damage.

I’ll try to go through some of them from time to time. In this post I’ll focus on testing.


I will make the claim that testing is one of the most underrated activities in software development projects, and this has to change in order to avoid EDD. What’s more, testing is also a widely misunderstood concept. Testing is a much bigger activity than most people believe, and covers more aspects than generally thought.

Testing should of course ensure that the system works as intended, but it should also ensure that the system doesn’t work when it’s not supposed to, and that the system can handle unexpected events in a meaningful way.

In his book, Release It, Michael Nygard makes a very good point: Systems are built to pass acceptances tests, not to run in the real world. This is one of the things that lead to EDD projects, where the developers are working on a later version of a system which is in production.

Testing should allow for the particularities of the real world, and not only for the test environments (see Release It for some very good examples of the differences, and some good ways of making up for these differences).

There are several types of testing, some of which I will cover here, and in my experience, focusing on just one of them, will lead to problems in the long run.

Unit testing

With the spreading of concepts like test-driven development, unit tests are very much in the vogue. Unfortunately, books on TDD and its irk generally doesn’t explain how unit tests should be written – just that the they are important, and should be written before the code.

Making unit tests ensuring that code works as expected is of course very important, but if that’s all what the unit tests do, it’s not enough. Unit tests should also ensure that code doesn’t work when that’s expected – e.g. if a method gets an invalid parameter, you expect it to fail in some way or another. Tests for this – don’t just assume that this is the case, even if the code works with correct input parameters. Besides ensuring that the code works as it should, even when this means throwing an exception, it also makes it easier for others to see what behavior is expected of the code.

There is, unfortunately, a tendency to focus on code coverage of unit tests, where code coverage is taken to mean percentage of code lines executed during the tests. This is the wrong code coverage measure. Instead one should focus on covering all the breaking and non-breaking states that the code can be in.

E.g. if you have some code which receives a text string containing a number, which it converts to a number, make sure to test the following:
a) A string containing a positive integer
b) A string containing a positive floating point number using the normal separator (in the US an example could be “10.10”)
c) A string containing a positive floating point number using a separator from a different culture (e.g. the Danish “10,10”).
d) The same as b) and c) just with thousand-separators (“1,000.00” and “1.000,00” respectively).
e) The same as a) through d), but with negative numbers instead.
f) A string containing a number too large to be handled by the data type it’s going to be converted to.
g) A string containing a negative number too large to be handled by the data type it’s going to be converted to.
h) A string containing letters
i) A string containing zeros in front of the number

I could continue, but you get the point. As you can see, that’s a large number of tests for a fairly simple functionality, which is often implemented by using built in functionality. Even so, it’s worth spending the time on doing these, as this is the sort of things which can cause real problems in production.

Smoke testing

Unit tests are of course not the only sort of testing; there are others which are just as important. Smoke tests are automatic tests which can be run to test different flows through the system. E.g. in an internet portal, the smoke test might log in, and navigate to a specific page, while entering data in the intermediate pages.

These tests generally need some kind of tool to be made. Depending on your development framework and the nature of the system, you need to find one that suits you. In portal projects I’ve seen pretty good results with smoke tests made in Ruby, but in my current project, we are using Art of Test’s WebAii, where the tests are written in C# or VB.NET (but can test web GUIs written in other languages).

Smoke tests require a lot of time to make and maintain, especially in a system where the user interface is changed often. In such cases, it might make sense to have resources focused on running and maintaining smoke tests. These shouldn’t only focus on this, but they should have the responsibility to ensure that all smoke tests can run at all time.

Even if there are people responsible for maintaining it should be the responsibility of the developers to run the relevant smoke tests before checking in any changes to the user interface, and in case it fails, to correct the tests or the code as needs be.

Smoke tests help ensure that changes in one part of the user interface don’t have a negative impact on the functionality of another part, which is often the case.

Integration testing

In these days of SOA, ROA and what have you, it’s very rare that a system stands alone. Rather, systems tend to work together with other systems through integration points. Even the system doesn’t work with other systems over the network, it will generally use a database manage system, such as DB2, Oracle, or MS SQL, run in an operative systems (*NIX, Windows etc.), or have other interaction with other systems. All this should be tested.

If possible, integration testing should be automated, but even if that’s not practical for some reason or other, manual integration testing should be done.

As in smoke testing, it’s possible to get a number of tools which allows you to make the tests. The selection of tools again depends on the system and the development framework.

Integration testing can be very difficult, as the testing is dependent upon external systems, some of which might not have been coded yet. In such cases, remember that it’s not the other systems that the test should test, but rather the integration points with these. So, there is no real need for a fully functional system in the other end. Rather, it’s sufficient to have a mock system which sends data as it could appear from the external system. This can be done through tools like soapUI, which can both send data through webservices your system exposes, and which can serve as a receiver for your web service requests. Of course, this isn’t always enough, and I have experienced a project where the behavior of the developed system was so dependent on the retrieved data, that it was necessary to build a simulator, simulating all the back end systems.

Remember to test for differences in cultures in different systems. Can your system survive that the date-format or numbers it receives confirm to a different cultural standard than yours? This is something that’s easily overlooked, but which can have a great impact – either by crashing the system, or by the system misunderstanding the values. It makes a great difference if the date “1/5/2009” is January 5th or May 1st.

Even less ambiguous formats might cause problems, and they can be even harder to figure out. E.g. if you use a date format “dd-MMM-yyyy”, would be fine for the first 4 months when exchanging data between a Danish and a US system, but on May 1st it would be “01-May-2009” in the English speaking world, but “01-Maj-2009” in the Danish speaking world. This could mean that the system suddenly, and unexpectedly, stops working as expected, even though everything has been running just fine until then (this is not an made up example – I once started in a new job on May 1st, where my first accomplishment was to figure out this exact problem).

The more integration tests you make during development, the fewer fixes needs to be done when the system is in production (I refer to Michael Nygard's Release It for good advice on making testing environments for integration testing).

Manual testing

There is unfortunately a tendency for developers to believe that as long as you have enough automatic tests, there is no need for manual testing. This is of course nonsense.
No matter how many automatic tests you have, and how sophisticated tools you’ve used to make them, there is no substitute for human eyes on the system.

Manuel tests can be divided into two groups: Systematic testing (based on test cases) and monkey testing.

Systematic testing, normally done based on test cases, tests the functionality of the system, ensuring that it works as specified, including implicit specifications. The testers should have enough understanding of the business that they meaningfully test the system, not just follow the test script step-by-step.

With regards to the test cases, my general suggestion is that they are not written by technical people, but rather by people with an understanding of the business domain. Optimally they should be written at the same time as the requirements or at least before the coding really start, and only be in general terms. Before the developer starts developing, he or she should read the relevant test cases, making sure that he or she understands the requirements as stated in general terms. If there are some business concepts that appear unclear, it’s possible for the developer to acquire the necessary domain knowledge before starting on the development. When the system is developed, the test cases can be made specific to the system (I recommend keeping the unspecific test cases in reserve though, as the system can change a lot over the time, and it’s good to have some general test cases to refer back to).

As with all the earlier tests, there should also be testing of wrong usage of the system, ensuring that this wrong usage will result in neither major problems nor a wrong result.

Note that while test cases and use cases might sound similar, at least at first, as I describe test cases, that’s not really the case. Use cases describe things on an abstract level, while test cases are more specific. In an insurance system, a use case would describe how the user creates an insurance policy. Test cases would not only describe that the user will create an insurance policy, but rather what sort of insurance to choose, what values should be used, and what extras should be selected.

Monkey testing is unsophisticated testing of the system, where the tester tries to do whatever suits him or her, trying to provoke a failure in the system. It might be entering a wrong value in a field, clicking on a button several times in a row, or doing something else unexpected by the developers. The purpose of the testing is to emulate the sort of things which might happen in the real world, outside the safe testing zone.

While monkey testing it’s very important to document the exact steps which results in the error. Some times the symptom of the error (the system failing) occurs a rather long time after the action which caused the error.

In conclusion

There are of course many other sorts of testing (performance testing for one), but I feel that by doing the sort of testing I mention, one can do a lot to prevent a project turning into an EDD project.

The reason good testing can help avoid EDD is simple. A lot of the time EDD projects only addresses the symptoms, fixing bugs as they are reported, but they don’t address the fundamental problems, so these fixes are only temporary at best, and in general introduces other errors, which are only discovered at a later stage.

Testing will ensure that the system being developed is stable, or at least that the non-working functionality are discovered at an earlier stage. Testing also ensures that changes can be introduced more easily, as side-effects are shown straight away.

Of course, introducing testing into an EDD project is not easy. It will be running behind schedule, and people will be overburdened with work, so adding new tasks will not be doable. This doesn’t mean that testing shouldn’t be done though, just that it should be done in steps, rather than all at once. Find the core functionality, or alternatively the most problematic code, and introduce testing there – unit testing should come first, but don’t forget the other types of testing.

I know this is easier said than done, but I’ve been in projects solidly in the EDD category, which we managed to turn around, in part because of testing. In one project, we made the case for unit tests by making 10 unit tests of basic functionality in the system, showing eight of them failing. This resulted in me getting resources allocated to me, just to ensure proper unit testing of all basic functionality (we later expanded to other functionality and introduced other types of testing).

If such a drastic demonstration isn’t possible, start by doing unit tests whenever you change some code – this will ensure that the code works properly after you’ve changed it. Sadly, code in EDD projects are often not in a stage where unit tests can easily be introduced. This is why they should be introduced when the code is changed anyway, since it gives an opportunity for refactoring the code at hand to allow unit tests.

I hope this rather long post made sense to people. It’s not revolutionary concepts I’m trying to introduce, and for many people, the things I mention are blatantly obvious. Even so, there are many people, and organizations, out there, for which testing doesn’t come naturally. These people, and organizations, need to be reminded ever so often that there is a very good reason why we do these things.

Testing can’t stand alone of course; many other measures are needed to avoid project development to turn into EDD, or to turn a project away from being an EDD project. Still, they are fundamental for a healthy development, so leaving them out, will more or less guarantee that the project turn into EDD.

Book Review: The Pragmatic Programmer

The Pragmatic Programmer - from journeyman to master by Andrew Hunt and David Thomas (Addison-Wesley, 2000)

After having this book recommend several times, I got my work to buy it for the office. And I'm quite happy that I did that.

The goal of this book is to give programmers (or rather systems developers) a set if tips on how to become better, by becoming more pragmatic. In this, the book is quite successful.

When you've worked in the IT field for some years, as I have, you'll probably have heard most, or all, of the ideas before. Indeed, many of them are industry standards by now (e.g. using source control). Even so, it's good to have them all explained in one place, and it might remind people to actually do things the right way, instead of cutting corners, which will come back an haunt the project later.

If you're new to the field, I think this book is a must-read, especially if you're going to work in project-oriented environments (e.g. as a consultant). I'm certainly going to recommend that we get inexperienced new employees to read this book when they start.

Now, to the actual content of the book. It covers a lot of ground, not in depth, but well enough to give people a feel of the subject. The first two chapters ("A Pragmatic Philosophy" and "A Pragmatic Approach") explains the ideas and reasons behind being pragmatic, and how it applies to systems development. The next chapter ("The Basic Tools"), tells what tools are available and should be used. This is probably the most dated chapter, especially when it comes to the examples, but it's still possible to get the general idea.

Chapter 4 ("Pragmatic Paranoia") and 5 ("Bend, Or Break") deals with two areas where many people are too relaxed in my opinion: testing and coding defensively (ensuring valid input data etc.). I cannot recommend these two chapters too highly.

"While You Are Coding" explains how to code better, and (more importantly in my opinion) when and how to refactor. The last two chapters ("Before the Project" and "Pragmatic Projects") gives tips on how to set up and run projects in a pragmatic way.

There are of course tips that I disagree with, or which I would have put less emphasis on, and the book is obviously written before agile methods, like scrum, became widespread (though eXtreme Programming is mentioned). Still, even so, I can really recommend the book to everyone, novices and experienced developers alike.

Book Review: Release It!

Release It! - Design and Deploy Production Ready Software by Michael T. Nygard

If you are in the business of making software systems, odds are that you might have heard about Nygard's book. People have raved about it since it was published in 2007.

That being the case, it had been on my to-read list for a while, but without any urgency. Then I went to the JAOO conference last month, and heard two sessions with Michael Nygard presenting his ideas. After that, I knew I had to get hold of the book straight away.

Release It! is something as rare as a book which is groundbreaking while stating the obvious.

First of all, Nygard makes the simple point that we (meaning the people in the business) are all too focused on making our systems ready to pass QA's tests and not on making ready to go into production. This is hardly news, but it's the dirty little secret of the business. It's not something you're supposed to say out loud. Yet Nygard does that. And not only that, he dares to demand that we do better.

Having committed this heresy, he goes on to explain how we can go around doing that.

He does that in two ways. First he present us for the anti-patterns which will stop us from having a running system in production, and then he present us for the patterns which will make it possible to avoid them. Or, if it's not possible to avoid them, to minimize the damage caused by them.

That's another theme of Nygard's book. The insistence that the system will break, and the focus on implementing ways to do damage control and recovery.

The book is not only aimed at programmers, though they should certainly read it, it's also aimed at anyone else involved in the development, testing, configuration and deployment of the system at a technical level, including people involved in the planning of those tasks.

As people might have figured by now, I think the hype around the book has been highly warranted, and I think that any person involved in the field would do well to read the book.

Debugging friendly code

When you often take over other people’s code, you often start getting your own pet issue which you focus on. Well, my pet issue is debugging, or rather ease of debugging. I don’t want to have to know and understand the whole business domain, program or component when I want to fix an error (bug). Most of the time, it should be possible to fix errors by stepping through the code while debugging, and finding the error.

For this to be possible, however, requires the code to be debugging friendly. By this I mean that each class, method and even code line should have well defined responsibilities, which leaves no doubt where and how the error occurred.
This sounds all well and fine on the abstract plan, but how does it relate to real code? Well, that’s of course a little harder to say, but I can give some general guidelines which should be followed to achieve this.

  • Methods should be limited in scope. E.g. if you need to implement a method which fetches something from the database, the method should not also be responsible for putting the data in the cache or convert the data into a different data type. If those things are necessary, a method should be made for each of those functionalities.

  • Methods should be generalized as much as possible. Instead of copy and pasting methods and then modifying them to suit your needs, see if it isn’t possible to generalize the functionality in some way or other, so just one function is responsible for it.

  • Methods, parameters, classes, variables etc. should have telling names. Don’t pass x, y, z along as parameters in a method call. On the other hand, don’t make long names explaining the exact circumstances when it’s obvious from the scope what it is. The id of a customer object doesn’t need to get called customer Id.

  • Use local variables! Martin Fowler might disagree, but he probably doesn’t have to debug other peoples’ code very often. When you call methodX(methodY(Z)) it’s not possible to easily see whether it’s method or methodY which causes the null pointer exception.

  • Make unit tests for, at least, the critical methods.

  • Comment the code. Don’t explain the obvious, but rather focus on explaining the assumptions behind what you’re doing.

  • Check parameters for illegal values. If your parameter should never be null, then check for that – in case it is null, then throw an exception, explaining the problem. This shows other people (or a later you) that someone thought of the possibility, and didn’t just forget to handle null values as input parameters.

  • Ensure that your method behaves in a uniform way, i.e. giving the same parameters the method should always behave the same way (barring other dependencies). I once experienced a ToString() method in a class which always appended something at the end of the string, causing the behavior to different dependent on whether the method had been called before or not.

All of these things might seem simple, but when the project is 3 months over time, your customer or project leader (or both!) is breathing down your neck, then it’s easy to cut corners. You might also know the system and/or domain very well, which allows you to make some assumptions which are not obvious for others – not only does this make it harder for others to debug, but it also might cause people to use the code in the wrong way.

So, what happens if you come across code, or inherit code, which doesn’t conform to my guidelines? Well, be bold and refactor. Do it one step at the time – if method calls are used as parameters, make local variables. If methods are responsible for several things, split it into several methods with distinct responsibilities, and so on. And of course, make sure that there are unit tests.

Error-driven software development

When developing software systems, there are a number of systems development types out there, e.g. test-driven development (focuses on making tests before implementing), and what might be called requirements-driven development (focus on finding all the requirements before implementing). Unfortunately, there is a type of development that I all too frequently come across, which I've come to call error-driven development.

Error-driven development is systems development, where everything is done in reaction to errors. In other words, the development is reactive, rather than proactive, and everybody is working hard, just to keep the project afloat, without any real progress being made.

I should probably clarify, that I am not speaking about the bug fixing phases, which occurs in every project, but rather the cases where the project seems to be nothing but bug-fixing (or change-requests, which is to my eyes is a different sort of bug reports), without any real progress being made.

Unsurprisingly, this is not very satisfactory for any of the people involved. What's more, it's often caused by deep, underlying problems, where the errors are just symptoms. Until these underlying problems are found, the project will never get on the right track, and will end up becoming a death march.

The type of underlying problems, which can cause error-driven development, could be things like:

  • Different understanding of the requirements for the software among the people involved. Some times the people who make the requirements have an entirely different understanding of what the end system should be like than the end users.

  • Internal politics. Some departments or employees might have different agendas, which might lead to less than optimal working conditions.

  • Lack of domain knowledge among the people involved. If you're building e.g. a financial system, it helps if at least some of the people involved in the development have a basic idea of the domain you're working within.

  • Bad design. Some times early design decisions will haunt you for the rest of the project.

  • Unrealistic time constraints. If people don't have time to finish their things properly, they will need to spend more time on error fixing later.

There are of course many other candidates, and several of them can be in play at the same time, causing problems.

No matter what the underlying problems are, the fact is, that just focusing on fixing bugs and implementing change requests, won't help. Instead it's important to take a long hard look at the project, and see if the underlying problems can be found and addressed.

This seems trivial, but when you're in the middle of an error-driven development project, it's hard to step out and take an objective look at it. What's more, you might not be able to look objectively at the process. Often, it requires someone who hasn't been involved from the start, to come and look at things with fresh eyes.

As a consultant who often works on a time-material basis, I often get hired to work on error-driven development projects. The reason for this is simple: often it appears to the people involved, that the project just need a little more resources, so they can get over the hurdle of errors, and then it will be on the right track. When hired for such projects, I always try to see if there are some underlying problems which needs to be addressed, instead of just going ahead and fixing errors/implementing changes. Unsurprisingly there often are such problems.

Frequently these problems can be fixed fairly simply (reversing some old design decisions, expanding peoples' domain knowledge, get people to communicate better, implement a test strategy, use agile methods etc.), while at other times, they can't be fixed, only taken into consideration, allowing you to avoid the worst pitfalls.

So, my suggestion is, if you find yourself in a project which over time has turned into an error-driven development type project, try to take a long hard look at what has caused this, instead of just going ahead and try to fix all the errors/implement the changes. Error reports and change requests are just noisy symptoms in most cases, and will continue to appear as long as the real problems aren't addressed in one way or another.

A new blog

Welcome to my brand new blog.

Some of you might know my other blog Pro-science, where I write on a number of issues, including the subjects covered by this blog. That blog is not going away.

So why start a new blog? Well, I am planning to write a number of fairly long posts on subjects related to programming, systems development, and IT consulting, and the other blog didn't seem to be the right fit for this. Instead I decided to create a new blog focusing only on these issues. There will be a certain amount of cross-posting, with posts on general issues being posted both places, and more in-dept posts being posted here.

I'll copy some of the posts from my old blog to this blog, including the post that gave the name to this blog. Since google have a tendency to believe that intra-linking between two personal blogs is a symptom of a spam blog, I won't link back and forth.

Hope you find it interesting, and please feel free to leave comments. One note - if the post is more than two weeks old, comments will go moderation. This is unfortunately necessary to avoid spam.