
The Art of Software Development: Eliminating Bugs

by Bruce Gilham (Please let me know if you know where I can link to the original, to give due credit)

  • Rule #1: If the user says it's a bug, it's a bug!
  • Rule #2: The harder a bug is to detect, the more the users will object to the system.
    • Rule #2a: Anything that you are unsure about should be visible to the users.
  • Rule #3: The key to successful implementation is maintainability and testing.
  • Rule #4: Analysis is cheaper than design; design is cheaper than programming.
  • Rule #5: The analysis is done when all the parts of the system, extant and proposed, are named and have a defined purpose and responsible party or parties.
  • Rule #6: The end of the design phase occurs when all the users have no significant questions or objections.
  • Rule #7: It is almost impossible to write a system for an organization engaged in criminal activities.
  • Rule #8: Every uncertainty about a system doubles the complexity of the affected feature.
  • Rule #9: The most important thing about a system is the most important thing about a system.
  • Summary

There is a great satisfaction in seeing one’s ideas put to use, and there is equally great disappointment when a design never sees the light of day. The art of software development goes way beyond writing good code. It means writing the right code, at the right time, for the right user, with the right documentation.

There are a million ways to mess up the development process. No set of rules is perfect. I wrote my first "custom" application while a hardware tech at Memorex. Since then I have written dozens of applications, many of which are still in operation. My largest is a production control system that tracks 200 million dollars per year in PC board production. My smallest might be FastWrap, a program that summarizes commercial production expenses for advertisers such as Toyota and Pepsi.

A few years ago I started a list of what amounts to all the mistakes I made as a software developer. I say this to avoid any accusations that I might consider myself to be an authority on "how to write computer programs." If anything I am an authority on how not to develop computer systems. Every one of the rules in this and the following articles was discovered in the school of hard knocks. Each one is on the list because violating it got me in trouble.

Rule #1:  If the user says it's a bug, it's a bug. (top)

This is obvious when selling an off-the-shelf product, but not so obvious for in-house or custom projects. This rule becomes impossible only when there are warring factions within the company.

This almost always comes in the form of the software doing something unexpected. One could say that all undocumented features are bugs. This is perhaps an exaggeration. Or perhaps not.

When the software does something unexpected, it upsets the user. Always! It also makes enemies for the MIS department or consultant. What users want from a computer is predictability. They want it to do the "same thing" every time. By "same" they really mean "What I expect."

Users are quite happy to work around known bugs, if they understand the bug and decide that they would rather work around it than deal with the cost and/or trouble of fixing it. For instance, if they know that parts of a report are OK then they may be willing to overlook obvious flaws.

I discovered this rule when I replaced a consultant who had been fired simply because he argued about what was and was not a bug. This was a ridiculous situation since the consultant was paid by the hour; why didn’t he just fix the bug?

In fact, he should have been delighted! Knowing what is a bug is half the problem because…

Rule #2: The harder a bug is to detect, the more the users will object to the system. (top)

You really only have to let the users down a few times and they will distrust the system forever. This makes the introduction period critical. Users should be clearly warned when the system is unstable. There is actually a scale of system quality: a) behaves exactly as expected, b) usually behaves as expected, c) does consistently weird things, d) does unexpected weird things.

Sometimes developers have to guess at what the user really wants or how the compiler will really translate the source code. It is a good idea to "surface" such issues and make sure that the suspect data is clearly visible to the user.

One of the smartest systems I wrote was for a debt collection law firm. The program calculated interest on default accounts. The rates and even the rules changed depending on various actions in the case. The idea was to calculate the account balance automatically. And correctly. It wasn’t until I surfaced every calculation on a single screen that we were able to get a perfect answer. As long as I hid the actual calculations, there were endless revisions. This single screen saved the client thousands of dollars in reprogramming fees. It was so impressive that they used it to close their clients.
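As a sketch of what "surfacing" looks like in code, the calculation below returns every intermediate value instead of a single opaque total. The rates, dates, and simple-interest formula here are invented for illustration; the original system's rules were far more involved.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class InterestLine:
    """One surfaced calculation step: everything a user needs to check it by hand."""
    start: date
    end: date
    days: int
    principal: float
    annual_rate: float
    interest: float

def surfaced_interest(principal, periods):
    """Compute simple interest per period, returning every intermediate
    line rather than only the final balance."""
    lines = []
    for start, end, rate in periods:
        days = (end - start).days
        interest = principal * rate * days / 365
        lines.append(InterestLine(start, end, days, principal, rate, interest))
    return lines

# Hypothetical account: the rate changes mid-year after an action in the case.
rows = surfaced_interest(10_000.0, [
    (date(1998, 1, 1), date(1998, 7, 1), 0.10),
    (date(1998, 7, 1), date(1999, 1, 1), 0.12),
])
total = sum(r.interest for r in rows)
# A wrong rate or day count is visible on its own line, instead of
# hidden inside one opaque total.
```

Because each line can be redone manually in chronological sequence, the first line where the manual figure and the program's figure diverge points straight at the buggy calculation, which is exactly the procedure described above.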

Therefore, you have the corollary…

Rule #2a: Anything that you are unsure about should be visible to the users. (top)

Of course, we developers want to look good. Of course, we want to appear omniscient. The truth is that we can't be expected to read users' minds; but we can do a pretty good imitation thereof. The trick is to surface everything that is uncertain.

In my example above, no one had the total answer. It was only when we saw wrong results that we knew to look for bugs. Once the calculations were surfaced, it was short work to fine tune the program. We manually redid the calculations in chronological sequence until the difference appeared. Then we had the bugged calculation.

What was interesting was that the users appreciated having the calculations to hand and actually preferred to have the extra data in front of them. By surfacing what I, the programmer, didn’t understand, the users gained comprehension about something that they didn’t fully understand either.

Besides the areas of uncertainty that come from users, there are the areas of platform unknowns. Who has grasped every nuance of the Clarion compiler? Never mind a really complex and disorganized platform like Windows. Modern programs are too large and complex for any single dweeb-superprogrammer to grasp in their entirety, much less for us overworked applications programmers.

TopSpeed and other manufacturers do their best to document their products, but the average programmer will inevitably have gaps, either from poor documentation or simply from not reading the documentation to hand.

So, surfacing uncertain data is beneficial because…

Rule #3: The key to successful implementation is maintainability and testing. (top)

This has two immediate corollaries…

Rule #3a: No programmer can test his own code. (top)

This corollary should be obvious, but is usually overlooked. The quality of a system comes from the testing, not from the programming. Maintainability comes from programming skill.

With adequate testing, any programmer's work can be made usable. The test of the programmer comes when someone else has to alter his code. Or even when he has to alter his own code. It is a brutal statement to note that many times old programs are simply discarded as unmaintainable.

Testing should be done by people who will use the system. The more testing, the better the system, period. Nothing, absolutely nothing, guarantees a usable system like testing. With this, of course, are reports of what was found.

Rule #3b: Test for success (top)

The most common failure by testers is to a) test only until the first error and then quit, and b) report only failures. The first is simply a waste of time and is an indication that the user really doesn't want the proposed system. Even if there are 20 things to test in a program and the tester only gets access to three of them, the project will advance much faster if he reports on the three that work.

If an area works according to spec, it should be so reported. Failure to do so invites changes. There are two reasons for implementation bugs: either the programmer misunderstood the specifications or the programmer misunderstood how the platform interprets his instruction (assuming that the system was not sabotaged, or the programmer interrupted in the middle of a critical bit of code).

I have seen programmers waste precious time fiddling with perfectly acceptable parts of a system because no one told them that it was acceptable. And I’ve seen programmers simply decide to change something, a disease from which none of us is totally immune.

It is especially important for a tester to note features that work better than anticipated as these are probably fortuitous errors and might be eliminated in the next version. In this case, the specifications need to be changed to match the reality of the working program.
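A minimal sketch of test-for-success reporting: the harness below runs every check rather than quitting at the first failure, and records what works alongside what fails. The test names and the checks themselves are hypothetical, chosen only to show both kinds of outcome.

```python
def run_tests(tests):
    """Run every test instead of stopping at the first failure, and
    report successes as well as failures (a sketch, not a real framework)."""
    results = []
    for name, fn in tests:
        try:
            fn()
            results.append((name, "works as specified"))
        except AssertionError as exc:
            results.append((name, f"FAILED: {exc}"))
    return results

def invoice_total():
    # Works: this success should be reported, not silently skipped,
    # so no one "fixes" a part of the system that is already acceptable.
    assert 2 * 3 == 6

def rounding():
    # A deliberate failure, with enough detail for the programmer to act on.
    assert 0.1 + 0.2 == 0.3, "binary floating point cannot represent 0.3 exactly"

report = run_tests([("invoice total", invoice_total), ("rounding", rounding)])
```

The point of the design is the report itself: both entries appear, so the programmer knows which areas meet the spec and which need work.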

Ironically, many computer users want the computer to "think" for them. I am reminded of a joke that recently came over the Internet.

IBM’s founder Tom Watson had to be dragged into the computer age, but once committed to the electronic thinking machine he went all the way. He dictated that the company motto be: THINK. Apparently, a manager put up a sign over the sink in the bathroom: THINK. In short order a second sign appeared: THOAP.

Thank goodness we have outgrown the idea that computers are going to do our thinking for us. Precisely because they are such stupid machines, it takes a lot of intelligence to program a computer.

Rule #4: Analysis is cheaper than design; design is cheaper than programming. (top)

The very worst decision a company can make is to speed up a project by skipping the analysis phase or design phase. Prototypes notwithstanding, the answer to any complaints about late delivery is more design, not more programming. The solution to slow or endless design is more analysis.

Analysis and design consist mostly of meetings with users and compilation of notes. Programming probably contributes less to the process of design than does playing video games. At least the time spent playing video games won't send the project off in the wrong direction. Analysis meetings are mostly with managers and some key users. Design meetings are usually with working managers and the more knowledgeable users. Seat-of-the-pants programming is not only non-productive, it can actually be counter-productive.

Analysis means breaking the project up into parts and looking at the relationships of those parts. The key activity in analysis is naming the parts of the system! Any part of a system that is not named, or that is given a confusing or inappropriate name, is bound to slow things down later on.

I recall a day-long meeting to sort out the meaning of "due date." At the end of the meeting we realized that due date was not the same thing for sales, shipping and production control. So every time we "fixed" due dates we got new complaints! This could have been completely avoided if we had investigated the various meanings of due dates in the beginning.
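A sketch of how naming could have resolved the ambiguity: one field per department's meaning of "due date," with the relationship between the dates made explicit. The field names and the validation rule are my invention, not taken from the actual system.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class OrderDates:
    """One named field per department's meaning of 'due date'.
    A single ambiguous due_date field is what caused the endless complaints."""
    promised_to_customer: date   # sales: the date quoted to the customer
    ship_by: date                # shipping: when the order must leave the dock
    boards_complete_by: date     # production control: when production must finish

    def __post_init__(self):
        # The three dates are related but not interchangeable.
        if not (self.boards_complete_by <= self.ship_by <= self.promised_to_customer):
            raise ValueError("production must finish before shipping, and "
                             "shipping must happen by the promised date")

# Now "fixing the due date" is an unambiguous request about one named field.
order = OrderDates(promised_to_customer=date(2000, 3, 10),
                   ship_by=date(2000, 3, 8),
                   boards_complete_by=date(2000, 3, 5))
```

With three names instead of one, a complaint from sales, shipping, or production control lands on a specific field rather than triggering another round of "fixes" that break someone else's meaning.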

Almost all human communication is with words. Anyone without a vocabulary is isolated. It can be demonstrated conclusively that a student, in passing over a single misunderstood or not-understood word, suffers immediate physiological changes. For example, the Learning Accelerator from Applied Scholastics monitors a student’s non-comprehension by metering changes in the body’s electrical resistance. Variances are instant and significant when a student passes over a word he or she does not understand.

There are mysterious words of all flavors. For example, I ask you to put a dollop of salsa on my taco, and not understanding dollop you put a wallop, and so I figure that you have it in for me. Or I tell you to "run up the temperature" of an oven and so you back across the room and run up to the machine to change the temperature.

Laugh if you will, but mysterious and misunderstood words are the bane of any programming project.

Rule #5: The analysis is done when all the parts of the system, extant and proposed, are named and have a defined purpose and responsible party or parties. (top)

The analysis document should include what kind of thing it is (program, report, screen, procedure, feature, business rule, database, field, etc.) and who (plural) has authority over it. This will usually be the most senior user. It also shows the purpose of the item in the context of the business.

The list must be complete and contain everything that could possibly be included in the system, immediately or in the future. While this sounds as if it would take a long time, remember that all you are interested in at this time is the name, purpose and authority. The decision to include it, delay it or abandon it comes later.

Finally, the analysis includes any imperative design issues. These are often expressed in fairly general terms, such as the requirement that an accounting system must be accurate. While a finicky or ugly accounting system would not be popular, and there are circumstances when it would be acceptable, an inaccurate one is never acceptable.

Also a glossary of terms is a good idea. If this is missing, any ambiguous names or terms must be defined in the text.

You might think of the analysis document as a shopping list.
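If the analysis document really is a shopping list, each entry only needs a name, a kind, a purpose, and an authority. A minimal sketch of what one entry could look like; the field names and sample entries are illustrative, not a prescribed format.

```python
from dataclasses import dataclass

@dataclass
class AnalysisEntry:
    """One line of the analysis 'shopping list'."""
    name: str        # the agreed-upon name of the part
    kind: str        # program, report, screen, procedure, business rule, ...
    purpose: str     # what it is for, in the context of the business
    authority: list  # the responsible party or parties (plural on purpose)

# Hypothetical entries; deciding to include, delay, or abandon each comes later.
shopping_list = [
    AnalysisEntry("Due Date (sales)", "field",
                  "the delivery date quoted to the customer",
                  ["Sales Manager"]),
    AnalysisEntry("Invoice Aging Report", "report",
                  "show unpaid invoices grouped by age",
                  ["Controller", "A/R Clerk"]),
]
```

Nothing here decides scheduling or scope; the list only establishes that every part is named, has a purpose, and has someone with authority over it, which is the stated completion test for the analysis.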

Note: Some developers expand the analysis stage to include all the questions that need to be asked of the user. In contrast, I assume that the users will have input at all stages.

The analysis phase naturally leads into the design phase. From a user, or system specifier's viewpoint, there is little difference between analysis and design; in both cases they still answer a lot of questions.

From the developer's viewpoint design is a new ball game. The analysis is general and all the parts of the system must be considered. In the design stage, if the analysis was adequate, the system is broken up into pieces and the focus can be on one or another of these pieces, not the whole system. When the developer finds excessive interaction between the parts of the system then more analysis is needed.

But what if the users refuse to invest the time for a full analysis? In this case the programmer is advised to execute the analysis covertly anyway. It becomes a matter of doing it a bit at a time: do what analysis you can, work out some sort of design (even a terrible one), then finally write a draft of the program. Tell the users that it is the draft version.

When you release the draft version of the software, and the quality is there, the users will answer the original analysis question. Never mind that they act like innocent victims of the big-bad-programmer. Simply point out that you’re not a mind reader, and insist (politely) that they share with you the names of the parts of the system. This is a piecemeal approach, but it works!

With well-grooved clients the analysis phase is fast and exciting. Remember that everything worthwhile starts with a decision.

Rule #6: The end of the design phase occurs when all the users have no significant questions or objections. (top)

By significant I mean an issue that affects the data structure or business rules. In this day of web sites, it also means anything that would affect a majority of the pages.

What usually happens at the end of the design phase is that the users start changing things back to a recent version or all the changes they request are cosmetic. Only at this point is it really safe to invest in programming time. Prior to this any programming is prototyping or proof-of-concept and may not only be worthless, but might well be misleading.

All of us have come up with a great idea that proved impossible to implement. If the feature was promised but never seen, the disappointment is a lot less sharp than when a feature is "tested" but dropped. The answer is to use some intelligence when making a demo or proof-of-concept program and stick to tried and true features!

The largest danger comes from believing feature lists for programming tools, or perhaps more seriously, the implied features. "Fast" sometimes means "hard to use" or "starved for features." "Advanced features" probably means "buggy." And so forth. It's one thing to kid the end user, but something completely different to mislead a programmer who makes his living from the tools he purchases. I'm not saying that you shouldn't use fast or feature-rich programming tools, only that you must know how the tools work in production before betting the project on them.

In general the design is done on paper and includes pictures of the screens and reports that will be supported. Where a screen is required to perform a complex operation a prototype is appropriate to test user reaction. Usually, but not always, a prototype is a substitute for adequate analysis. Only the most sophisticated of users can exhaust the utility of the simplest of screens.

Of course, users naturally have their brightest ideas when they see the actual product. This differs from an almost compulsive need to change things, change things, change things. A lot of learning takes place in the design phase. It might as well be called the education phase, as that is as important as having a final document.

When the client realizes that the program is "just what we asked for but not what we really wanted," the analysis and design need to be redone. At this point the compulsion to change is even stronger. If you don't redo the design phase, then every time the client sees the program the immediate reaction, even if not voiced, will be to change it, no matter what changes you have already made. This can actually spin the project out of control with endless changes and bright ideas. Nothing reassures a client more than seeing his or her ideas on paper. The closer the product is to the original concept, the more reassured the client will be.

One of the most complex designs that I ever encountered was for a retail store. We wrote the system three times, and it was rejected three times. After working on site to get the total picture, I discovered that the client was in serious trouble for evading sales tax. Everything then became clear. They wanted the system designed in such a way that they could cheat on their taxes, but they were careful not to reveal the fact. It doesn't matter what I would have done had I known that they were cooking the books; what mattered was that they were afraid of my finding out.

This brings up the next rule:

Rule #7: It is almost impossible to write a system for an organization engaged in criminal activities. (top)

The reader might think that this is obvious, but when you are deep in the design and nothing aligns, you automatically wonder what kind of criminal or unethical actions the client is trying to hide. A great number of companies have their dirty little secrets, but when the effort to hide the misdeeds outweighs the desire to have a clean and effective computer system, then the programmer is in trouble. Most likely he will be blamed for everything and anything. He will be responsible for the current business conditions, the pot holes in the parking lot and the flicker on the monitor screen. Worst of all, the client will delay or refuse to pay for your hard work!

At this point it is vital to realize that it is not your bad performance, but the client’s efforts to keep anyone from "finding out." If you have done your level best and find yourself a target, your first action would be to find out what it is they think you know.

Once I discovered the retail store's dirty little secret, they calmed down and we finished the system in a few weeks. And this was a system that had been through three major rewrites in nine months. Luckily I was not confronted by a moral dilemma: the sales tax people were already on to them, with a quarter-million-dollar fine. Good design would have revealed the true situation.

A less extreme cause of delays is that:

Rule #8: Every uncertainty about a system doubles the complexity of the affected feature. (top)

The programmer depends on the certainty of the specifications. Changes to the specifications, no matter how "insignificant" or "justified," will exponentially increase the development time. It can be a matter of pride among programmers as to how many changes they can tolerate before they lose it.

One could define a computer system as the collection of uncertainties of an organization, the maybes of the operation. You have an invoicing program because you don't know the details of the next invoice. If all invoices went to the same address for the same amount, then an invoicing program would be pointless. The "size" of the program expands rapidly with the so-called flexibility. A few well-chosen executive decisions can cut down the complexity of a bogged system and put it back on the time line. Unfortunately, decisions that are later reversed are worse than no decision in the first place; therefore such decisions are best made in the analysis phase.
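The doubling is easy to see: each open question is a binary branch, so n of them produce 2^n combinations that the program and its tests must cover. A quick illustration, with invented questions:

```python
from itertools import product

# Each unresolved "maybe" in the specification is a branch the program
# and its tests must handle, so n open questions mean 2**n combinations.
open_questions = ["discount applies?",
                  "customer tax-exempt?",
                  "partial shipment allowed?"]
cases = list(product([True, False], repeat=len(open_questions)))
assert len(cases) == 2 ** len(open_questions)  # 8 paths from just three questions
```

One executive decision ("we never allow partial shipments") removes a question from the list and halves the number of cases at a stroke, which is why a few such decisions can put a bogged-down system back on the time line.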

Note that maintainability is not the same as changeability. Maintainability means "the ease of making changes to the system that parallel the changes in the operation of the business." These are usually minor in scope and hopefully the impact on the system is evaluated before the change is made. It is the difference between redecorating your apartment and changing the floor plan after the foundation is poured.

A changeable program would easily respond to every client whim, which is pretty much an impossibility. Writing a really changeable program has been described as "nailing Jell-O to a tree."

This is perhaps a good place to comment on "positive" thinking. I often encounter programmers who go on hoping that "something" will solve design and analysis oversights. This sort of sloppy thinking just leads to more and more delays.

Positive thinking works better when positive is interpreted to mean definite, not vague or sloppy. A user who is definite, that is, positive about system requirements is a godsend. Frankly, I run like heck if a user gets the attitude of: "Just make it work."

A program that I wrote some seven-odd years ago is still in use. It is a simple little program that totals the money spent on film production at the end of a shoot. Mine was one of three versions, and the only one still in use. The client is a great guy, but when we reached what should have been the end of the project, it seemed to stretch out endlessly. Then I noticed that my client had picked up an encouraging attitude but had quit giving specific instructions. When I pointed this out, he gave me a stream of very definite requirements and we wrapped up the project in short order.

Rule #9: The most important thing about a system is the most important thing about a system. (top)

I took over a project to write a billing system for an accounting type organization. The programmer had developed a wonderful system to track client folders, but overlooked the fact that the number one priority for the customer was to straighten out confusions in billing.

It is important to prioritize features as this helps when planning delivery schedules and in estimating how much attention each function deserves. It is even more important to discover, announce and verify the one thing that will make or break the system. This single item will usually get attention far in excess of all the remaining parts of the system.

For a docket system in a law office it would be to have all the calendar events announced at the right time. If the calendar report is accurate and reliable then the system will be judged as "essentially" reliable, even if the mailing list feature is a disaster.

Summary (top)

The common denominator of successful systems, one that is uniformly absent from failed projects, is a clear understanding of the system and its contents. A badly written system that limps along losing data will be more useful, if well understood, than one that is smooth and flawless but mysterious to the user.

When I started writing business applications 15 years ago, my intention was to use computers to make organizations more sane. Ironically, the way to accomplish that is to make the computers more comprehensible and take out the mystery or, in other words, to make computers more sane. Computers are tremendously powerful administratively. Misused they can drive a work force into apathy and confusion. Well understood, and with a minimum of care in the design and implementation of the software, they can dramatically grow the very same organization.
