Note that this post is not about how to use defensive programming. That has been covered in depth (for example in Code Complete by Steve McConnell). I am just attempting to more accurately describe what it is, why it was invented, and problems with it.
What It Is?
Defensive programming or defensive coding is a style of writing computer
software that attempts to be more resilient in the event of unexpected
behavior. This unexpected behavior is generally considered to be a
result of existing bugs in the software but could be due to other
problems such as corrupted data, hardware failures, or even bugs
introduced by later software changes. Generally, the code tries do do
the most sensible thing with little or no performance penalty and
without adding new error-conditions.
History
The first time I ever encountered the term "defensive programming" was
in K&R (The C Programming Language, 1st Edition, by Kernighan and
Ritchie). After extensive searching I can find no earlier references to
the term. The term probably derives from the term "defensive driving"
which came into common use in the early 1970's a few years before
K&R was written.
It is mentioned twice in the index of K&R. On page 53 it clearly
refers to making code resilient to bugs, but on page 56 it talks about
writing code in a way that reduces the likelihood of future code changes
introducing bugs. In any case many books since have used the term
"defensive programming" to mean making the code resilient in the
presence of bugs, for example The Pragmatic Programmer
by Andrew Hunt and Dave Thomas (which talks about "coding defensively"
in the chapter entitled "Pragmatic Paranoia"), and others. Even before
that many software professionals, myself included, have used the term in
this way since at least the mid-1980's.
Disagreement About Definition
Despite the term being fairly clearly understood for more than 20 years
the exact definition of the term has recently become blurred after
several (generally non-peer-reviewed) articles and blogs have appeared
on various web sites. For example, the current Wikipedia article, and
several sites that quote it, makes "defensive programming" sound like an
approach to error-handling. Error-handling can be related to
defensive programming but they are definitely not the same thing; and
one is not a subset of the other (see below).
Another, well-regarded and often referenced article, entitled simply Defensive Programming
has a very high ranking on Code Project. This is an excellent and
worthy article in its own right, but it is not just about defensive
programming. By its own admission it is about "... techniques useful in
catching programming errors ...". As we will see below defensive
programming has the opposite effect - it tends to hide errors not catch
them. This article discusses many things and should be more accurately
called something like "Good Coding Practices".
Error Handling vs Defensive Programming
The distinction between error-handling and defensive programming is not
very clear in the minds of many programmers. I will explain the
difference.
Error-handling detects and handles situations where something goes wrong
that you know is possible, however unlikely. In contrast, defensive
programming attempt to cater for problems that are assumed to be
"impossible". There are two problems with this distinction that can
cause confusion.
The first problem is that it can depend on circumstances whether
something is impossible or not. For example, if a function is private
to a module or program you may be able to ensure that it is always
passed valid arguments; but if it is part of a public library you cannot
be certain that it will never be passed bad data. In the first case
you can program defensively to ensure that the function does something
sensible even though you know it is "impossible" that this will happen.
In the latter case you might add error-handling in case bad data is
passed to the function.
So whether you choose to program defensively or add explicit
error-handling depends on the scope of the software that you control. I
discuss this further below under Scope.
The second problem is that there can be borderline cases where it is
debatable whether something should be considered impossible. Consider
this spectrum of scenarios for a hypothetical program that can be given
invalid data:
- The program accepts data directly from the user and the user may enter invalid data.
- The programs accepts data from a text file that has been typed in by a person.
- The program accepts data from an XML file (machine or manually generated).
- The program reads a binary data file which was created by another program.
- The program reads a binary file that was written by itself.
- The program reads a binary file that includes a CRC to check it has not been corrupted.
- The program reads a temporary binary file that it only created moments before.
- The program reads a memory mapped file that it created.
- The program reads from a local variable (ie, in memory) that it just wrote to.
At what point does invalid data become "impossible"? Personally, I
would say that it is "impossible" that a data file has become corrupted
and still generates the same CRC (see scenario 6). However, if security
of the data is important you have to consider that the file was
deliberately tampered with (in which case a cryptographic checksum such
as SHA1 should be used).
However, I know that a lot of software assumes that binary data files
are always valid (scenario 4 or 5). Much software will behave
erratically if binary data files have become corrupted.
Of course, I think anyone would agree that you have to assume that the
value of a local variable you just wrote to (see scenario 9) cannot
change. However, even in that case a hardware error, deliberate
tampering, or some other problem could change memory unexpectedly.
So it is not always clear when you need to have explicit error-handling code and when you should simply program defensively.
Example
The archetypal example of defensive programming occurs in just about
every C program ever written, where the terminating condition is written
as a test for inequality ( < ) rather than a test for non-equality (
!= ). For example, a typical loop is written like this:
size_t len = strlen(str);
for (i = 0; i < len; ++i)
result += evaluate(str[i]);
rather than this:
size_t len = strlen(str);
for (i = 0; i != len; ++i)
result += evaluate(str[i]);
Clearly both of these should do exactly the same thing since the
variable 'i' is only ever incremented and can never skip having the same
value as 'len'. So then why are loop termination conditions always
written in the first manner?
First, the consequences of the "impossible" condition are bad, probably
resulting in all sorts of undesirable consequences in production
software, such as an infinite loop or a memory access violation. The
"impossible" condition may occur for any number of reasons such as:
- bad hardware or a stray gamma ray photon means that one of the bits of 'i' is flipped randomly
- another errant process (in a system without hardware memory protection) or thread changes memory that does not belong to it
- bad supervisor level code (ie, the operating system or a device driver) changes memory
- the 'evaluate' function has a rogue pointer that changes the value of 'i'
- the 'evaluate' function corrupts the stack frame pointer and the location of 'i' is now at some random place on the stack
- later code changes introduce bugs, for example:
for (i = 0; i != len; ++i)
{
while (!isprint(str[i])) // bad code change means that 'i' may never be equal to 'len'
++i;
result += evaluate(str[i]);
}
Of course, the last few, caused by bugs in the software, are the most
common, which is why defensive programming is usually associated with
protecting against bugs.
Culture of C
There are also two other aspects of the C language that affect how and
when defensive programming is used - namely the emphasis on efficiency
and the approach to error handling.
Looking at efficiency first -- it is one of the fundamental premises of C
that it assumes the programmer knows what they are doing. The language
does not protect from possible mistakes, as other languages try to do.
For example, it is easy to write past the end of an array in C - but if
all array access had bounds checking applied (by the compiler) then it
would run more slowly even for perfectly safe code.
Due to this emphasis on efficiency, defensive programming is only used
when it has little or no performance penalty. This is typified in the
above example since a "less than" operation is normally just as fast as a
"not equal" one.
The other aspect is the approach to error-handling in C. Errors in C
are generally handled by using error return values. It is not unusual
for C code to be dominated by error-handling, so error-conditions are
ignored if they are considered unlikely to occur - eg, nobody ever
checks the error return value from printf(). (In fact, error return
values are often ignored when they should not be, but that is for
another discussion.)
So, if "unlikely" errors are not generally handled it makes no sense for
"impossible" conditions to be handled as errors since this would add to
the existing error handling burden. (This is covered in more detail in
item 10 of my Code Project article at http://www.codeproject.com/Articles/357065/Ten-Fallacies-of-Good-C-Code.)
Of course, in languages with exception handling, many such
"impossible" conditions can be easily handled by throwing a "software
exception".
Scope
A lot of the confusion about defensive programming comes about because
the scope of control is not always clearly defined. For example, if you
have a function that takes a string (const char *) parameter you may
want to assume that you are never passed a NULL pointer if it never
makes sense to do so. If it is a private function you may be able to
always ensure it; but if it's use is outside the scope of your control
then you can't assume that unless you clearly document that a NULL
pointer may not be used.
In any case even if you consider the condition to be impossible it is
wise to allow for the possibility using defensive programming. Many
functions do this by simply returning if unexpectedly passed a NULL
pointer. (Again, note that this is different to error-handling since no
error value is generated.)
So any discussion of defensive programming must clearly define the scope
of the code being considered. This is one problem with the Wikipedia
article on defensive programming.
Symptoms
When using buggy software the symptoms of defensive programming are seen
often (but may be dismissed as operator error). I think everyone has
at some time seen software that did something a little strange, like
flash a window, ignore a command, or even display a message about an
"unknown error". Usually this is caused by a bug which caused a problem
from which the software attempted to recover.
This recovery can sometimes be successful but usually results in the
program limping along. In the worst case it can silently cause massive
problems like data loss or corruption. (After seeing something like
this, I generally save my data and restart the software to ensure it is
not in some weird state.)
Problems with Defensive Programming
By now it must be pretty clear that defensive programming has a major problem. It hides the presence of bugs.
Some people may think it is good to hide bugs. Certainly, for released
software in use, you don't want to force the user to deal with a problem
that they do not understand. On the other hand blindly continuing when
something be broken can be dangerous. Also, some attempt should be
made to notify someone of the problem - at least write an error message
to a log file.
What is worse, though, is that defensive coding has been known to hide
bugs during development and testing. Nobody can argue that this is a
good thing. The alternative is to use what has been called "offensive
programming" and sometimes "fail fast". This means to make sure someone
knows about problems rather than hiding them.
I do use defensive programming so that unexpected or impossible
situations are handled in release builds; but add assertions that check
for the impossible situations so that bugs do not sneak through. I also
do most testing use the debug build (so that assertions are used),
except for final acceptance testing. For some critical things I also
explicitly add error-handling code, since assertions are removed in
release builds.
Standard C Library
Here are two more examples of how defensive programming is used, taken from the standard C library.
A nasty problem that occurs in far too many C programs is caused by
buffer overruns. This mostly happens when copying or building a string
and the size of the the output buffer is exceeded. In the name of
defensive programming it is recommended to use string functions that
take a buffer length (strncpy(), strncat(), snprintf(), etc). This
avoids the buffer overrun, but hides the (possible) problem that the
string was truncated.
Reports often require data nicely formatted into columns. This is
usually achieved in C using the minimum field width of printf()
format-specifiers. For example, to print numbers in a column five
characters wide you would use the "%5d" specifier. If the integer is
too big for the field then C just prints the extra characters anyway
even though this will ruin your columns. (Contrast this with other
languages like Fortran where field overflow results in silent truncation
of numbers, which has caused some very nasty problems.) This is an
example of defensive programming since when presented with an unexpected
situation the code tries to do something sensible.
Exercise
Finally, here is something for you to think about. The standard C
library includes a function that takes a string of digits and returns an
integer called atoi.
If you are not familiar with atoi(), it does not return any error code
but stops when it encounters the first unexpected character. As an
example atoi("two") just returns zero.
Is the behavior of atoi() an example of defensive programming? Why?
How could it be improved?
No comments:
Post a Comment