What is a Unit Test?
The term Unit Test was
once a synonym for module test, referring to any test of an individual
unit or module. Such testing was originally done manually, for example by
using a test rig to enter input and check output, and later by stepping
through code in a debugger. It was typically performed only once a module
had been implemented or an enhancement completed.
Nowadays, the term Unit Test
has evolved to the stricter definition of a comprehensive set of
automated tests of a module that can be run at any time to check that it
works.
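To make this concrete, here is a minimal sketch of what such a set of automated tests might look like in C++. The Stack class and its operations are hypothetical, invented purely for illustration - the point is that the checks are code, so they can be re-run at any time:

#include <cassert>
#include <stdexcept>
#include <vector>

// Hypothetical module under test: a simple stack of ints
class Stack
{
    std::vector<int> data_;
public:
    bool empty() const { return data_.empty(); }
    void push(int v) { data_.push_back(v); }
    int pop()
    {
        if (data_.empty())
            throw std::runtime_error("pop on empty stack");
        int v = data_.back();
        data_.pop_back();
        return v;
    }
};

// The Unit Test: automated checks that can be re-run after every change
void TestStack()
{
    Stack s;
    assert(s.empty());        // a new stack is empty
    s.push(42);
    assert(!s.empty());
    assert(s.pop() == 42);    // last in, first out
    assert(s.empty());        // empty again after the pop
}

int main()
{
    TestStack();
    return 0;
}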
The idea for what are now called Unit Tests has been around for many
years. I myself independently discovered the idea in 1985, giving it
the possibly more accurate but duller name of "automated module
regression tests" (AMRT). However, the use of Unit Tests has only really
started gaining momentum in the last decade or so, since the rise of XP (Extreme Programming) and TDD (Test Driven Development).
Better Design
I talked a little about how Unit Tests help with software maintenance
last week, and I will summarize again below, but first I will explain
how Unit Tests can assist with the original design and
implementation, before we even consider changing it.
There is a lot of evidence that creating Unit Tests, at the same time as
you write the code to be tested, results in a much better initial
design. It helps the programmer to think about things (boundary
conditions, error conditions, etc) that are often ignored in the rush to
get something completed.
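For example, writing the tests at the same time forces you to decide, then and there, exactly how the boundary and error cases should behave. The function strToUInt below is hypothetical, invented just to illustrate the kind of cases the tests make you confront (the boundary values assume a 32-bit unsigned int):

#include <cassert>
#include <limits>

// Hypothetical: convert a decimal string to unsigned; returns false on bad input
bool strToUInt(const char *s, unsigned &result)
{
    if (s == 0 || *s == '\0')
        return false;                   // error: null or empty string
    unsigned long long v = 0;
    for (; *s != '\0'; ++s)
    {
        if (*s < '0' || *s > '9')
            return false;               // error: non-digit character
        v = v * 10 + (*s - '0');
        if (v > std::numeric_limits<unsigned>::max())
            return false;               // error: overflow
    }
    result = static_cast<unsigned>(v);
    return true;
}

void TestStrToUInt()
{
    unsigned n;
    assert(strToUInt("42", n) && n == 42);     // normal case
    assert(strToUInt("0", n) && n == 0);       // boundary: smallest value
    assert(strToUInt("4294967295", n));        // boundary: largest value
    assert(!strToUInt("4294967296", n));       // error: just past the boundary
    assert(!strToUInt("", n));                 // error: empty string
    assert(!strToUInt("12x", n));              // error: trailing garbage
}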
TDD (which is, in a way, an extension of Unit Testing) further improves the software quality, for example in verifiability (see Verifiability). It also assists in ensuring that the Unit Tests themselves are correct.
Another advantage is that creating Unit Tests while writing the code (eg, using TDD) means that bugs are found much earlier. This allows the design to be suitably modified before it becomes ossified.
Finally, the main advantage to the design is that Unit Tests make it
much easier to resist the urge to add unnecessary code. This is the idea
behind YAGNI in XP.
I could write a whole post on YAGNI (and probably will) and have also briefly talked about it in Reusability Futility. In brief, it is the idea that you do the absolute minimum to implement something.
Why are extra things done anyway? There are lots of reasons:
- a future extension is thought to be certain
- adding it now avoids later costs of change (see below)
- something is seen as a nice addition for the user and is trivial to add
- the developer wants to try something more challenging
- a more general problem is solved in the name of reusability, rather than the specific one required
- anticipated code degradation means it will be hard/impossible to add later
- distrust of code maintainers to later add a feature properly
- as an attempt to avoid repeated regression testing when things are later added
- finally (and ironically) changes are made to make it more maintainable
The consequences are:
- a more complex design, which is less easily understood
- undocumented features which are not debugged and tested properly
- when software is rewritten or just refactored, undocumented features are forgotten or inadvertently disabled - this can be very annoying for users
- reluctance to refactor from fear of breaking undocumented features
- constraints on the design, making future changes more difficult
- code that is usually less maintainable
Note that there is one claimed benefit that I have not touted here. I have seen it said (for example, see the Wikipedia page for YAGNI)
that it is faster and requires less effort to create a simple design.
This may sometimes be true, but in my experience it is often harder and
more time-consuming to create the simpler design than a more complex
one.
Unit Tests mean that most of the anticipated problems with YAGNI
disappear. You don't need to add the feature now because Unit Tests
allow you to add it easily later. That is, they reduce the costs of change.
Moreover, the code does not degrade and maintainers are more likely to
make modifications in a way that does not subvert the original design,
because they don't have to fret over introducing new bugs. Good Unit Tests can even guide them in how to make the changes.
Further, I have found that the use of Unit Tests with a simple design makes it easier to maintain than a design which is explicitly built to be maintainable!
Finally, Unit Tests free developers to concentrate on the task at hand,
rather than being distracted by tasks they are not really good at (like
predicting future user requirements).
Handling Change
In brief, the problem is that the costs of changing software are
traditionally very high, primarily due to the risk of introducing new
bugs (and the consequent problems) and the need for regression testing.
These costs cause two effects:
- resisting change, which is bad for the long-term viability of software
- an emphasis on avoiding mistakes, causing an unnecessarily large up-front effort
Resisting change can result in lost business opportunities simply by
avoiding adding new features. Further, the code is not refactored to
improve maintainability or take advantage of a better design. Further
still, improvements due to external technical advances (eg, improved
hardware or software libraries) are often missed.
Worst of all, even when changes are made they are not done properly for various reasons like:
1. It may be very hard to make the change in a way that is consistent
with the original design. A good example of this is given at the end of
Item 43 in Scott Meyers' book Effective C++, where multiple inheritance is used to avoid making the correct changes to the class hierarchy.
2. The changes may affect other people. For example, a library or DLL
from another section may need to be changed. There can be a reluctance
to ask others to make changes that might appear to be for the sake of
one's own convenience. I gave an example of how this happened to me in a
previous post - Why Good Programs Go Bad.
3. Often making changes properly may increase the chance of new bugs
appearing (or old bugs reappearing). An example of bad code changes made
in the name of avoiding bugs is the common practice of cloning a
function or module to handle a change, making slight modifications
for the new circumstance. This preserves the original function or
module so that there is no chance of bugs being introduced in existing
behaviour; but the consequence is that there will be duplicate code,
which violates the DRY principle and causes a maintenance problem (see the sketch after this list).
4. Even a simple change has the possibility of introducing new bugs.
Hence manual regression testing is required. This can have a large cost
and is usually very tedious for those who do the testing.
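Returning to point 3 above, here is a hypothetical sketch of the cloning anti-pattern (the Item type and pricing functions are invented for illustration):

#include <vector>

struct Item { double price; int quantity; };

// The original function - known to work, so nobody dares touch it
double TotalPrice(const std::vector<Item> &items)
{
    double total = 0.0;
    for (const Item &item : items)
        total += item.price * item.quantity;
    return total;
}

// A clone made "to avoid introducing bugs" when discounts were added.
// The summing logic is now duplicated (violating DRY), so any future
// fix or change must be made in two places.
double TotalPriceWithDiscount(const std::vector<Item> &items, double discount)
{
    double total = 0.0;
    for (const Item &item : items)
        total += item.price * item.quantity;
    return total * (1.0 - discount);
}

With good Unit Tests, the original function could instead have been modified (eg, by adding a discount parameter that defaults to zero), and the tests re-run to confirm that existing behaviour is unchanged.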
Finally, the problem of spending an enormous effort up-front to get the specification right first time has
many well-known, undesirable consequences. First, it makes managers
very nervous when there appears to be a large cost (wages) with very
little tangible evidence that anything has been accomplished. The
analysts also soon realize that they don't really know what is required
and/or what they are doing and that there is no chance of getting the
analysis and design right first time. All this effort could be better spent in another way - the Agile way.
Agile methodologies generally help greatly to reduce the cost of change
by catching bugs (and other defects) much earlier. But Unit Tests
especially enhance this aspect of Agile methodologies by:
- detecting bugs earlier (reducing the cost of delay)
- making code more maintainable (reducing the cost of rework)
- allowing changes to be made properly
- making refactoring easier and less risky
- working as a form of "living documentation", making changes easier (see below)
- even guiding developers to ensure that future code changes are done properly
Documentation
Technical documentation has always been one of the biggest problems in
software development. Studies have shown (at least in Waterfall-style
projects) that poor technical specifications are the major reason for
project failure, followed by poor estimations (which are usually due to
incomplete specs).
Before I talk about the relationship between Unit Tests and
documentation, let's just look at the problems and their causes. There
are a lot of problems with technical documentation, which I group into
three rough categories, according to whether the documentation is:
1. Accurate. One problem is
that documents are simply incorrect, and the reviewing/proof-reading
required to remove all errors would be onerous. Of course, another major
problem is that they are incomplete, often with gaping holes, and no
amount of reviewing is going to find a gap that nobody has thought of.
One way to attempt to overcome this has been to require people with
authority to sign off on a document (presumably after having read it).
That way, at least you have somebody to blame when things go wrong!
Having more people read and sign off is good (in one way) as it
increases the chance of spotting mistakes, but then (in another way) the
blame-ability for each individual is diluted. And trying to blame is not
the Agile approach.
2. Up to date. One reason
documents are incorrect is that things change, usually a lot.
Documents, even if correct initially, are almost never in accord
with the software at any particular point in time. One of the main
reasons is the difficulty of finding time to update them. You don't want
to keep modifying something when you suspect it will change again in the
near future. The end result is that you keep postponing updating the
document until it becomes irrelevant and everyone has forgotten about it,
or the task is so big that you can never find the time.
One reason documents are not updated is that they are hard to verify. It
can be very difficult to check if there is a discrepancy between the
actual software and the documentation. Even though the document and the
code are very closely related, there is no direct connection between
them, except via the brains of the developers. This is another good example of
DIRE (Don't Isolate Related Entities).
3. Understandable. Finally,
documentation is difficult to read. There are many reasons, like being
too vague, including irrelevant information, bad grammar and incorrect
terminology, etc. A common problem is assumed knowledge on the part of
the reader - a good introduction/summary at the start of a document is
almost never provided, but can be a great time-saver and avoid a lot of
confusion.
The lack of a summary is symptomatic of the basic problem - the author
is entirely focussed on getting all the facts down. This is like what
happens with a newbie programmer - they are so focussed on getting the
software working (ie, correct) that they give no regard to other important
attributes (like understandability). Unfortunately, document writers
rarely get past the "newbie" stage, being mainly concerned with
correctness not with the understandability of what they write.
Favour Working Software over Comprehensive Documentation
It is no mistake that this, the second of the four elements of the Agile Manifesto,
deals with documentation. And there is no better example of favouring
code over documentation than Unit Tests, since they are working software
which actually obviates the need for much documentation. On top of
their other advantages, Unit Tests work as a form of living documentation, which not only records how the code is supposed to work but also shows others how to use it.
Most documentation is poor, but even with the best you are never certain
you have understood it completely. Most programmers create a little
test program to check their understanding. With Unit Tests, that little
test program is already written for you.
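For example, a couple of lines of test code (hypothetical, in the style of the BigInt tests later in this post) answer usage questions more directly than prose ever could:

// The tests show exactly what FromString accepts and what it returns
assert(BigInt::FromString("0") == BigInt(0));
assert(BigInt::FromString("12345") == BigInt(12345));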
There are many other ways that Unit Tests work as an improved
documentation. By now, you probably get the gist so I will just create a
list:
- correct - unlike with documentation, mistakes are immediately obvious
- verifiable - you can easily check if Unit Tests are correct by running them
- understandable - working code is much easier to read/test than documentation
- modifiable - Unit Tests allow code to be more readily modified
- up to date - Unit Tests (if run and maintained) are never out of date
Of course, like documentation, Unit Tests can be incomplete (and often
are). This is something I will talk about in the next blog, but for now I
will simply say that code coverage analysis can help to ensure that
tests are reasonably complete and up to date.
Finally, Unit Tests can be thought of as a form of automation.
Automation is another thing I am keen on, as I hate performing tedious
tasks - and reading documentation to check that my code matches it is a
perfect example of a tedious task. Running Unit Tests automates this
task.
Organization
Finally, Unit Tests are good for organizing your work. When using
Unit Tests (and particularly when using TDD), you develop a rhythm,
getting into a cycle of coding/testing/fixing. Somehow it
becomes more obvious what you need to do next. For example, after
implementing a feature you just "fix the next red light" until all Unit
Tests pass.
White Box Testing
When writing Unit Tests you should always/never use White Box Testing. Which version of this statement is correct depends on what you mean by White Box Testing.
I have seen (and been involved in) this debate a lot. These are the opposing arguments:
1. You should never use white box testing as then you are testing the
implementation not the interface of the module. The tests should not
have to change when the implementation changes.
2. You should always use white box testing since you can't possibly test
every combination of inputs. Making use of knowledge of how the module
is implemented allows you to test inputs that are likely to give
incorrect results, such as boundary conditions.
Paradoxically, both of these arguments are valid. The problem comes down
to what is meant by white box testing. Unfortunately, the meaning has
never been clear since it is obviously just the opposite of black
box testing and everyone knows what black box testing means. The
problem is semantic - there are different interpretations of what is
meant by the opposite of black box testing.
Here is my attempt to describe two different meanings of white box testing:
Meaning 1: Test that a module works
according to its (current) design by checking how it behaves internally
by reading private data, or intercepting private messages.
Meaning 2: Test a module using knowledge
of how it is implemented internally, but only ever testing through the
public interface.
Like cholesterol, there are both bad (Meaning 1) and good (Meaning 2) forms of white box testing.
Bad White Box Testing
Recently a colleague came and asked me how a Unit Test should access
private data of the module under test. This question completely threw me
as I had never needed, or even thought, to do this. I mumbled something
about not really understanding why he wanted to.
Later, when I thought about it, I realised that Unit Tests should not be
accessing private data or methods of the modules they are testing. Doing
this would make the unit test dependent on the internal details of the
module. Unit Tests should only test the external behaviour of a module,
never how it is actually implemented.
As I said in my previous post (Unit Tests - What's so good about them?),
one of their main advantages is that they allow you to easily modify
and refactor code and then simply run the Unit Tests to ensure that
nothing has been broken. This advantage would be lost if the test
depends on the internal implementation of the module. Changing the
implementation (without changing the external behaviour) could break the
test. This is not what you want.
Unit Tests should only test the interface of the module and never access private parts.
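To illustrate, here is a hypothetical example of the "bad" form, using the BigInt class from the next section. It assumes (purely for illustration) that the test has somehow gained access to a private member digits_ holding 32-bit storage units. The test asserts on the representation rather than the behaviour, so it breaks as soon as the implementation changes, even if the class still works perfectly:

// BAD: inspects private state instead of observable behaviour
void TestCarryInternals()
{
    BigInt sum = BigInt::FromString("4294967295") + BigInt(1);
    assert(sum.digits_.size() == 2);    // assumes 32-bit storage units -
    assert(sum.digits_[0] == 0);        // breaks if the representation ever
    assert(sum.digits_[1] == 1);        // changes, though behaviour is correct
}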
Good White Box Testing
The whole point of Unit Tests is that they are written by the same
person(s) who wrote the code. The best tests are those that test areas
that are likely to cause problems (eg, internal boundary conditions) and
hence can only be written by someone with an intimate knowledge of the internal workings of the module.
For example, say you are writing a C++ class for infinite precision
unsigned integers and you want to create some Unit Tests for the "add"
operation (implemented using operator+). Some obvious tests would be:
assert(BigInt(0) + BigInt(0) == BigInt(0));
assert(BigInt(0) + BigInt(1) == BigInt(1));
assert(BigInt(1) + BigInt(1) == BigInt(2));
// etc
Of course, the critical part of a BigInt class
is that carries are performed correctly when an internal storage unit
overflows into the next significant unit. With simple black box testing
you can only guess at how the class is implemented. (8-bit integers,
32-bit integers, BCD or even just ASCII characters could be used to
store the numbers.) However, if you wrote the code you would know that,
for example, 32-bit integers are used internally, in which case this is a
good test:
// Check that (2^32-1) + 1 == 2^32
assert(BigInt::FromString("4294967295") + BigInt(1) ==
BigInt::FromString("4294967296"));
This test will immediately tell you if the code does not carry from the
lowest 32-bit integer properly. Obviously, you need other tests too, but
this tests a critical boundary condition.
To find the sort of defects that the above test checks for, using black
box testing, would require a loop that would probably take days to run.
(I will talk next month about why Unit Tests should not be slow.)
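For comparison, a pure black box attempt to find the carry boundary, without knowing where it is, might look like the following hypothetical brute-force loop (it assumes FromString accepts a std::string, and uses the string conversion as an independent oracle). Even at millions of additions per second, more than four billion iterations make this impractical as a routinely-run test:

#include <string>

// Brute force: check every addition up to the (unknown) carry boundary
BigInt n(0);
for (unsigned long long i = 1; i <= 4294967296ULL; ++i)
{
    n = n + BigInt(1);
    assert(n == BigInt::FromString(std::to_string(i)));
}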
Which is the Accepted Definition?
The whole confusion about White Box Testing is that the distinction
above is not made clear. Most definitions imply the "good" definition, but
then say something that contradicts this. For example, the Wikipedia page on White Box testing (as
at November 2013) does not make the distinction and seems to imply the
"good" definition, yet then suggests it is talking about the "bad" form -
for example, it says it is like in-circuit testing, which is the
hardware equivalent of accessing private data.
My general conclusion, from talking to colleagues, is that the "good"
definition is what most people think White Box Testing is (or should be),
but there are some people who use the "bad" definition.
I don't really care which definition of white box testing you think is
correct as long as you distinguish between them, and as long as you
specifically use "good" white box testing with your unit tests.
Disadvantages of White Box Testing
At this point I will note a few problems.
First, with some code the distinction between interface and
implementation is blurred. This is a bad thing for many reasons, one of
which is that you can't tell whether you are doing "bad" white box testing. See
Best Practice for Modules in C/C++ on how to separate interface from implementation. Also see the sections on Information Hiding and Decoupling at Software Design.
Also, with white box testing it is easy to forget to add new tests when the implementation changes. For example, returning to the BigInt example
above, imagine that the code was enhanced to use 64-bit integers (eg,
because most users were moving to 64-bit processors where there would be
a large performance boost). If, after changing the implementation, the
test was not modified (or a new test added) to check for overflow of
the 64-bit unit, then the Unit Tests would no longer be complete.
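Continuing the hypothetical switch to 64-bit units, the boundary test that should be added is the 64-bit analogue of the earlier one:

// Check that (2^64-1) + 1 == 2^64
assert(BigInt::FromString("18446744073709551615") + BigInt(1) ==
       BigInt::FromString("18446744073709551616"));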
Summary
Once you start using Unit Tests you keep finding more things to like about them. Your design is more likely to be correct and more verifiable. Further, the initial design can be simpler because you don't have to try to guess what will happen in the future. And when things do change, enhancements are made more quickly and reliably, and bugs are found faster. Best of all, changes can be made properly and the code refactored without fear of introducing bugs.
However, the main point I was trying to get across is that without Unit Tests I don't believe an Agile methodology can work. (This is one of my criticisms of Scrum - that it does not prescribe Unit Tests as essential to the methodology.) Unit Tests allow the software to start out simple and evolve into what it needs to be. They allow us to resist trying to predict the future.
Conclusion
There are two points to this post.
1. Make sure you understand what someone means by white box testing.
When someone says Unit Tests should not use white box testing, they are
probably talking about "bad" white box testing, which accesses the
internals of the module under test.
2. When writing Unit Tests you should always use "good" white box
testing to test for different combinations of parameters, boundary
conditions and other likely problem areas.
It is pointless to write Unit Tests that simply do black box testing.
The whole point of Unit Tests is that they are written by whoever wrote
the code, in order to test likely problem areas.