The
first fallacy I covered was that of using very long identifiers. I
thought this debate had been settled a long time ago and it was agreed
that using identifiers that are too long makes code hard to read. I
only added it to the article because every now and then I see some code
that uses really long variable names, so I thought there must still be a
few people around who still think it is a good idea.
So I was very
surprised to get quite a bit of feedback objecting to this "fallacy".
It seems that the idea (and the idea that code can be self-describing)
is having a resurgence. For example, I have just been reading the book
"Clean Code" by Robert C. Martin, which also seems to say that really
long variable names are a good idea. So I think I need to explain this
in more detail for those who are still unconvinced.
Initial Problem
I guess the
problem all began because K&R and other books used example code with
very short names, often single characters. This is not necessarily a
bad style in that context since these examples are small and often
demonstrate abstract concepts.
The real
problem started when many C programmers emulated this style everywhere,
even in much larger and more practical software and for variables with
much greater scope. (In these situations longer and more descriptive
variable names should, of course, be used as they are more easily seen,
remembered and differentiated.)
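To make the distinction concrete, here is a minimal sketch in C. The names (requests_served, sum, note_request) are hypothetical and only there for illustration: single-letter names are fine inside a function of a few lines, while a variable with file scope gets a longer, more descriptive name.

```c
#include <stddef.h>

/* Number of requests handled since startup. It has file scope and
   could be used from several functions, so it gets a descriptive
   name. (Hypothetical variable for this sketch.) */
static unsigned long requests_served;

/* Sum the first n elements of a. The scope is only a few lines, so
   the K&R-style names a, n and i are perfectly clear here. */
long sum(const int *a, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += a[i];
    return total;
}

/* Record one handled request and return the running count. */
unsigned long note_request(void)
{
    return ++requests_served;
}
```

Renaming i to something like currentArrayIndex would add nothing here, while renaming requests_served to r would make every distant use of it a puzzle.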
The reaction to
this problem of poor variable names was to mandate (e.g., in
coding standards) that variable names should fully describe the purpose
of the variable. This was an overreaction.
Self Describing Code
Another
contributing factor was that many authors have promoted the idea that
code should be self-describing, rendering the use of comments
unnecessary. The argument is that well-written code should not need
any explanation. However, the main justification is really that
comments are sometimes out of date or just plain wrong.
The problems
with the idea of self-describing code are many. First, it is much
easier to describe many things in plain English than to contort the code
in an effort to make things clearer. Trying to make the code
self-describing actually can make it harder to read, at least for the
casual reader.
I will say that
comments on every line that simply reflect what the code is doing are
worse than useless, but comments at the start of blocks of code or for
every function can make it much easier to quickly understand what is
going on. In a large program occasional explanatory comments can be an
absolute godsend, allowing you to quickly home in on what you are
looking for.
Also, just
because comments are often incorrect does not mean that comments are
inherently bad. Some comments are useless and can be removed, but
sometimes the comments should simply be improved, not removed. I often
think that the idea of self-describing code was invented by someone who
had just read too many bad comments.
Finally, the
truly unfortunate thing is that many programmers used the idea as
justification for not adding any comments to code. I have read a lot of
code in my time and invariably the worst code has no comments and the
best has several (or at least a few). Rightly or not, when I see code
with no comments I immediately assume it is of poor quality.
Clean Code
I have been
reading this book by Robert (Uncle Bob) Martin and have found many
useful ideas in it. The book actually has a whole chapter devoted to
naming of identifiers (Chapter 2: Meaningful Names) which I find
ridiculous in itself since most of the ideas presented are bleedingly
obvious and really not worth anyone reading let alone putting to paper.
However, what I really did not like were his ideas on the length of
identifiers.
Actually the
book seems to contradict itself. For example, on page 18 it says that
an identifier should describe "why it exists, what it does, and how it
is used", but on page 30 "Shorter names are generally better than longer
ones, so long as they are clear." I guess these statements are too
vague to be contradictory but they are at least confusing. Perhaps his
code example will clarify:
int d;
int elapsedTimeInDays;
According to Uncle Bob the first line is bad but the second is good. I agree with that, but what about the obvious:
int days;
I actually
don't mind the name "elapsedTimeInDays" (though it is close to exceeding
my rule of thumb of a maximum of about 15 characters), but that would
depend on how it is used. If it is used many times within a few lines
of code a shorter name (like "days") will make the code much easier to
scan.
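As a rough illustration of the scanning problem, the sketch below does the same arithmetic twice, once with each name. The struct and function names are my own invention, and a year is simply assumed to be 365 days.

```c
/* An elapsed time broken into years, weeks and leftover days.
   (Hypothetical type for this sketch.) */
struct span { int years, weeks, days; };

/* days: elapsed time in days; a year is taken as 365 days and leap
   years are ignored. The short name is used four times in three
   lines and the arithmetic stays easy to scan. */
struct span split_days(int days)
{
    struct span s;
    s.years = days / 365;
    days    = days % 365;
    s.weeks = days / 7;
    s.days  = days % 7;
    return s;
}

/* Exactly the same logic with the longer name. */
struct span split_elapsed(int elapsedTimeInDays)
{
    struct span s;
    s.years = elapsedTimeInDays / 365;
    elapsedTimeInDays = elapsedTimeInDays % 365;
    s.weeks = elapsedTimeInDays / 7;
    s.days  = elapsedTimeInDays % 7;
    return s;
}
```

Both functions return the same result; the only difference is how much name the eye has to skip over to see the divisions and remainders.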
But what I really don't like is:
"If a name requires a comment, then the name does not reveal its intent." - page 18
"A long descriptive name is better than a long descriptive comment." - page 39
This brings us to my main point that it is generally best to use a shorter name and a comment when declaring a variable.
Long Names vs Comments
Another idea of
Uncle Bob's is that code should read like a good novel, rather than
some technical document. I agree completely. Taking the analogy even
further: declaring a variable in a program is like introducing a new
character in a novel. In a novel the author might take a paragraph or
two to introduce a new character, but thenceforth he or she would be
referred to by first or last name. In fact the main character of a
story would be referred to as "he" or "she" most of the time. The point
is that each time a character is mentioned, the author does not fully
describe that character again or even use their full name. (Of course,
characters should have names that are memorable and not similar to the
names of other characters in the story to avoid confusion.)
Similarly, in a
program a variable is introduced when it is declared. A comment should
appear at this point that fully describes the purpose of the variable
and how it is used. Thereafter you need to be able to refer to the
variable by a name that is meaningful enough that it is easily
remembered what it is for, but not so long that it makes
reading the code tedious. Contrary to what Uncle Bob would have us
believe, the name should not try to include every interesting thing about the variable.
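Here is a small sketch of what I mean, assuming a fixed-size receive buffer. The names buf, room, append and BUF_SIZE are hypothetical. The variable is "introduced" once, with a comment saying what it is for and how it behaves, and after that the short name is enough.

```c
#include <stddef.h>
#include <string.h>

#define BUF_SIZE 16   /* hypothetical buffer size for this sketch */

static char buf[BUF_SIZE];

/* room: number of bytes still free at the end of buf. It starts at
   BUF_SIZE and shrinks as data is appended; append() clamps each
   copy so that room can never wrap below zero. */
static size_t room = BUF_SIZE;

/* Append up to len bytes of src to buf; return how many were copied. */
size_t append(const char *src, size_t len)
{
    if (len > room)
        len = room;
    memcpy(buf + (BUF_SIZE - room), src, len);
    room -= len;
    return len;
}
```

The long-name alternative would be something like numberOfBytesStillFreeInReceiveBuffer, which says less than the comment does and would have to be read four times in this one small function.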
Compiler Limits
Originally C
compilers typically distinguished identifiers only by
their first eight characters (and some linkers limited
external names to 6 significant characters). When later C standards
raised this limit (C99 guarantees at least 63 significant characters
for internal identifiers), many programmers took this as meaning they should
use much longer names. However, the real reason the length was
increased was to alleviate problems with machine-generated code.
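For anyone who never met such a compiler, the effect of a significance limit can be modelled in a few lines. This is only a sketch: same_name is a hypothetical helper that compares two names the way a compiler looking at just the first sig characters would.

```c
#include <string.h>

/* Return 1 if a compiler that examines only the first sig characters
   of an identifier would treat the names a and b as the same
   identifier, else 0. (Hypothetical helper for this sketch.) */
int same_name(const char *a, const char *b, size_t sig)
{
    return strncmp(a, b, sig) == 0;
}
```

With a limit of 8, same_name("buffer_size", "buffer_sink", 8) reports a clash because both names begin with "buffer_s"; with a limit of 63 the two names are distinct.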
DRY
A final point
is that having lots of information in an identifier contravenes a
fundamental principle of good software: DRY (don't repeat yourself).
Every time you use the long, overly descriptive identifier you are
repeating yourself. (Even Uncle Bob mentions the DRY principle.)
Conclusion
When you try
to put too much information into a variable name it makes the code hard
to read, especially if that variable is used often. It is far better to
put all that information in one place - in a comment where the variable
is declared.