
Documentation as a Bug-Finding Tool

Most developers already know many of the benefits of writing good documentation for the future developers of whatever they’re working on.

Documentation helps these future developers gain a real understanding of how the code works and why it does what it does, and it keeps them from introducing new bugs. A good overview of the inner workings of a program could save a developer hours of work figuring out how all of the pieces fit together. And even if you don’t think anyone other than you will ever look at the code again, a good overview can serve as an excellent refresher if you ever have to come back to it.

However, there’s one key benefit of writing this developer-focused documentation that I feel is often underplayed. Writing this documentation will help you find little bugs that would have otherwise been missed.

When I refer to “documentation” here, I don’t really mean the little scattered comments throughout the codebase. Those are important, but I desperately hope that enough developers already know about their importance that it isn’t a huge issue. I’m instead referring to more general architecture or code structure documentation. The type of documentation that tells you how all of the different moving parts fit together, and just as importantly, why they fit together the way that they do. This is the type of stuff that can be almost impossible to figure out without good, proper documentation, and this documentation is what can best help you find bugs.

Take, for example, the method from ClassA below:

public float getSomething(String key, int num) {
    ClassB obj = hashMap.get(key);

    synchronized(obj) {
        obj.doA(num);
        return obj.getB();
    }
}

There is one fairly simple bug here that should be apparent with or without documentation. What happens if obj is null, say because key isn’t in the map? NullPointerException. Right.

I actually wrote code with a similar bug earlier this week. It certainly would have been caught with further testing, but I caught it during this documentation phase. It saved me a bit of personal embarrassment to have caught this simple bug before it left my machine, so I count this as a point for documentation.
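One way to close that hole is an explicit null check before the synchronized block. The sketch below is a stand-in built on assumptions, not the original code: the constructor that pre-populates the map and the choice to throw IllegalArgumentException for an unknown key are hypothetical, the ClassB here is only a placeholder (its real methods are discussed further down), and the method returns float so that it matches getB().

```java
import java.util.HashMap;
import java.util.Map;

// Placeholder for ClassB so this sketch compiles on its own.
// It only remembers the last number; it is not the real implementation.
class ClassB {
    private int last;

    void doA(int num) {
        last = num;
    }

    float getB() {
        return last;
    }
}

class ClassA {
    private final Map<String, ClassB> hashMap = new HashMap<>();

    // Assumption: every key is registered up front and the map is never
    // modified afterwards, so concurrent reads of the HashMap are safe.
    ClassA(String... keys) {
        for (String key : keys) {
            hashMap.put(key, new ClassB());
        }
    }

    public float getSomething(String key, int num) {
        ClassB obj = hashMap.get(key);
        if (obj == null) {
            // Assumption: an unknown key is a caller error. Returning a
            // default value would be another reasonable choice.
            throw new IllegalArgumentException("No entry for key: " + key);
        }

        synchronized (obj) {
            obj.doA(num);
            return obj.getB();
        }
    }
}
```

Failing loudly on a missing key keeps the problem visible at the call site instead of surfacing as a NullPointerException from inside the synchronized statement.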

However, there’s another small, elusive bug that I may not have caught without proper documentation. Let’s take a look at ClassB’s two methods and see if you can catch it:

public void doA(int num) {
    for(int i = 0; i < array.length - 1; i++) {
        array[i] = array[i+1];
    }

    array[array.length - 1] = num;
}

public float getB() {
    float total = 0;
    for(int num : array) {
        total += num;
    }

    return total / array.length;
}

Both of these methods are pretty simple. doA(…) treats “array” as a fixed-size buffer: it shifts every value down one slot, dropping the oldest, and puts the number passed in at the end. getB() returns the average of the values in the buffer.

These methods should only ever really be needed when called from ClassA as shown above. However, I didn’t put in any guarantees that they wouldn’t be called from elsewhere, and there’s no concurrency protection on the methods themselves. If you recall, the pair of calls in ClassA was surrounded with a synchronized block (because they do need to be called in succession with nothing in between), but the methods themselves have no protection. This is an issue that I likely only caught because of the documentation I was writing:
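To make the buffer behavior concrete, here is a small self-contained sketch. The constructor and the buffer size of 3 are assumptions for illustration (the original field setup isn’t shown); the two methods are the ones from above.

```java
class ClassB {
    private final int[] array;

    // Assumption: the buffer size is fixed at construction and is at least 1.
    ClassB(int size) {
        array = new int[size];
    }

    public void doA(int num) {
        // Shift everything down one slot, dropping the oldest value.
        for (int i = 0; i < array.length - 1; i++) {
            array[i] = array[i + 1];
        }
        // The newest value always goes in the last slot.
        array[array.length - 1] = num;
    }

    public float getB() {
        float total = 0;
        for (int num : array) {
            total += num;
        }
        return total / array.length;
    }
}

class BufferDemo {
    public static void main(String[] args) {
        ClassB b = new ClassB(3);
        for (int n = 1; n <= 4; n++) {
            b.doA(n);
        }
        // The buffer now holds [2, 3, 4]; the 1 has been pushed out.
        System.out.println(b.getB()); // prints 3.0
    }
}
```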

ClassA contains the only object that should be accessed by multiple threads, and is thread-safe.

As soon as I typed that, I realized my error. “Yes, ClassA is the only one that should be accessed by multiple threads,” I thought, “but that doesn’t mean that it will be the only one accessed by multiple threads. Something has to be done.”

And yes, with the current code that I had written, there would have been no issue. Everything would have been perfectly thread-safe. But what happens in the future? What happens when some other programmer comes along a few months from now, says, “Oh boy! ClassB is just perfect for this application!” and uses it, assuming that it must be thread-safe since it’s used by ClassA, which, due to its purpose, must be thread-safe?
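The post doesn’t show the eventual fix, so the following is only one possible approach, sketched under assumptions: make ClassB’s own methods synchronized, and add a combined method (the name doAThenGetB is hypothetical) for callers that need both steps with nothing in between. The constructor is also an assumption. Because synchronized methods lock on the object itself, a synchronized(obj) block like the one in ClassA uses the same lock and would keep working.

```java
class ClassB {
    private final int[] array;

    // Assumption: the buffer size is fixed at construction.
    ClassB(int size) {
        if (size < 1) {
            throw new IllegalArgumentException("size must be at least 1");
        }
        array = new int[size];
    }

    // synchronized: no other thread can read or write the array mid-shift.
    public synchronized void doA(int num) {
        for (int i = 0; i < array.length - 1; i++) {
            array[i] = array[i + 1];
        }
        array[array.length - 1] = num;
    }

    public synchronized float getB() {
        float total = 0;
        for (int num : array) {
            total += num;
        }
        return total / array.length;
    }

    // Hypothetical convenience method: appends and averages as one atomic
    // step. Java's intrinsic locks are reentrant, so calling the other
    // synchronized methods from here does not deadlock.
    public synchronized float doAThenGetB(int num) {
        doA(num);
        return getB();
    }
}
```

Synchronizing each method on its own keeps the array consistent, but a caller that invokes doA and getB separately can still have another thread’s value slip in between the two calls. That is the gap the combined method is meant to cover.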

And this is just for a fairly simple program! With larger programs, finding small bugs like these before they become an issue becomes even more difficult, and you need to use every tool in your arsenal to prevent these bugs from becoming an issue. Combining this type of documentation with your other means of defense, such as code reviews, will not only make your team love you more, but will help you churn out the high-quality code you dream of.

Eric Jeney is a Java aficionado with a number of projects under his belt. He enjoys programming for the web, and writing about his experiences. He currently studies Computer Science at the University of Maryland.


  • Memmaina

    good post

  • Thomas Eyde

    I disagree. A little.

    First of all, comments scattered around the code base are not important. More often than not, they are out of sync and plain wrong. But that’s not why I disagree.

    What we need is well-written documentation of the business needs. What do they want our code to actually do?

    It doesn’t help if my code documentation is correct, if my code does the wrong thing.

    • http://madebyknight.com/ Eric Jeney

      I think you need some of both, really, and a bit of documentation on the architecture of the code could handle both pretty well.  You can both describe what your code is trying to do and how it’s doing it at roughly the same time.

    • Jacob Binstein

      Hopefully, if you’re serious about documentation, it’ll never get out of sync or wrong. As you change methods or interactions, you should be changing your documentation as well. I personally have found that to be a great way to figure out how old projects that I’ve since forgotten work.

    • Arved Sandstrom

      Apples and oranges. There are different forms of documentation – for business, for IT managers, for analysts and architects, for developers, for testers – and they are all important. Sure we need good business and technical requirements documents, but that’s not what we’re talking about here.

      Comments in the code base are important, provided that the comments follow accepted practices. There is no better place to put comments about classes and methods and functions than right where the code is. That’s where another developer will see it.

      If the organization is lax then of course code comments may be out of sync. So will every other form of documentation be. But as a rule of thumb the code comments are likely to be the best maintained regardless.

  • Stasys Puškorius

    I think there should always be two versions of documentation, one for business needs and the other for code. Of course, you can later generate dynamic documentation from code comments with one of the tools available for this.
    Usually a good software engineer thinks two steps ahead, so there will always be a little additional code that can be used in the future. Documenting that is a good idea too.

    For me, one of the best things is finding that the person who worked before you was not a moron who wrote code without thinking. And it is a good day when that part has clear comments about what it does and what it was meant to do.

    • http://madebyknight.com/ Eric Jeney

      Right.  There are definitely several types of documentation that are necessary in most scenarios.  The developer documentation is just the one that helps you spot bugs.

      Part of the advantage of clear documentation is that it makes you stop and think for a second about what you’re really doing and whether or not it’s the correct way to do it.

      • Anonymous

        “it makes you stop and think for a second about what you’re really doing and whether or not it’s the correct way to do it.”

        Why is this not being done before you write any code? You seem to be advocating an approach of write some code then document it afterwards?

        • http://madebyknight.com/ Eric Jeney

          Sort of.  I do tend to document after I write the code, but that doesn’t mean I don’t put a lot of thought into it before I write it.  For example, the silly check to make sure that the object returned by the HashMap wasn’t null that I had originally missed is not something that I would have been likely to document before actually writing the code, but it is something that I caught while documenting afterwards.  This stage of documentation just forced me to re-think through the code I had already written.

  • http://twitter.com/Paul_Hadfield Paul Hadfield

    How about: document the architecture, test the code (design). If there was a way to “test” the architecture I’d no longer say to document that either. I’ve never really met any developer that takes the time to write, update or even read the documentation! It just doesn’t seem to fit into our nature. Any documentation I’ve taken the time to read (I’ve admitted it too) is confusing or wrong because it is out of date. A nice big thick spec or documentation also doesn’t tell me if I’ve broken anything else when making a change – but a test does. Write clean and easily understandable code and write tests that add value – add a test whenever a bug is found to make sure it can’t happen again.

    • http://madebyknight.com/ Eric Jeney

      I never really intended for the documentation to be very long or thorough.  Just something that gives a very basic idea so that when you first look at the code base you’re not stuck in the dark.

      I think part of the reason that so many developers don’t look at this documentation is that it’s often so poorly done, or nonexistent.  But, stopping and taking the time to document doesn’t only help the future developer, but it also can help you, like I said in the article.

      Documentation isn’t meant to replace tests, it’s meant to supplement them.  It’s hard to write a test for a weird bug you haven’t found, but documenting may help you find the bug before it goes live.

  • Michael

    You probably wrote what you did (I hope) for speed in this article.  But first and foremost is that if your method names and property names are explanatory, you embed the documentation within your code itself.  ClassA and ClassB would violate coding standards.  doSomething would be cause for being taken to a field and executed.  :)  GetSynchronizedHashcode, however, tells me exactly what your doSomething is really doing without any documentation.

    • http://madebyknight.com/ Eric Jeney

      Yes, don’t worry, those weren’t the actual method names and class names.  Unfortunately, I wasn’t allowed to share many details at all about the code that I was writing, so I had to obscure it quite a bit.

      And I wasn’t really advocating for saying what “doSomething” does in your documentation, but rather more for saying things like how ClassA interacts with ClassB in this example.

  • Anonymous Coward

    Idunno … if you only want to make ClassB accessible via ClassA (which I think should have occurred to you much earlier than writing the documentation, from the way you present the problem), I think it should have been a private inner class. That would have avoided the problem.

    • http://madebyknight.com/ Eric Jeney

      But, that’s the thing.  ClassB could have been useful to others, so I didn’t really want to restrict it to only being used by ClassA.  I just needed to make sure that it was thread-safe as well.

  • Anonymous Coward

    Also, I think comments in code are justified in only two situations: when they document something non-obvious (like, if you insist on keeping the code’s structure as depicted above, that the synchronization and call ordering for ClassB is actually implemented in ClassA, although I think that’s bad design) and in libraries, where Javadoc comments are IMO extremely useful for client programmers. Other than that, I hate comments – they’re never checked by the compiler.

    • http://madebyknight.com/ Eric Jeney

      I agree.  I like to say that comments explain the “why” and not the “how.”  For example, it’s utterly useless to say “Increments cost by one” for the line: “cost++;”, but, depending on how obvious things are, it may be useful to say *why* you’re incrementing the cost by one.

  • http://profiles.google.com/mike.pope Mike Pope

    This is great. Allow me to add a thought from Raymond Chen [http://blogs.msdn.com/b/oldnewthing/archive/2007/02/23/1747713.aspx] as well that’s somewhat related to what you’re saying here:

    “There’s more to documentation than dry function descriptions, people! The function description is a reference; you go there when you already know what’s going on and you just need to fine-tune a detail. The real learning happens in the overviews and articles. If you want to learn how to operate your radio, you don’t read the schematic first.”

  • http://profile.yahoo.com/ADQLAQT527QDIMBMHSHTQEZQDE Jeremy WilliamM

    I would ask why you are recomputing (array.length - 1) every time through the loop.

    • http://madebyknight.com/ Eric Jeney

      You’re right.  That’s particularly silly on my part.

  • Anonymous Coward

    FFS I despair at the modern attitude to programming.  Are you saying you document AFTER you’ve written the code?  You’ve got it the wrong way around.

  • FooBarBug

    As presented, doA and getB will both bomb if array.length is zero… array[-1] = num will die, as will return total/0.

    • http://madebyknight.com/ Eric Jeney

      Other parts of the code had already guaranteed that the array would never be size 0.  It’s typically around 3 or 4.

      • Nige

        If you’re writing code that will be used by any number of people over any length of time you can’t make these assumptions. Range checking must be integral to the function to prevent errors further down the line.

  • Cellar

    I’ve similarly caught bugs in my code because I liked its structure so much I read it again. This isn’t unique to documentation writing, but that could be a useful vehicle for giving it another once-over. There’s also the specification angle (_The Mythical Man-Month_), or the more recent write-tests-before-writing-code movement. In the latter case you have automated tools to help you find the bugs insofar as your unit tests correctly test for conformance to the spec.

    As a sidenote, too many programmers, even those in big FOSS projects, think that using something like doxygen is all you need; the result is lists of API calls with terse notes restating what a well-written interface will already have made clear. The difference is that now you get essentially the same (lack-of-)information automatically generated into five different formats, all useless. Writing documentation ought to be done with a clear goal in mind, such as explaining what the code does, or explaining how to use it, or preferably both. And that includes playing “what if” games, or at least clearly stating the surrounding assumptions. To do that well you do need to explain the high-level structure and not just the nitty-gritty. Go ahead and take a look around you. How much documentation actually does any of that? How much does not? Have you asked a fellow programmer to try and use your interface with nothing but the documentation and tell you what he had to guess to make it work? And so on.

  • Samuel Carlsson

    All documentation I’ve ever read is in the *best case* outdated but usually plain misleading. Code will always change faster than documentation can keep up. Spending the time making the code document itself is more valuable. Anyway, there will always be a more efficient company not wasting time on documentation that does not help, so the problem will sort itself out in the end.

  • Darren

    Nice article. Glad you are passionate about quality, bug free code. If you aren’t familiar with it already, take some time to read up on Design By Contract. In our implementation, comments are not only required, they are part of the formal interface specification for classes and methods.

    You’ll find applying the rigor of DBC will significantly improve the quality of your code.

  • Some guy

    I have long advocated that every significant piece of code should have an understudy as well as an author, and that the understudy should typically write the “POPS” (Principles of Operation) document for it – a developer-focussed and rather free-form document. Writing the POPS brings the understudy to the level of understanding necessary to fulfill his role (e.g. support the code when the author is on vacation), as well as providing invaluable documentation for test and for future code changes.
    Specification documents are notoriously poor for supporting code in this way.