Thinking about Documentation

My friend Tycho Garen recently wrote a post about appreciating technical documentation. As he rightly points out, technical documentation is very important and also very hard to get right. As someone who writes code, I find myself in the uncomfortable position of having my documentation spread out across at least two places.

A large part of my documentation is right in my code in the form of comments and docstrings. I call this programmer-facing documentation: it will probably only be seen by other programmers (including myself). However, just because it might only be seen by programmers who are using (or changing) the code doesn’t mean it should live only in the code. More often than not, it’s advisable to be able to export this documentation to an easier-to-read format (generally hyperlinked HTML or PDF). Of course, I don’t want everyone who uses my software to have to dig through the source code to figure out how things work. A user manual is generally a good idea for your software, no matter how simple or complex it might be. At the very least, there should be a webpage describing how to get up and running.

One of the major problems with documentation is that it’s either nonexistent or hopelessly out of date. A large part of the solution is simply effort and discipline: writing good comments and, later, writing a howto are habits that you can cultivate over time. That being said, I’d like to think that we can use technology to make writing and updating documentation easier.

Personally, I would love to see programming languages grow better support for in-source comments. Documentation tools like Javadoc and Epydoc certainly help in generating documentation and give you a consistent, easy-to-understand format, but the language itself has no idea what the comments say. The comments are essentially separate from the code even though the two sit side by side in the same file. I would love it if languages could work together with the documentation, say by autogenerating parts of it or by running analyses to detect inconsistencies between the docs and the code.
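As a small illustration of the kind of consistency check I mean, here is a minimal Python sketch. It assumes parameters are documented with reStructuredText-style `:param name:` fields, and it compares those names against the function’s real signature using the standard `inspect` module. The helper names (`undocumented_params`, `stale_params`) and the `resize` function are hypothetical, and a regex over a docstring is a far cry from a language that genuinely understands its own documentation.

```python
import inspect
import re

# Assumption: parameters are documented with reStructuredText-style
# ":param name:" fields inside the docstring.
PARAM_FIELD = re.compile(r":param\s+(\w+):")


def documented_params(func):
    """Return the parameter names mentioned in func's docstring, in order."""
    return PARAM_FIELD.findall(inspect.getdoc(func) or "")


def undocumented_params(func):
    """Return parameters in func's signature that the docstring never mentions."""
    documented = set(documented_params(func))
    return [name for name in inspect.signature(func).parameters
            if name not in documented]


def stale_params(func):
    """Return names the docstring documents that are no longer in the signature."""
    params = inspect.signature(func).parameters
    return [name for name in documented_params(func) if name not in params]


# Hypothetical example: the signature changed, but the docstring did not.
def resize(image, width, height):
    """Scale an image.

    :param image: the image to scale
    :param size: a (width, height) tuple
    """
    return image
```

Here `undocumented_params(resize)` returns `['width', 'height']` and `stale_params(resize)` returns `['size']`, flagging a docstring that wasn’t updated when the signature changed.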

As for documentation that lives outside the code, I’m glad to see that a good deal of really good work is being done in this area. Github recently updated its wiki system so that each wiki is essentially a Git repo of human-readable text files that are automatically rendered to HTML. Github’s support for Git commit notes and its excellent (and recently revised) pull request system provide really good ways to maintain a conversation around your code. The folks at Github understand that code doesn’t exist by itself; it often needs a support structure of documentation and discussion around it to produce a good product.

So what’s my personal take on the issue? As I’ve said before, I’m starting work on my own programming language, and I intend to make documentation an equal partner to the code. I plan on using Github Pages to host the documentation in readable form right next to my source code. At the same time, I’m going to give some thought to making documentation a first-class construct in the language. That means the documentation you write is actually part of the code instead of being an inert block of text that needs to be processed externally. The Scribble documentation system built on top of Scheme has some really interesting ideas that I would love to look into and perhaps adapt. Documentation has always been recognized as an important companion to coding. I’m hoping that we’re getting to the stage where we actually pay attention to that nugget of common wisdom.


The Documentation Problem

Over the past year and a half I’ve come to realize that writing documentation for your programs is important. Not only is documentation helpful for your users, it forces you to think about and explain the workings of your code. Unfortunately, I’ve found the tools used to create documentation (especially user-oriented documentation) to be somewhat lacking. While we have powerful programmable IDEs and equally powerful version control and distribution systems, the corresponding tools for writing documentation aren’t quite at the same level.

Let me start by acknowledging that there are different types of documentation for different purposes. In-code documentation in the form of comments is geared toward people who will be working with your code directly, either editing it or using the API it exposes. In this area there are actually good tools available. Systems like Doxygen, Epydoc or Javadoc can take formatted comments in code and turn them into API references in HTML or other formats. Because the API information lives right in the code, it’s easier to make sure that changes in one are reflected in the other.
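To make the idea concrete, here is a toy Python sketch of what these generators do at their core: walk a set of functions, pull out each signature and docstring, and render them as HTML. The `api_reference` helper and the `area` function are hypothetical, and real tools such as Doxygen, Epydoc or Javadoc also handle cross-references, type information, inheritance and much more.

```python
import html
import inspect


def api_reference(module_name, functions):
    """Render a bare-bones HTML API reference from function docstrings.

    A toy stand-in for a real documentation generator: it only lists each
    function's signature followed by its (cleaned-up) docstring.
    """
    parts = [f"<h1>{html.escape(module_name)}</h1>"]
    for func in functions:
        signature = f"{func.__name__}{inspect.signature(func)}"
        doc = inspect.getdoc(func) or "(undocumented)"
        parts.append(f"<h2><code>{html.escape(signature)}</code></h2>")
        parts.append(f"<pre>{html.escape(doc)}</pre>")
    return "\n".join(parts)


# Hypothetical function whose docstring becomes its API entry.
def area(width, height=1):
    """Return the area of a width x height rectangle."""
    return width * height


print(api_reference("geometry", [area]))
```

In practice there is no need to write this yourself for Python: the standard `pydoc` module can already write HTML documentation for a module (`python -m pydoc -w modulename`). The point of the sketch is only that the reference is derived from the code, so regenerating it keeps the two in step.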

User-oriented documentation has slightly different needs. As a programmer, you want a system that is easy to learn and fast to use. You also want to be able to publish it in different formats; at the very least, you want to be able to create HTML pages from a template. But you also want the source itself to be human-readable (which tends to come naturally with being easy to write), because that’s probably what you, as the creator, will be reading and editing the most.

Then there are documents that are created as part of the design and coding process. This is generally wiki territory. A lot of this material will be rewritten over and over as the project progresses, and much of it may eventually find its way into the user docs. Here, ease of use is paramount: you want to get your thoughts and ideas down as quickly as possible so that you can move on to the next one. Version control is also good to have so that you can see how the project evolves over time. You might also want some sort of export feature so that you can get a dump of the wiki when necessary.

Personally, I would like to see the user docs and development wikis running as two parts of the same documentation system. Unfortunately, I haven’t found tools that are quite suitable. I would like all the documentation to live in the same repository as my code. However, that documentation also needs to be easily exported to decent-looking web pages and PDFs and placed online with minimal effort on my part. The editing tools also need to be simple and quick, with a minimal learning curve.

There are several free online wiki providers, such as PBworks and WikiDot, which make it easy to create good-looking wikis. But I’m hesitant to use any of them since there isn’t an easy way to tie them into Git. Another solution is to use Github’s Pages feature. Github hosts your Git repositories online so that others can get them easily and start hacking on them. Pages lets you write simple text files in either the Textile or Markdown formatting syntax and have them automatically turned into good-looking HTML pages. This is a good idea on the whole, and the system seems fairly straightforward to use once the initial setup is done. The engine behind Pages, called Jekyll, is also free to download and use on your own website (and doesn’t require a Git repository).

In addition to these ‘enterprise-quality’ solutions, there are also a number of smaller, more home-grown ones (though it could be argued that Jekyll is itself fairly home-grown). There’s git-wiki, a simple wiki system written in Ruby that uses Git as its backend. Ikiwiki is a Git- or Mercurial-based wiki compiler: it takes pages written in a wiki syntax and turns them into HTML pages. These are viable options if you want complete control over how your documentation is generated and stored.

Though each of these is great in and of itself, I still can’t help feeling that something is missing. In particular, there is no common consensus on how documentation should be created and presented. Some projects have static websites, others have wikis, a few have downloadable PDFs. Equally important, there isn’t even a moderately common system for creating this documentation. There are the approaches I’ve noted above, which seem to be the most popular. There are also more formal standards like DocBook. And let’s not forget man and info pages. You can also create your own documentation purely by hand using HTML or LaTeX. Contrast this with the way software distribution works (at least in open source): there are binary packages, source tarballs and, in many cases, some sort of bleeding-edge repository access. There are exceptions and variations in detail, but by and large things are similar across the board.

Personally, I still can’t make up my mind about how to manage my own documentation. I like the professional output that LaTeX provides, and DocBook seems like a well-thought-out standard, but I’d rather not deal with their formatting requirements, especially for documents that change often. I really like wikis for their ease of use and the ability to edit from anywhere, but I must be able to save things to my personal repository, and I don’t want to host my own wiki server. I’ve previously written documentation in plain text files, and though this is good for just getting the information down, it’s not really something that can be shown to the outside world. For the time being, I’ll stick with my plain text files, but I’m seriously considering using Github Pages. For me this offers the dual benefit of easy creation in the form of text files as well as decent online output for whatever I decide to share. I lose the ability to edit from anywhere via the internet, but that’s a price I’m willing to pay; I can still use Google Docs as an emergency temporary staging area. I’m interested in learning how other developers organize their documentation and would gladly hear any advice. There’s a strong chance that my system will change in some way in the future, but that’s true of any system I might adopt.