The Role of Testing in Programming

    It’s often said (and only half-jokingly) that if we built bridges and buildings the way we build software, we’d be living in the stone age. Sure, a lot of software may be buggy, ugly or even downright useless, but that doesn’t mean that something can’t be done to fix it. One of the ways in which a developer can produce more reliable code is by rigorous testing. For many developers, however, testing goes only as far as making sure that the program actually runs and gives reasonable output with a small number of typical inputs. All that proves is that your program isn’t falling apart at the seams. It doesn’t prove that your program will work with the majority of use cases, it doesn’t prove that the program will work in unusual situations and it certainly does not prove a complete absence of bugs.

So how do you prove that your program will work correctly at least in the majority of cases? It needs to be tested hard, rigorously, brutally, unfairly. This doesn’t mean that you should test it beyond the limits of what is feasible or reasonable. Your text-editing program doesn’t have to be able to open your email or edit half the HTML files on the web at the same time (unless you’re writing an Emacs super-clone), but it does have to handle multiple large files open at the same time, color syntax properly and not choke on large copy/paste operations. Deciding what is necessary to test and what can be deemed outside your program’s operating parameters can be a bit tricky at times, but it leads into what may be the more important role of testing in programming: testing forces you to think about your program design.

Fans of unit-testing often stress that unit-tests should be written first, before any of the actual program is written. This is not an easy concept for most programmers to grasp, and even fewer actually put it into practice. How are you supposed to test something when it doesn’t even exist yet? The key idea is that you’re not so much testing your code, but rather what your code should do. To some extent, it involves taking on an outsider’s perspective: a user doesn’t care how the program is written, only if it works. But if you’re writing tests for your own code, you also start thinking about how your program is structured. The idea of unit-testing is that you should test the smallest reasonable components before stepping up to larger portions of code. So you start thinking about which are the smallest parts of your code that need independent testing. Which parts can be safely folded into others and which ones have to be broken down? It also gives an idea of the interactions between the parts: what sort of data needs to be pushed around, how is that data affected and is there a chance that the data will be corrupted in all the moving about?
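
To make the test-first idea a bit more concrete, here is a minimal sketch using Python’s standard unittest module. The word_count function and the behavior expected of it are hypothetical, made up purely for illustration. In a test-first workflow the test class would be written before the function body, describing what the code should do rather than how it does it.

```python
import unittest


def word_count(text):
    """Hypothetical function under test: count whitespace-separated words."""
    return len(text.split())


class WordCountTest(unittest.TestCase):
    # Written before word_count exists: each test states an expectation
    # from the outside, without caring about the implementation.
    def test_empty_string_has_no_words(self):
        self.assertEqual(word_count(""), 0)

    def test_mixed_whitespace_separates_words(self):
        self.assertEqual(word_count("one  two\tthree\nfour"), 4)


def run_tests():
    """Run the test case and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(WordCountTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Calling run_tests() before word_count is implemented would report failures, which is the point: the tests spell out the target before any code is written to hit it.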

Of course, there is a down-side to this formal, test-first methodology: if you have a rapidly evolving project which has frequent design changes and restructuring, you’re going to end up with a bunch of failed tests just as frequently, some of which test things that don’t exist any more. If this happens too often, you might find yourself spending as much time updating your tests as you are updating your code. That’s never a good thing (of course, completely restructuring your project every week is probably not good either).

Even if a test-first methodology isn’t suitable for you (I don’t do it myself), you should employ formal unit-tests as much as possible. These tests should be automated, i.e. they should automatically generate inputs for your programs and check the outputs without human intervention. This might seem painfully obvious, but I’ve seen more than a few students write tests which are essentially manual verifications wrapped in code. This is doubly wasteful: not only do you test nothing that you couldn’t easily test yourself, you’ve also wasted time writing code for it.
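
As a rough illustration of what “generate inputs and check outputs without human intervention” can look like, here is a small sketch. The run-length encoding functions are hypothetical stand-ins for whatever code is under test. The check generates random strings with a fixed seed and verifies that decoding an encoded string gives back the original, so nobody has to eyeball any output.

```python
import random


def rle_encode(text):
    """Hypothetical function under test: run-length encode a string."""
    runs = []
    for ch in text:
        if runs and runs[-1][0] == ch:
            runs[-1] = (ch, runs[-1][1] + 1)  # extend the current run
        else:
            runs.append((ch, 1))  # start a new run
    return runs


def rle_decode(runs):
    """Inverse of rle_encode."""
    return "".join(ch * count for ch, count in runs)


def check_round_trip(trials=200, seed=0):
    """Generate random inputs and verify the outputs automatically."""
    rng = random.Random(seed)  # fixed seed so any failure is reproducible
    for _ in range(trials):
        length = rng.randint(0, 30)
        text = "".join(rng.choice("ab") for _ in range(length))
        assert rle_decode(rle_encode(text)) == text, text
    return trials
```

A two-letter alphabet is used on purpose: it makes long runs likely, which is where this kind of code tends to break.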

That brings us to the question of manual testing. Unit testing is certainly very handy and will probably make your code much more reliable than no testing at all, but it won’t find all the bugs and it’s almost impossible to come up with all possible test cases. The real trial of your software comes not from endless automated testing, but from actual users using it out in the field. Believe me, there is no substitute for using your programs in the real world. And that doesn’t mean just using the program yourself. As a developer, it’s very likely that you have your code’s limitations tucked away in your subconscious, blinding you to problems which would be obvious to people who actually use it without knowing how it’s built. Getting real people to use your program in the real world is the best possible test even though it may be more time-consuming and less clear-cut than automated unit-testing. And there are some things (like UI effectiveness) which simply can’t be auto-tested.

Ultimately the role of testing boils down to the following major points. They are not universal and they won’t cover all possible cases, but they might go a long way to making your code more robust.

  1. Test often and rigorously.
  2. Use a combination of automated testing and user testing.
  3. Designing and testing go together: design for effective testing, test to find flaws in the design.
  4. Don’t just fix the bugs your tests reveal: think about how they can affect the rest of the program and how your program will be affected by the actual fixes themselves.
  5. Remember that testing will probably not eliminate all possible bugs, but that doesn’t mean that you should not test, or that you should test forever.

Pair Programming: Pros and Cons

    My computer science course at college has had us pair programming for labs in the first half of the semester. Pair programming is a technique in which two programmers work on the same computer at the same time; however, only one of them does the actual coding while the other checks each line of code as it is written. Every once in a while the two switch places and keep on coding. Over the past two months I’ve come to notice first-hand some pros and cons, both from my own little team and the others in the class:

Pros:

  1. Two points of view: When you’re stuck in a particularly nasty patch of code, trying hard to work your way through, having a fresh point of view can come in very handy. However, since both the programmers are side-by-side, some care has to be taken so that both aren’t always thinking the same thing.
  2. Silly mistakes are quickly caught: Simple mistakes such as syntax errors or repeated variable names can be easily caught and fixed right then and there. This might not seem like much, but it can cut down debug time later and prevent small irritating bugs.
  3. Better concentration: This isn’t particularly scientific, but it seems to me that two people working together have less of a tendency to procrastinate or get distracted. This results in shorter development times too.
  4. Combining knowledge: Computer science is a vast field and my course is rather fast-paced. It’s hard for any single student to have a comprehensive knowledge of what’s been going on in class, so two of us working together means that we can pool our knowledge. This has the added benefit of learning ‘on the job’ and not having to dive into our textbook or go to the instructor every time we have a doubt.
  5. It’s a good training ground for large software projects: Very little software is written by one person nowadays; teams ranging from small to big are not just the standard, they are a necessity. Pair programming teaches you a lot of the soft skills you’ll need: tolerance, respect, understanding and responsibility.

Cons:

  1. Skill disparity: This is the number one potential problem. If the partners are of completely different skill levels, you might have one programmer doing all the work or constantly tutoring the other. This is ok if you want to set up a teacher-student relationship or are introducing a new programmer to your system, but if not, it can defeat the entire purpose of pair-programming.
  2. Not actually getting the work done: Though I’ve personally experienced increased concentration when pair programming, for some people pair programming sessions can easily degenerate into socializing sessions. There are also some people who don’t work when there is someone next to them examining their work; these people will not benefit much either.
  3. Developer egos: This is something that is not likely to happen in a classroom setting, but in more experienced teams, each programmer might try to push their own ideas of how things should be done (both of which may be perfectly valid). This sort of conflict can be downright disastrous.

So is it worth it? Some research suggests that it is, but there are enough variables to make the answer far from clear-cut. If you can get a good partner who you’re comfortable with and who shares similar work patterns, it might be very productive. On the other hand, avoid bad partners at all costs. Still, teamwork is an essential skill to learn, though you must be a competent programmer (or willing to learn) yourself before jumping onto a team. I think that you should try out pair programming every now and then if you can find appropriate partners. Using it in a classroom lab setting is also a good idea, just as long as there are enough individual projects to make sure that one person isn’t doing all the work. But no matter what, it’s important to remember that pair programming is just another tool and will not compensate for deficiencies such as incompetent programmers or poor working conditions and tools.

A time for planning and a time for coding

    Over Spring break I’ve been reading the older archives of one of my favorite blogs: Coding Horror. Though a lot of the posts are worth reading for anyone involved in computer programming, one that I feel is especially worth mentioning is Development is Inherently Wicked. The article talks about how it is almost impossible to fully plan out a complete software project and then have that plan hold throughout the process of creating the software. Though I’ve personally never been involved in a large-scale software project, I’ve come to realize how true this is even with small projects.

For the past week I’ve been busily writing a Python package to allow interaction with a Hemisson robot. My initial planning was very simple: I would have only a very simple function-based interface.  The actual hardware interaction would be separate from the part the user used and there would be a simple configuration file which the user could edit. Was this enough planning with which to start writing code? I was afraid it might not be, but in the end it turned out to be a good thing. I managed to write the hardware interface in just a few hours and in just about 80 lines of code. It would have taken far less time, but there was some ambiguity in the Hemisson documentation that made me fiddle around directly with the robot to get things right.
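
The split between the hardware layer and the user-facing functions might look something like the sketch below. Everything here is an assumption made for illustration: the FakeLink class, the set_link helper and especially the "MOVE" command strings are made-up placeholders, not the actual Hemisson protocol or the code from my package.

```python
class FakeLink:
    """Stand-in for the hardware layer: records commands instead of sending them."""

    def __init__(self):
        self.sent = []

    def send(self, command):
        self.sent.append(command)


_link = FakeLink()


def set_link(link):
    """Swap the hardware layer without touching the user-facing functions."""
    global _link
    _link = link


def forward(speed):
    """User-facing function: drive both wheels at the same speed."""
    # "MOVE left right" is a placeholder command format, not a real one.
    _link.send("MOVE %d %d" % (speed, speed))


def stop():
    """User-facing function: halt both wheels."""
    _link.send("MOVE 0 0")
```

Because the user-facing functions only ever call send(), a real serial connection could replace FakeLink later, and the fake one remains handy for trying things out without a robot attached.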

It was when I started to build the configuration system that I had to deal with the other side of the coin. It was very tempting to jump in and quickly come up with some sort of config file syntax and then write up a parsing routine for it. It would have worked and it probably would not have been much work. But I decided to fight the temptation to hack away and instead read up on Python tools for doing this sort of thing. After reading about it online and posting to the Python-tutor mailing list I came upon a simpler, more elegant solution: the config file would be a Python file itself and I would make use of Python’s intrinsic ability to import modules stored in external files to read it. A little bit of planning thus eliminated the need for a rather large amount of complexity and code to handle that complexity. I’ve just finished another round of coding: a part of the actual module that the user will be using. I’ve managed to create simple functions for moving the robot and for accessing its sensors. However, since the robot does not include any hardware to keep track of its position, writing more complex functions such as turning through a particular angle will be significantly more challenging. Once again, I need some planning and research.
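
Here is a minimal sketch of the config-file-as-Python-module idea. It is not the code from my package: the file name robot_config.py and the SERIAL_PORT and BAUD_RATE settings are hypothetical, and it uses the importlib machinery from current Python versions to load the file from an explicit path, where a plain import statement would do if the file sits on the module search path.

```python
import importlib.util
import os
import tempfile


def load_config(path):
    """Load a Python source file as a module and return the module object."""
    spec = importlib.util.spec_from_file_location("robot_config", path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs the config file's top-level code
    return module


def demo():
    """Write a throwaway config file, load it, and read two settings."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "robot_config.py")
        with open(path, "w") as f:
            # Hypothetical settings; a real file would be edited by the user.
            f.write("SERIAL_PORT = '/dev/ttyS0'\nBAUD_RATE = 115200\n")
        config = load_config(path)
        return config.SERIAL_PORT, config.BAUD_RATE
```

The trade-off is that the config file is executed as code, so this only makes sense when the person editing the file is trusted, which is the case for a user configuring their own robot.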

The lesson I’ve learned is that doing a good job of making software involves a fairly fine balance between making careful plans and just sitting down and pumping out some code. So how can you know which one is more important? The answer is two-fold. Firstly, it’s a mistake to think that planning and coding are separate actions. They are part of a continuous loop and a healthy software project must treat them as such. There has to be a decent level of feedback from the code to the plans, and the plans have to make sure that bad code isn’t written just because it is easy and the first thing that comes to mind. Even while I was writing my hardware control routines I had to think about how I would deal with different hardware configurations and how to let the program parse and adapt to the configuration information. Like a lot of computer science, this is a good example of choosing the middle path: realizing that either extreme can lead to a badly implemented or even incomplete project and that staying close to the middle is the best way to go.

If you’re interested in learning more about this approach to programming projects, read through the entire article on Coding Horror and the book that it references.

Enter the ByteBaker

I’ve been blogging on and off since 2005, more off than on. As I’ve mentioned before, my spotty record is mostly due to the fact that I’ve never really had anything very compelling to put out on the Internet. But as I continue exploring computer technology and continue my formal studies in the field, I realize that I’m getting exposed to lots of different and interesting ideas and coming up with many of my own. Most of them are probably not very original, though perhaps people will still find them interesting. But irrespective of that, I know that a lot of these thoughts and experiences are things that I would like to keep on record and would like to share with other people out there.

With that in mind, I’m going to make a commitment to recording my thoughts and ideas over the next few years as well as keeping a record of my projects and experiments. I’ve registered a domain name and this blog now routes to it. I’m still running off WordPress.com and will be for some time, but I’m going to be posting everything under my new website: The ByteBaker.

Why call it The ByteBaker? Many people have differing opinions on what computer science really is. According to MIT professor Gerald Jay Sussman, computer science is not a science and is really not about computers. Over the last few years I’ve come to agree with him. Computer science in general and programming in particular seems to me to be quite similar to cooking: If you start mixing ingredients at random, you might come up with something that is edible, but it probably won’t taste very good. To make a good dish you need an understanding of your ingredients and utensils as well as a fair amount of improvisation and inspiration. Computer science is similar in that you need to understand the concepts and tools that are a part of your trade, and to actually produce something novel and interesting you need a healthy dose of imagination and daring. While cooks and bakers use various raw ingredients such as fruits, vegetables, spices, meats, etc., for computer scientists all our creations can be described in terms of bits and bytes: hence enter the ByteBaker (also alliteration is kinda cool).

So what does all this mean for you? Probably not a lot at the moment. You might want to update your bookmarks to point to The ByteBaker instead of Xtreme Computers, because in a few months I will be moving to a different hosting solution so that I can have more control over the WordPress installation. If you’re a feed subscriber, you don’t have to do anything at all: I’ll still be publishing at the old feed URL. So just sit back and enjoy the ride!

OpenEmbedded bug found and squashed

I’ll be using Python extensively for my robotics project with Gumstix. The goal is to come up with a set of modules that let the user create high-level behavior programs for the robots without worrying about how to control the hardware. Of course none of this is going to happen if I can’t actually get the Python interpreter to work on the Gumstix.

The basic Linux installation on the Gumstix doesn’t come with Python, so I had to put it on there manually. The Python packages for Gumstix are very modular, which allows installation of only the needed packages without anything extra. Instead of just installing the required packages on the Gumstix I decided to create a fresh root filesystem image, since I would be installing it on multiple Gumstix. What I really needed was the Pyserial package, which provides a nice Python interface to serial ports. That was the only package that I added to the build script (more on that later), hoping that it would pull in all the required dependencies. After I finished the build and reflashed the Gumstix using the wiki instructions, I booted up and started the Python interpreter. But to my horror, it wouldn’t load the Pyserial module. Apparently it needed an older implementation of various string functions which weren’t pulled in. So I repeated the process, adding the older module to the build script. Even then the Pyserial module wouldn’t load because it couldn’t find the struct module. By this time I was starting to get frustrated, because the struct module should have been pulled in with just a basic Python implementation. The struct module handles conversions between Python values and C structs represented as Python strings. This module not being found meant that somehow the entire OpenEmbedded Python system was fundamentally broken.
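
For anyone who hasn’t used it, this is roughly what the struct module does. The sketch below is written for current Python, where the packed data is a bytes object rather than a string, and the three-byte layout (one command byte followed by a 16-bit value) is a hypothetical example, not anything Pyserial or the Gumstix actually uses.

```python
import struct

# "<BH" means: little-endian, one unsigned byte, one unsigned 16-bit integer.
# The "<" prefix also disables padding, so the result is exactly 3 bytes.
packet = struct.pack("<BH", 7, 513)

# 513 is 0x0201, so in little-endian order its bytes are 0x01 then 0x02.
assert packet == b"\x07\x01\x02"

# unpack reverses the conversion and returns a tuple of Python values.
command, value = struct.unpack("<BH", packet)
```

Anything that talks to hardware over a serial port tends to need this sort of conversion, which would explain why a serial library cannot load without struct.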

After a bit of research with the help of Google, it turned out that there was an error in the python-2.5-manifest.inc file, which contains a listing of all the files that make up the base Python system. This file had an incorrect reference to the files that actually implement the struct module. This problem had occurred in the OpenEmbedded system around November last year and had been fixed, but somehow the fix had not found its way into the Gumstix version of OpenEmbedded. So I’ve dutifully filed a bug report and submitted a patch, and I’m hanging on to my corrected Python manifest while the patch is implemented (which will hopefully be soon).

Tracking a bug in a system I hadn’t created and finding a fix for it was quite an interesting experience. On one hand it was quite frustrating because the problem wasn’t in something I had an intimate knowledge of. On the other hand it forced me to learn a lot about the OpenEmbedded build system and I also learned how to go about looking for documentation, all of which will come in handy as I keep working with Gumstix and the OpenEmbedded system. And last but not least, it led to my first ever patch submission for a large software project: not much of a contribution, but it certainly makes me feel good!