Software is Forever Beta

Word on the web is that Google just pulled the Beta label off its Chrome browser. As Google Operating System has noted, it’s not that Chrome is fully ready for mass consumption, but rather that it’s just good enough to enter the fray with Firefox and Internet Explorer, and that Google is in the process of sealing bundling deals with some OEMs. There is still work to be done, there are still bugs, and there are important features in the works (including an extension system). But the event raises a question that I don’t think has ever been convincingly answered: when is the right time to take the Beta label off a piece of software?

Wikipedia says that a beta version of a software product is one that has been released to a limited number of users for testing and feedback, mostly to discover potential bugs. Though this definition is mostly accurate, it’s certainly not complete. Take, for example, Gmail, which is open to anyone and everyone and isn’t really out there for testing, but is still labeled beta years after it was first released. You could say that in some ways Gmail changed the software culture by being ‘forever beta’. On the other hand, Apple and Microsoft regularly send beta versions of their new products to developers expressly for the purpose of testing.

Corporate branding aside, everyone probably agrees that a piece of software is ready for public release if it generally does what it claims to do and doesn’t have any show-stopping bugs. Unfortunately, this definition isn’t as clear-cut as it seems. It’s hard to cover all the use cases of a software product until it is out in the real world being actively used. After all, rigorous testing can only prove the presence of bugs, not their absence. It’s hard to tell what a show-stopping bug is until the show is well under way. Also, going by that definition, should the early versions of Windows have been labeled beta because they crashed so much? During exam week I’ve seen college library iMacs choke and grind to a halt (spinning beach ball of doom) as student after student piles on resource-intensive multimedia. Is it fair to call these systems beta because they crack under intense pressure?

Perhaps the truth is that any non-trivial piece of software is destined to always be in beta. The state of software engineering today means that it is practically impossible to guarantee that software is bug-free or will not crash fatally if pushed hard enough. As any developer on a decent-sized project knows, there’s always that one obscure bug that needs to be fixed, and fixing it potentially introduces a few more. That being said, you still have to draw the line somewhere and actually release your software at some point. There’s no hard and fast rule that says when your software is ready for public use. It generally depends on a number of factors: What does your software do? Who is your target audience? How often will you release updates, and how long will you support a major release? Obviously the cut-off point for a game for grade-schoolers is very different from that for air traffic control software. Often it’s a complicated mix of market and customer demands, the state of your project and the abilities of your developers.

But Gmail did more than bring about the concept of ‘forever beta’. It introduced the much more powerful idea that if you don’t actually ‘ship’ your software, but instead run it off your own servers, the release schedule can be much less demanding and much more conducive to innovation. Contrast Windows Vista, with its delayed releases, cut features, hardware issues and generally negative reaction after release, with Gmail and its slow but continuous rollout of new features. The comparison shows that Gmail can afford to be forever beta whereas Windows (or OS X, for that matter) simply cannot. The centralized nature of online services means that Google doesn’t need a rigid schedule with all-or-nothing release dates. It’s perfectly alright to release a bare-bones product and then gradually add new features. Since Google handles all updates automatically, early adopters don’t have to worry about upgrading on their own later. People can jump on the bandwagon at any time, and if it’s free, more people will do so earlier, in turn generating valuable feedback. It also means that features or services that are disliked can be cut (as happened with Google Answers and Browser Sync), which in turn means that you don’t have to waste valuable developer time and effort in places where it won’t pay off.

In many ways the Internet has allowed developers to embrace the ‘forever beta’ nature of software instead of fighting it. However, even if you’re not a web developer, you can still take measures to avoid being burned by the endless cycle of test-release-bugfix-test. It’s important to understand that your software will change in the future and might grow in unexpected directions; taking this into account can save any developer a lot of hardship. Make your software as modular as possible so that new features can be added, or old ones taken out, without the need for drastic rewrites. Extensive testing before release can catch a large number of possible bugs. Use dependency injection to make your software easier to test (more on that in a later post). Most importantly, however, listen to your users. Let your customers guide the development of your products and don’t be afraid to cut back on features if that is what will make your software better. After all, what you label your software isn’t important; what matters is what your users think of it. People will use Gmail even if it stays beta forever because it has already proved itself as a reliable, efficient and innovative product. Make sure the same can be said of your software.
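To give the dependency injection point a concrete shape, here is a minimal sketch in Python. The SignupService, SmtpMailer and FakeMailer names are hypothetical, invented purely for illustration: the service receives its mailer through the constructor, so a test can slip in a fake and inspect what would have been sent.

```python
# A minimal sketch of dependency injection for testability. SignupService,
# SmtpMailer and FakeMailer are hypothetical names invented for this example.

class SmtpMailer:
    """Real dependency: talks to an SMTP server."""
    def send(self, to, subject, body):
        ...  # real network I/O would live here

class SignupService:
    def __init__(self, mailer):
        # The mailer is injected rather than constructed internally,
        # so tests can substitute a fake for the real SMTP client.
        self.mailer = mailer

    def register(self, email):
        # ... account creation would go here ...
        self.mailer.send(email, "Welcome!", "Thanks for signing up.")

class FakeMailer:
    """Test double that records messages instead of sending them."""
    def __init__(self):
        self.sent = []
    def send(self, to, subject, body):
        self.sent.append((to, subject, body))

def test_register_sends_welcome_mail():
    fake = FakeMailer()
    SignupService(fake).register("user@example.com")
    assert fake.sent == [("user@example.com", "Welcome!", "Thanks for signing up.")]

if __name__ == "__main__":
    test_register_sends_welcome_mail()
    print("ok")
```

Because the dependency comes in from the outside, swapping the real mailer for the fake requires no changes to the service itself, which is exactly what makes the code easy to test.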

The Role of Testing in Programming

It’s often said (and only half-jokingly) that if we built bridges and buildings the way we build software, we’d be living in the stone age. Sure, a lot of software may be buggy, ugly or even downright useless, but that doesn’t mean something can’t be done to fix it. One of the ways a developer can produce more reliable code is rigorous testing. However, for many developers, testing goes only as far as making sure that the program actually runs and gives reasonable output for a small number of typical inputs. All that proves is that your program isn’t falling apart at the seams. It doesn’t prove that your program will work in the majority of use cases, it doesn’t prove that the program will work in unusual situations, and it certainly does not prove a complete absence of bugs.

So how do you prove that your program will work correctly in at least the majority of cases? It needs to be tested hard: rigorously, brutally, unfairly. This doesn’t mean that you should test it beyond the limits of what is feasible or reasonable. Your text editor doesn’t have to be able to open your email or edit half the HTML files on the web at the same time (unless you’re writing an Emacs super-clone), but it does have to handle multiple large files open at once, apply syntax coloring properly and not choke on large copy/paste operations. Deciding what is necessary to test and what can be deemed outside your program’s operating parameters can be a bit tricky at times, but it leads into what may be the more important role of testing in programming: testing forces you to think about your program’s design.
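As a rough sketch of what testing within those operating parameters might look like, here is a small Python example. TextBuffer is a hypothetical stand-in for an editor’s document model, and the 10 MB paste is an assumed large-but-plausible input; opening half the web stays out of scope.

```python
# A sketch of testing within "reasonable operating parameters". TextBuffer is
# a hypothetical stand-in for an editor's document model, not a real API.

import unittest

class TextBuffer:
    def __init__(self):
        self._chunks = []

    def paste(self, text):
        # Store pasted text in chunks instead of rebuilding one huge string.
        self._chunks.append(text)

    @property
    def length(self):
        return sum(len(chunk) for chunk in self._chunks)

class LargePasteTest(unittest.TestCase):
    def test_ten_megabyte_paste(self):
        # Large but plausible: a 10 MB paste should be handled gracefully.
        buf = TextBuffer()
        buf.paste("x" * (10 * 1024 * 1024))
        self.assertEqual(buf.length, 10 * 1024 * 1024)

if __name__ == "__main__":
    unittest.main()
```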

Fans of unit testing often stress that unit tests should be written first, before any of the actual program is written. This is not an easy concept for most programmers to grasp, and fewer still actually put it into practice. How are you supposed to test something that doesn’t even exist yet? The key idea is that you’re not so much testing your code as specifying what your code should do. To some extent, it involves taking on an outsider’s perspective: a user doesn’t care how the program is written, only whether it works. But if you’re writing tests for your own code, you also start thinking about how your program is structured. The idea of unit testing is that you should test the smallest reasonable components before stepping up to larger portions of code. So you start thinking about which are the smallest parts of your code that need independent testing. Which parts can be safely folded into others and which ones have to be broken down? It also gives you an idea of the interactions between the parts: what sort of data needs to be pushed around, how that data is affected, and whether there is a chance it will be corrupted in all the moving about.
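Here is a minimal test-first sketch in Python, assuming a hypothetical slugify function as the unit under test: the tests are written against the interface we want before the function exists, and only then is the smallest implementation that satisfies them filled in.

```python
# A minimal test-first sketch. The slugify function and its behaviour are
# hypothetical; the tests come first and describe what the code should do.

import unittest

class SlugifyTest(unittest.TestCase):
    def test_lowercases_and_joins_with_hyphens(self):
        self.assertEqual(slugify("Hello World"), "hello-world")

    def test_strips_punctuation(self):
        self.assertEqual(slugify("Beta, forever?"), "beta-forever")

# Only once the tests pin down the desired behaviour do we write the
# smallest implementation that satisfies them.
def slugify(title):
    cleaned = "".join(c if c.isalnum() or c.isspace() else " " for c in title)
    return "-".join(word.lower() for word in cleaned.split())

if __name__ == "__main__":
    unittest.main()
```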

Of course, there is a downside to this formal, test-first methodology: if you have a rapidly evolving project with frequent design changes and restructuring, you’re going to end up with a bunch of failing tests just as frequently, some of which test things that don’t exist anymore. If this happens too often, you might find yourself spending as much time updating your tests as you spend updating your code. That’s never a good thing (of course, completely restructuring your project every week is probably not good either).

Even if a test-first methodology isn’t suitable for you (I don’t follow it myself), you should employ formal unit tests as much as possible. These tests should be automated, i.e. they should generate inputs for your program and check the outputs without human intervention. This might seem painfully obvious, but I’ve seen more than a few students write tests that are essentially manual verifications wrapped in code. This is doubly wasteful: not only do such tests check nothing you couldn’t easily verify yourself, you’ve also wasted time writing code for them.
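The difference might look something like this in Python, with sort_numbers standing in as a hypothetical function under test: the first routine merely prints output for a human to eyeball, while the unittest case generates inputs and checks the results entirely on its own.

```python
# A sketch of the difference between a manual verification wrapped in code
# and a genuinely automated test. sort_numbers is a hypothetical example.

import random
import unittest

def sort_numbers(xs):
    return sorted(xs)

def eyeball_check():
    # Manual verification wrapped in code: it runs, but a human still has to
    # read the output and decide whether it looks right.
    print(sort_numbers([3, 1, 2]))

class SortNumbersTest(unittest.TestCase):
    def test_random_inputs_come_back_ordered(self):
        # Automated: inputs are generated and results checked with no human
        # in the loop.
        for _ in range(100):
            xs = [random.randint(-1000, 1000) for _ in range(50)]
            result = sort_numbers(xs)
            self.assertEqual(result, sorted(xs))
            self.assertEqual(len(result), len(xs))

if __name__ == "__main__":
    unittest.main()
```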

That brings us to the question of manual testing. Unit testing is certainly very handy and will probably make your code much more reliable than no testing at all, but it won’t find all the bugs, and it’s almost impossible to come up with every possible test case. The real trial of your software comes not from endless automated testing, but from actual users using it out in the field. Believe me, there is no substitute for using your programs in the real world. And that doesn’t mean just using the program yourself. As the developer, it’s very likely that your code’s limitations are tucked away in your subconscious, making you blind to problems that would be obvious to people who use the program without knowing how it’s built. Getting real people to use your program in the real world is the best possible test, even though it may be more time-consuming and less clear-cut than automated unit testing. And there are some things (like UI effectiveness) which simply can’t be auto-tested.

Ultimately, the role of testing boils down to the following major points. They are not universal and they won’t cover all possible cases, but they can go a long way toward making your code more robust.

  1. Test often and rigorously.
  2. Use a combination of automated testing and user testing.
  3. Designing and testing go together: design for effective testing, test to find flaws in the design.
  4. Don’t just fix the bugs your tests reveal: think about how those bugs can affect the rest of the program and how the fixes themselves will affect it.
  5. Remember that testing will probably not eliminate all possible bugs, but that doesn’t mean that you should not test, or that you should test forever.