Cloud Computing: coming full circle

Google Operating System has a recent article about a presentation made by Google China’s President Dr. Kai-Fu Lee on cloud computing. Cloud computing is what is gradually making Google a powerful force on the planet. The idea is to reduce a user’s dependence on any one computer or device. All data and applications are stored “in the Cloud”, so to speak. Everything is backed up on multiple servers and is easily accessible from any Internet-connected device. This means that you can use your data, and the applications that manipulate that data, from almost anywhere in the world (anywhere with a decently fast internet connection, at least). It’s an extremely powerful idea, with numerous benefits both for users and for whoever supplies your software.

As a user, the most obvious benefit is that you can access your (often very important) data from many different places. Just as important is that you can take advantage of the massive amount of powerful hardware and computing resources available in the cloud. You no longer need a 3GHz processor with 4GB of RAM just to edit that letter to your boss; you just need enough power to run a modern browser and handle a fast internet connection. The result of having so much power is obvious right now: any web-based search is much faster than any desktop search, thanks to far more hardware and to powerful algorithms that take advantage of that hardware (Google’s MapReduce comes to mind).

For the company providing the cloud services, the advantages are just as attractive. There aren’t a dozen different versions to support, with all the related compatibility issues. Once bugs are tracked down and fixed, the patches can be rolled out without having to bother the user. Google has been using this approach in the gradual upgrades to Gmail and Google Documents, and it has been working quite well. For every piece of software that the user sees, there is a lot more that is carefully hidden away. When you run the cloud, you control what the user sees as well as what happens behind the scenes, and with some care you can make sure that the two remain essentially separate but still work well together (unless you don’t control the cloud software, but that’s a different matter entirely).
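To make that MapReduce mention a little more concrete, here is a toy, single-machine sketch of the idea in Java: counting words across chunks of text. This is not Google’s actual code and the class and method names are mine; in the real system the map and reduce steps run in parallel across thousands of machines, which is exactly where the cloud’s hardware advantage comes from.

```java
import java.util.*;

// A toy, single-machine illustration of the MapReduce idea (word counting).
public class WordCountSketch {

    // Map phase: emit a (word, 1) pair for every word in a chunk of input.
    static List<Map.Entry<String, Integer>> map(String chunk) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : chunk.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                pairs.add(new AbstractMap.SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    // Reduce phase: sum the counts for each word.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> counts = new HashMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            counts.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] chunks = { "the cloud stores the data", "the cloud runs the code" };
        List<Map.Entry<String, Integer>> allPairs = new ArrayList<>();
        for (String chunk : chunks) {
            allPairs.addAll(map(chunk)); // each chunk could be mapped on a different machine
        }
        System.out.println(reduce(allPairs)); // e.g. {the=4, cloud=2, ...}
    }
}
```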

Cloud computing seems to be the wave of the future, and with ubiquitous broadband I think it will be a very efficient way of utilizing powerful computing resources. At the same time, cloud computing reminds me of the old days of time-sharing (which I was born far too late to be a part of). Data and software lived on a large mainframe (later minicomputers), either at a company’s computer headquarters or on a college campus. To access them, users could log in from a remote terminal, sometimes dialing in over the phone lines. Of course, in those days users shared processor time and memory on just one computer, and data transfer speeds were slow even for the plain text that went around. But the users’ data and applications were still centrally located, accessible uniformly from any terminal connected to the system.

How different is this from today’s cloud computing? Well, very. Data speeds are higher, communication is available not just to terminals wired into one system but to any internet-connected device, and there is far more computing power available to users. But the core concepts are nevertheless essentially the same: data located in one place, but uniformly accessible from multiple separate locations. In many ways it’s like the saber tooth: a design that has appeared again and again throughout evolution precisely because it is a good design. Cloud computing too is a good design, and one that is going to stay with us for a long time to come.

Design for unit-testing

I’ve written before about the role of testing in programming, and as I’ve written more code (and unit tests) over the past few weeks, my conviction that unit testing is useful for more than just determining program correctness has become even stronger. In my previous post I spent only about a paragraph exploring the other benefits of testing; I think it’s time I offered a more detailed view of these alternate advantages of unit testing.

As I started working on my last project of the semester yesterday, one of the things at the top of my mind was that the professor would be running his own unit tests on our programs. Our other projects had clear descriptions of which methods we were to create and how they should behave. But this project, a simple word counter written in Java, was different in that we were only told how the program as a whole should respond; we could shape the internals as we wished. Though it was very tempting to create a monolithic system (especially since the problem was rather simple), I decided that I should design with automated unit testing in mind. Rather than complicating the design, this choice actually made things clearer.

It was obvious that the UI would have to be completely separate from the part of the code doing the actual work. Not only that, but the values returned from the methods doing the work would have to be in a form that could easily be compared to expected results. This meant that while the UI would deal with output formatting, there should not be very much formatting required in the first place. This guided my choice of the data structures that would eventually be returned.
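As a rough illustration (the WordCounter class and its countWords() method are hypothetical stand-ins, not my actual project code), this is the kind of JUnit 3.8 test that such a design makes possible: the work method returns a plain data structure that can be compared directly against expected values, while all the formatting stays in the UI layer.

```java
import junit.framework.TestCase;
import java.util.Map;

// JUnit 3.8-style tests against a hypothetical WordCounter class.
public class WordCounterTest extends TestCase {

    public void testCountsEachWord() {
        Map<String, Integer> counts = WordCounter.countWords("to be or not to be");
        assertEquals(Integer.valueOf(2), counts.get("to"));
        assertEquals(Integer.valueOf(2), counts.get("be"));
        assertEquals(Integer.valueOf(1), counts.get("or"));
        assertEquals(Integer.valueOf(1), counts.get("not"));
    }

    public void testEmptyInputGivesEmptyMap() {
        assertTrue(WordCounter.countWords("").isEmpty());
    }
}
```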

Another area in which designing with testing in mind is useful is in deciding whether methods should be private. JUnit 3.8 does not support direct testing of private methods. There is a certain amount of debate about whether or not this is a good thing, but the restriction does force a design discipline that can be beneficial: the only methods that should be private are the ones that need not be tested separately, that is, if the calling method passes its tests, the private method must also be working correctly. Though I didn’t need any private methods in this particular project, I have in earlier ones, and keeping the limitations of testing private methods in mind resulted in what I believe to be cleaner, more readable code.
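Sticking with the same hypothetical WordCounter, here is a sketch of what that rule looks like in practice: the private helper is exercised only through the public method, so if the tests for countWords() pass, the helper must be working too.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical class, not from the project: the private helper normalize()
// is never tested directly. If it ever needed its own tests, that would be
// a hint that it deserves to be a public method of its own class.
public class WordCounter {

    public static Map<String, Integer> countWords(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : text.split("\\s+")) {
            String w = normalize(word);
            if (!w.isEmpty()) {
                counts.merge(w, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Private: exercised only through countWords(), so it needs no test of its own.
    private static String normalize(String word) {
        return word.toLowerCase().replaceAll("[^a-z]", "");
    }
}
```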

Of course, it is easy to go overboard with unit testing. Your program shouldn’t be endlessly subdivided into lots of tiny functions just so that you can test every tiny chunk of code. Unit testing can help you design a cleaner, simpler program, but only if your design is pragmatic to start with. Like I’ve said before, no amount of coding tricks and development methodologies will fix a fundamentally broken design.

Jeff Atwood of Coding Horror said some time ago that unit tests should be first-class language constructs. At the time I thought that this was a bit overboard, but I’m coming to realize that it might be a good thing. Any language with a decent error-handling mechanism would be able to bake in support for unit testing, and having a unit-test mechanism directly in the language would probably encourage students to use it (and, more importantly, teachers to teach it). In my CS course we’ve been using unit tests from early on, which I feel was a very good decision on the part of the professor (even though many of my fellow students don’t quite seem to grasp their full importance right now).

Designing for unit testing encapsulates a lot of the ideas that are part of good software design: UI separation, abstraction, code reuse and readability. Unit testing is also a perfect example of abstraction: just worry about the big picture and the details will take care of themselves. So the next time you find yourself dealing with a big project, design for unit testing and chances are you’ll write better code than you would have otherwise. Keep in mind, though, that no amount of unit testing will replace actual user testing, so make sure you get around to that at some point as well (hopefully as soon as you have something a user can actually use).

Don’t implement a bad algorithm

That’s a lesson I learned yesterday, the hard way. We’re currently learning about tree-like data structures in my computer science class, and one of our four semester projects was writing a balancing method for an AVL tree. Though the concept is simple and fairly efficient (the rebalancing work is at most logarithmic in the size of the tree), it’s very easy to come up with algorithms that seem to do the right thing but really don’t. And that’s true of a lot of algorithms that programmers come up with. If that were the only thing wrong it wouldn’t be too much of a problem: simply discard the algorithm and create a new one. Unfortunately, the tendency for most student programmers is to simply implement the first algorithm that comes to mind and then start bug-hunting. Though I try hard to avoid doing this, I let my guard down this time. The result, as you can imagine, was not pretty.

If the algorithm is in itself a good one, one that gives the correct result when applied right, ironing out the implementation details is not particularly hard. But in this case I did not bother to check whether the algorithm was right. When balancing an AVL tree there are a number of cases that need to be considered and dealt with. There are not a lot of them, but each needs a slightly different procedure to be handled properly. If I had started by breaking the problem up into these sub-problems and then writing code to deal with each case (as sketched below), the task would have been simple; I finally did exactly that, and it took me less than an hour to get everything working. Instead, I tried to come up with a generic algorithm that could be altered to fit each of the cases. Now, the idea of building a generic abstraction which can act in a number of different ways based on input and other conditions is a very powerful one, and it can greatly reduce the amount of redundant code in a large system (closures, anyone?). However, it should be kept in mind that it is often easier to implement a slightly cluttered, redundant system than to spend time and effort creating a conceptually more complex system just to reduce structural complexity. I made the mistake of not keeping this in mind. And my algorithm did not work.
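To show what “dealing with each case” means, here is a sketch of the case-by-case structure (my reconstruction rather than the actual project code, and only the rebalancing step is shown): each of the four imbalance cases gets its own small, explicit fix instead of one clever generic procedure.

```java
// Only the rebalancing step of an AVL tree; insertion, deletion and the rest
// of the tree class are omitted.
class AvlNode {
    int key, height;
    AvlNode left, right;
    AvlNode(int key) { this.key = key; }
}

class AvlBalancer {

    static int height(AvlNode n) { return n == null ? -1 : n.height; }

    static void updateHeight(AvlNode n) {
        n.height = 1 + Math.max(height(n.left), height(n.right));
    }

    // Single rotation: the left child becomes the new subtree root.
    static AvlNode rotateRight(AvlNode n) {
        AvlNode newRoot = n.left;
        n.left = newRoot.right;
        newRoot.right = n;
        updateHeight(n);
        updateHeight(newRoot);
        return newRoot;
    }

    // Mirror image of rotateRight.
    static AvlNode rotateLeft(AvlNode n) {
        AvlNode newRoot = n.right;
        n.right = newRoot.left;
        newRoot.left = n;
        updateHeight(n);
        updateHeight(newRoot);
        return newRoot;
    }

    // The four cases, each handled explicitly.
    static AvlNode rebalance(AvlNode node) {
        int balance = height(node.left) - height(node.right);
        if (balance > 1) {                                        // left side too tall
            if (height(node.left.left) >= height(node.left.right)) {
                node = rotateRight(node);                         // left-left: single rotation
            } else {
                node.left = rotateLeft(node.left);                // left-right: double rotation
                node = rotateRight(node);
            }
        } else if (balance < -1) {                                // right side too tall
            if (height(node.right.right) >= height(node.right.left)) {
                node = rotateLeft(node);                          // right-right: single rotation
            } else {
                node.right = rotateRight(node.right);             // right-left: double rotation
                node = rotateLeft(node);
            }
        } else {
            updateHeight(node);                                   // already balanced
        }
        return node;
    }
}
```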

Things would still have worked out if I had realized at that point that it would be easier to just follow the book and code solutions for the cases separately (abstracting only the most obvious parts into generic solutions). But I didn’t realize that. I tried to hammer and mutilate my broken algorithm into reacting the way I wanted to the different cases, and I kept piling on more and more code, desperately hoping that somehow it would all start working. I wasted a good 8 hours trying to get my abysmal mess of a balancing method to work right and ended in utter and abject failure. In many ways, those were my darkest hours since I first started programming about 5 years ago.

But yes, this story does have a happy ending. Six hours of sleep and a physics class later, I finally decided that enough was enough and that I was going to start from scratch, this time following the book’s recommendations and keeping excess complexity to a bare minimum. I was done in under an hour. So what’s the moral of this story? Spend more time thinking than you do coding, and don’t implement an algorithm until you are fairly sure that it works the way you want it to.

Sure, some bugs won’t be revealed until you actually have some code to compile and test, but there’s no way that you are going to code your way out of a broken algorithm. It’s very alluring to just sit down at a computer and start typing away; it makes you feel very productive because there’s something happening on screen. In contrast, simply sitting down and working out an algorithm in your head often makes you feel that you are not really doing anything. It’s unfortunate that we call what we do “programming” when much of our time and effort is actually spent (or should be spent) planning programs. As the preface to Structure and Interpretation of Computer Programs says, programs should be written primarily for people to read and understand, and only incidentally for machines to execute. It follows that if you can’t understand your own program and convince yourself of its correctness, you shouldn’t expect a computer to do it for you. That’s a lesson I learned the hard way, and I hope I never have to relearn it.

Typing troubles and keyboard contemplations

I’m a barely average typist. I haven’t really timed myself very carefully, but I know that I’m much slower than a lot of people I know. What’s more worrying is that my typing is very error-prone: I have to hit the backspace key once every two or three words, which is certainly not very encouraging. I’ve been thinking about this quite a bit recently, and about typing in general. Being a computer scientist and software developer in training, I’ll be doing a lot of typing over my lifetime, and being a fast, accurate typist is a must for me. All the more so because I enjoy writing and blogging, but can’t really afford to spend a lot of time on them.

The more I think about it, the more I realize that there are really a number of complicated issues here: not just the obviously important things like RSI and ergonomics, but a lot of smaller things that can become very important. In fact, I wonder if there should seriously be college-level typing courses. These courses would not just cover ergonomics and ways to avoid RSI, but would also teach techniques like touch-typing, which can significantly improve typing speed. Part of the course would also be devoted to letting students experiment with different types of keyboards from different manufacturers and brands; one size certainly doesn’t fit all when it comes to keyboards. Though computer scientists might benefit most from such a course, almost every college student could benefit from becoming a better typist: writing papers might take just a little less time if a student were pumping out 100+ words a minute instead of 30-50. In the old days, before typing became standard, people placed great emphasis on having clear, legible handwriting. In an age when standard fonts have made legibility a non-issue, I think we should place just as much emphasis on being a good typist.

So back to my problem: how do I improve my speed and efficiency? One reason my typing is slow and error-prone might be that I switch between a number of different keyboards. My 15.4″ Toshiba laptop has a fine flat keyboard with nice large, closely spaced keys, and I really enjoy typing on it. My old G4 Mac uses one of the old white Mac keyboards; it’s nice, but I’m not very fond of it. My college library recently updated one of its Mac labs to the new aluminum iMacs with their thin flat keyboards. I have mixed feelings about those: I like the laptop-like keys (though sometimes I wish they were a bit stiffer), but I’m not quite sure about the spacing. Occasionally I find myself having to use various Dell keyboards, and I like each of them to a different extent. I’ve been wondering if sticking to a single keyboard might help improve my typing. Hardcore gamers are known for carrying their keyboards and mice to LAN parties, so why shouldn’t programmers do the same? OK, so there really aren’t that many coding parties, but you never know when you might have to sit down and write some code to save the world.

On a more serious note, I’m becoming increasingly convinced that finding the right keyboard is an absolute necessity for anyone who does a fair amount of typing. Your muscles have a certain memory, and the better you train that memory, the better your typing will be. If you keep switching keyboards, your muscles will have a hard time keeping up and won’t remember much. I’ve written before about how important a good keyboard is, but I haven’t really been looking for a full-time keyboard myself. Now that I’m putting more and more time into typing-related activities, I realize it’s important that I find a good keyboard of my own. So what matters to me? Firstly, much of my programming takes place on my laptop under Arch Linux; I actually like my laptop keyboard a lot, and I’m not sure I want to use an external keyboard with it. Portability is important as well. I like working from different places around campus: different rooms in the library and different computer labs. I would like to be able to carry my keyboard with me. I’ve also developed a distaste for ‘standard’ desktop keyboards with their thick, heavy keys (once again, probably due to heavy laptop use); I don’t exactly hate them, but they’re not something I would want to buy and use all the time. The ideal would be a laptop-like keyboard, slightly larger, but light enough to carry around without much trouble.

There are a number of laptop-like desktop keyboards on the market. The most popular seems to be the Kensington slim keyboard, which looks like a decent product: stylish and fully functional. There are also a number of foldable keyboards out there that I found rather interesting. Some are completely flexible and can be rolled up nice and tight, but I really don’t want something like that; I think it’s too far from what I’m used to for me to feel comfortable. I also looked at some that are rigid but divided and hinged so that they can be folded up. They are mostly designed for PDAs, though, and hence come with short USB cables, and I didn’t find any that made me feel I really wanted one. One keyboard that I was really interested in was the Matias folding keyboard. Unfortunately it seems to be out of stock, and at almost $70 the price tag is a bit hefty. Considering that I’m not really on the go all the time, I don’t think the investment would be worth it.

So what can I get? I think I might have found a solution in the new Mac keyboards. They do take some getting used to, but the more time I spend with one (currently almost an hour each day), the better I get with it. The keyboards are also light but sturdy, and the two USB ports on the side come in handy for plugging in mice and USB drives. The $50 price is a bit higher than I would have liked, but I think it’s acceptable. I’m not ready to commit yet; I’m going to spend another week or two trying them out before actually buying one, but I think it’s the best option for me at the moment.

The Hardware Software Interface

One of my computer science professors recently lent me the book Computer Organization and Design: The Hardware/Software Interface, written by two pioneers in the field of computer hardware: David Patterson and John Hennessy. It is an excellent book about how a computer’s machinery is actually designed and built, written by the people who introduced RISC and MIPS to the world. The book is widely used in undergraduate courses on computer organization, and quite rightly so. I’ve only made my way through the first two chapters, but I already feel that this book deserves a place alongside the classics of computer science.

The reason I’m reading the book is that I’m interested in the hardware side of computers just as much as the software side. With so many powerful high-level languages keeping the programmer many layers of abstraction away from the hardware, it’s easy to forget that the computer will only reveal its full power if you learn how it works, and learn it well. While it is certainly important to have a strong understanding of the mathematics of computing, it is equally important to know how to take those mathematics and concepts (some of which are incredibly powerful) and translate them into patterns of ones and zeros. MIT’s Structure and Interpretation of Computer Programs is probably the best book ever written for teaching the fundamentals of pure computer science, covering everything from simple abstractions to machine-code generation. It’s not really a book to read unless what you want is a deep understanding of how computers compute. Combine it with a book like Computer Organization and Design (or perhaps its graduate-level partner) and you have a combination that, if well utilized, gives you a very complete understanding of computer systems.

Bridging the hardware-software interface is a very special piece of software: the compiler. The compiler is what takes your high-level, mathematically abstract program and translates it into the bare bytes the computer will deal with. You can’t implement a half-decent compiler without understanding both the computer’s hardware architecture and the range of abstractions that you want to implement on that architecture. Steve Yegge correctly argues that you haven’t fully understood computers unless you have understood compilers. Compilers tie together all the fundamental concepts of computer science: programming languages, algorithms, data structures and, of course, the hardware-software interface. Unfortunately, compilers are often glossed over during a computer science education. There has been a split in computing education, with computer science gradually moving towards a focus on pure software with the bare minimum of hardware, and electrical engineering dealing with hardware with a minimum of software (mostly assembly and C). Personally, I think this is a grave mistake, one that leads to a gradual lowering of the quality of computer scientists.
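To give a flavor of what that translation involves, here is a toy Java sketch: it walks a tiny expression tree (the “high-level program”) and emits instructions for an imaginary stack machine standing in for real hardware. Everything here is invented for the example; a real compiler adds parsing, type checking, optimization and actual machine code, but the shape of the job is the same.

```java
import java.util.ArrayList;
import java.util.List;

public class TinyCompiler {

    // The "high-level program": a tiny expression tree.
    interface Expr {}
    static class Num implements Expr {
        final int value;
        Num(int value) { this.value = value; }
    }
    static class BinOp implements Expr {
        final char op; final Expr left, right;
        BinOp(char op, Expr left, Expr right) { this.op = op; this.left = left; this.right = right; }
    }

    // "Code generation": a post-order walk that emits instructions for an
    // imaginary stack machine (only + and * are supported, to keep it short).
    static List<String> compile(Expr e) {
        List<String> code = new ArrayList<>();
        if (e instanceof Num) {
            code.add("PUSH " + ((Num) e).value);
        } else if (e instanceof BinOp) {
            BinOp b = (BinOp) e;
            code.addAll(compile(b.left));
            code.addAll(compile(b.right));
            code.add(b.op == '+' ? "ADD" : "MUL");
        }
        return code;
    }

    public static void main(String[] args) {
        // (2 + 3) * 4  compiles to:  PUSH 2, PUSH 3, ADD, PUSH 4, MUL
        Expr program = new BinOp('*', new BinOp('+', new Num(2), new Num(3)), new Num(4));
        System.out.println(compile(program));
    }
}
```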

What is the solution? The ideal would be an overhaul of the computer science curriculum, perhaps merging many aspects of computer science and electrical engineering and bridging the hardware-software interface with a proper emphasis on both. However, this would also mean that students would have to study a lot more, the curriculum would become a good deal harder, and there would have to be a move away from so-called “industry standard” languages such as Java as the main teaching medium. Ideally, all computer science programs would use both Computer Organization and Design and SICP as textbooks at some point during the degree. As you can well imagine, this isn’t going to happen any time soon, probably never. I’m going to study both of them because I love computers and would never pass up a chance to learn more about them, but I am acutely aware that most of my classmates do not share my enthusiasm.

But what can I do, as a lone college freshman, to gain a complete understanding of computer systems, the power they have to offer, the challenges they involve and the many interesting facets of the hardware-software interface? Luckily, I don’t have to be bound by the college curriculum (and neither does anyone else). Both of the books I have talked about are written with students in mind (albeit dedicated and determined students) and should be easily available; in fact, SICP can be read online in its entirety for free. The software tools that both of these books use are also freely available online. There are also a number of videos available online of courses conducted by the authors of SICP, and I feel that they are an excellent companion to the book. And if that isn’t enough, there is always the World Wide Web, with a plethora of informative sites and freely available tools, as well as many projects to contribute to in order to put one’s knowledge to work. Learning should be a proactive activity, with just as much enthusiasm shown by the student as by the teacher. As the Zen saying goes: “When the student is ready, the teacher shall appear.” This could not be more true in the information age.

Happy learning!