Coming up in 2009

As another year comes to an end it’s time to get ready for the next year. I had a great year with lots of great new experiences, but I’m looking forward to having an even better year ahead. I have some great tech-related projects planned for the year ahead that I can’t wait to get started on. They’re in no particular order, since I consider them all equally important. Here goes:

1. Moving from Subversion to Git

Last year I started keeping my files under version control in a Subversion repository on an old Mac I used as a server. Though it was a good way to keep things in sync between my desktop, laptop and school machines, it hasn’t made my workflow as easy as I hoped it would. Due to a number of reasons, I’ll be moving my files to Git very soon. From what I’ve read on the internet, moving from Subversion to Git is a simple affair. This is the first major thing I’ll do after getting back to campus and a broadband internet connection.

2. Learning Lisp and C

I’ve been doing a lot of Python programming over the last year. I really like Python and I think it’s a great language that all developers should have a chance to use at some point. But I’m ready for a taste of something different. I’ll be using C and C++ a lot for my courses next semester and Lisp is a language I’ve been hoping for a while to explore in more detail. As Paul Graham says in ‘On Lisp’, C and Lisp are two sides of the same coin: C models computers while Lisp models computation. I hope to have a worthwhile learning experience with these two different but complementary languages. Along the way, I’d like to get more proficient in hacking Emacs.

3. More work on The Bytebaker

I made a decision to give more effort to blogging and writing in general when I moved this blog to its own domain. I’m going to keep up that effort in the new year. I’ll be publishing more posts, hopefully of increasing quality, as well as putting up longer articles from time to time. In the latter half of the year I’m planning to move to independent hosting so that I can have more control and offer readers more features.

4. Research projects

My current research involving formal grammars is starting to gather steam and I can sense some really exciting work in the months to come. There are already a number of things that I’m interested in looking into and I would like to have publishable material by the time summer break rolls around. There is also a possibility I might have a chance to do work related to software engineering tools. Ever since I became interested in compilers, I’ve been thinking about the importance of using proper tools, so this new work should prove to be enlightening.

5. Exploring Parallel Programming

Parallel programming is becoming one of the biggest challenges, and one of the biggest opportunities, of our time. My college has a small cluster running the Hadoop framework for parallel programming. I’m going to be doing some preliminary work to get used to the framework and the basic concepts and then I’ll be looking around for interesting projects to work on.

There are some other things that I’m interested in (Scala, low-level programming), but I’m going to try to focus on the ones I’ve mentioned above. At the same time, I’m open to change and if it seems that something else is worth pursuing, then I might have to reevaluate my priorities.

Please pay for your software

I’ve been a committed user of free and open source software (FOSS) for the better part of the last three years. In that time I haven’t paid for a single piece of software. In fact, I honestly can’t remember the last time that I actually paid for software. I am sometimes amazed by the amount of fully functioning software that I get for the great price of free. However, at this time of year, when festive cheer is at its height, I think it’s about time we broke out our wallets and actually paid for the ‘free’ software that we do use.

Being the starving college student that I am, I want to make it clear that I do not consider spending money on software lightly. I wouldn’t ever fork over the hundreds of dollars needed for Windows or Office or any of the Adobe products. At the same time I am a software developer and though I love creating software, I would like to be able to make a better than decent living from it someday. Being a computer science student, I also understand full well that writing good software is hard: it’s mentally draining and there’s only so much caffeine-powered coding you can do in a day before your productivity goes south. Like spending money on software, it’s a commitment not to be taken lightly. And you can’t conceivably invest so much effort into something so demanding unless you truly love what you’re doing. All the software I’m using now, the Linux kernel and the GNU utilities, the Emacs editor and the Firefox browser, collectively represents thousands, maybe even millions of man hours of programming time. All the programmers who have invested their time, effort (and probably money) into creating these great tools for everyone to use surely deserve compensation. At the very least, we have a commitment to help sustain the infrastructure that supports the FOSS movement. Server space and bandwidth aren’t exactly free and with the number of downloads continually growing, there will be increasing costs to pay to keep the movement going.

If you’re immediately put off by the thought of paying for software, don’t be: it’s not as painful as it sounds. The good news is that with free software you get to set the price. Even for a student like me, $50 or $100 a year is a very reasonable price to pay. If you’re a professional developer using free tools to write code for pay, consider a monthly contribution of $10 – $20 alternating between your commonly used FOSS projects. The bargain you’re getting is amazing. If a corporation actually charged for all the tools out there, we’d easily be having to fork over hundreds of dollars a year.

After reading two posts on Coding Horror, I’ve come to the conclusion that it is only logical to donate a reasonable amount to help support the free software we use. Jeff Atwood’s posts focus on paying for cheap, high quality software as opposed to pirating it. His reasoning is that if you don’t pay the bare minimum there soon won’t be anyone interested in making good software at low prices. The exact same logic holds for FOSS. Sure, there will always be people like Linus Torvalds releasing their private projects for others to use, and there will always be people like Richard Stallman driven by pure ideology. But in the end, unless people like you and me decide to make a contribution, hobby projects will stay hobby projects and not reach the levels of quality and reliability that we’ve come to expect. So the next time you’re out buying presents for your loved ones take a minute to think about all the people who have made your life as a programmer easier (and maybe helped you earn the money you’re now spending). And right now, when you finish reading this, pull out your credit card and donate some money to the open source project whose software you value the most.

This year I’ll be donating $50 to Arch Linux. It’s the distro I’ve been using for the last two years and I appreciate both the technical quality and the lively community.

Books for intermediate computer science students

So you’ve survived the first few programming courses, you know why bubble sort’s a bad idea and you can write tree traversal algorithms in your sleep. The question that’s eating you as the new year looms is: what’s next? There is a proliferation of books out there meant for beginning programmers and quite a few for the experts, but in my experience there are fewer resources for people who are half-way up the ladder and want to know how to climb the next few rungs. Luckily there are a number of books out there that a sufficiently motivated student can use to learn more advanced techniques.

Before I jump in, it’s worth taking some time to clarify what exactly intermediate is. For starters, I’m assuming that you know at least 3-4 different programming languages including at least one object-oriented, one functional and one utility language (eg Perl/Python/Ruby). I’m also assuming that you’ve worked in teams and have successfully accomplished at least one medium scale programming project which involved both program design and implementation. And since computer science is more than just programming, you should have at least a cursory knowledge of computing theory and algorithms, meaning that you know, at the least, what a Turing Machine is and what Big-O notation means. That being said, let’s get on with it:

Structure and Interpretation of Computer Programs

Yes, I know MIT uses this book for its intro-level programming course, but it quite clearly states in one of the prefaces that most students taking that course already have some programming experience. It’s certainly a fine book, but I do think that novice programmers won’t quite understand the full power of some of the concepts presented. However, having spent some time in the trenches, you’re likely to have a better understanding of what sort of things can make your life easier. This book will teach you a lot about abstractions and algorithms and once you’ve experienced the austere simplicity and inherent power of Scheme, you’ll never think the same way again.

On Lisp

Even though the book is supposed to be about “Advanced Techniques for Common Lisp”, the first few chapters are devoted to a rather basic review of functional programming concepts. I’m currently working my way through this book and I think it’s fair game for anyone who has had exposure to functional programming in general and some dialect of Lisp in particular. Be warned, however, that this book is quite topic specific. It makes no secret of the fact that many of the ideas that are explored will be useless, or at least very hard to implement, without the powerful framework that Lisp gives you. Unless you plan on actually using Lisp for a substantial amount of your work in the future, this book might not be worth the time investment.

Design Patterns

At the other end of the generality spectrum is this classic. Its authors have gained somewhat legendary status in the software engineering community as the Gang of Four and the book very well deserves its reputation as a must-read for serious software engineers. If you’re building production software and use any form of object orientation, you’ll soon be encountering some of the patterns described here. The book’s purpose is to describe general patterns that continually occur in software. It reads more like a catalog or cookbook than a standard textbook, but that adds to its appeal. Each pattern can be studied more or less separately and references to supporting patterns are made clear. There is also a considerable amount of sample code in Smalltalk and C++. This isn’t exactly a book that’s made to be read cover to cover and you’re likely to get bored unless you have a problem in mind that you’re looking to solve. The best way to get the most out of this book is to treat it as a reference and another tool in your toolkit.

Code Complete 2

Another general software engineering book that will make you a better programmer for the rest of your days. Code Complete is hard to describe in a few words and the best way to get a feel for it is to read the first few chapters. Code Complete focusses on the actual writing of code. It deals with issues that arise as you and your team sit down to actually implement the design. If you have any experience at all working with a team on a large project, you’ll understand a lot of the issues that are dealt with here. This book isn’t about fancy optimizations or agile team management techniques, it’s about actually sitting down and writing your code. A must-read for anyone who wants to build better software (which, I hope, is everyone who has ever written software).

Compilers: Principles, Techniques and Tools

You may only have a passing interest in compilers and may not be looking to become an expert in programming languages, but learning about compiler technology will help you in numerous other ways as well. A word of warning: this book is a classic but it also edges into the ‘advanced’ domain. I’ve been reading this book in earnest since I became interested in programming languages and I still have a long way to go. This book will force you to develop along multiple fronts: algorithms, data structures, even interface design (by way of designing usable languages). And as Steve Yegge pointed out, starting to write a compiler is not for the faint of heart, because there is no end to it (though that’s something for another post). Read this book only if you are seriously committed to someday becoming an expert programmer.

The Mythical Man Month

It’s a bit dated and a lot of the examples might seem archaic, but the underlying message is clear even after all these years. This book is a must-read if you’re ever in a position to lead a team (and chances are, at some point you will be). If there is any book that I would say is a must for a software engineering curriculum, this is it. It can be tempting to skip over the more technical parts, but please don’t do so. Read this book cover to cover and then read it again a few weeks later.


The books I’ve listed above won’t make you an expert programmer, but they will start you on the way and take you a fair distance along. The truth is that it’s much harder to go from intermediate to expert than to go from beginner to intermediate (though I suppose that’s true of all disciplines). I’m looking for a good book on algorithms to add to the above list, but I haven’t come across one that is at the proper level yet. I consider Knuth’s masterpiece to be at a somewhat higher level than what I’m prepared for at the moment. Any suggestions would be welcome.

Going on an Internet diet

Living on a college campus for the past year has meant that I’ve become used to having an always-on high speed internet connection. It’s certainly very convenient to have the internet available at a moment’s notice. Furthermore, campus-wide wireless means that I’m not limited to working at my desk. On the flip side though, it’s a challenge to make sure that I stay on the internet only as long as I need (or want) to. The internet, for all its uses (and perhaps because of them), can be very addictive.

I’m leaving for India today and I’ll be away from an internet connection for at least a day, perhaps two. Though I do have a decent internet connection at home (256Kbps DSL), it doesn’t really compare to what I’m used to. And there’s certainly no wireless. Last time I was home, this situation was sometimes frustrating. However, this time I’ve decided to do a little experiment. I’ll be home for just over a month and I’m going to try to keep my internet usage to under an hour a day. ‘Internet usage’ in this case means email, browsing, reading my blogs and feeds and checking Facebook. I can’t feasibly cram writing posts along with other activities into an hour-long time slot. But since I plan on writing regularly, I’ve decided that the best thing to do would be to write my posts offline and then copy/paste them into WordPress. Any time I spend looking up links will count towards the one hour quota though.

There are a number of things that I’m looking to accomplish with this diet. Firstly, the Internet is a great information source, but the signal to noise ratio can be dangerously low at times. I’m hoping that under time pressure I’ll learn to be more discerning about what I choose to pay attention to and what I don’t look at. I hope that I’ll be able to use the extra time to do something productive. I haven’t read as much over the last semester as I should have and I think that this experiment will be a good way to catch up and maybe make a new habit. At the same time, I’m going to be looking closely at whether or not having the internet affects other computer tasks that don’t need connectivity. In particular I’m looking to make substantial headway on my research project, which will involve a lot of coding. I’m familiar enough with Python that I don’t need to constantly look up the reference docs, so it’ll be interesting to see if not being online all the time lets me write code faster (or slower).

There are some details that I need to work out. For example, if I get into a conversation with a friend do I need to disconnect at the one hour mark or can I keep talking if I don’t have anything better to do? I’m also certain that I will occasionally need to look up Python documentation and I’m not sure whether or not that should count towards my internet usage. I’ll be starting the diet from the start of next week after I’ve reached home and had a chance to recover from the trip. I should have the above questions resolved in a week and after that it’ll be up to me to actually enforce my diet. Whatever happens, I should have some interesting conclusions at the end of it.

Creating extensible programs is hard

As part of my research work I’m building a program that needs to have a pluggable visualizer component. I would like users to be able to create and use their own visualization components so that the main output from the program can be viewed in a variety of ways: simple images, 3D graphics, maybe even sounds. My program is in Python, but I would rather not limit my users to having to write Python visualization code. Ideally, they should be able to use any language or toolkit that they prefer. The easiest way to do this (as far as I can see) is to have the visualizers be completely separate standalone programs. The original program would save its output as a simple text file and the visualizer would then be responsible for reading the file and performing its actions. At the same time, however, I would like to make it easy to write Python visualizers that don’t have to supply their own file reading operations. These goals mean that I need to design and implement a stable framework that can handle all this extensibility.

I only really started working on this last night, but in that time I’ve realized that this is harder than it sounds. Here are some of the things that I’ve figured out that I have to do:

  1. Determine whether the visualizer is a Python module or a standalone program
  2. If it’s a standalone program, save the output as a file and then call the program with the output file as a parameter
  3. If it’s a Python module, there should be a class (or a number of classes) that corresponds to visualizers.
  4. The visualizer objects should have a clean API that the main program should be able to use.
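
The steps above can be sketched roughly as follows. This is only a minimal illustration, not the actual design: it assumes a Python visualizer is any existing `.py` file that defines a class named `Visualizer` with a `render` method, and that a standalone visualizer takes the output file path as its only argument. The names `is_python_module`, `load_visualizer_class`, `run_visualizer`, `Visualizer` and `render` are all hypothetical.

```python
import importlib.util
import subprocess
import tempfile
from pathlib import Path


def is_python_module(path):
    """Step 1 (assumed rule): an existing file ending in .py is a Python visualizer."""
    p = Path(path)
    return p.is_file() and p.suffix == ".py"


def load_visualizer_class(path, class_name="Visualizer"):
    """Step 3: import the file and fetch the (hypothetical) visualizer class from it."""
    spec = importlib.util.spec_from_file_location(Path(path).stem, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return getattr(module, class_name)


def run_visualizer(path, output_text):
    """Hand the program's output to a visualizer; return which route was taken."""
    if is_python_module(path):
        # Step 4: talk to the object through one small, agreed-upon method.
        visualizer = load_visualizer_class(path)()
        visualizer.render(output_text)
        return "module"

    # Step 2: save the output to a file and pass that file to the program.
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
        f.write(output_text)
    try:
        subprocess.run([str(path), f.name], check=True)
    finally:
        Path(f.name).unlink()  # remove the temporary output file afterwards
    return "program"
```

Deciding by file extension is the crudest possible check. A sturdier version would also have to handle a `.py` file that fails to import or lacks the expected class, which is where the error checking and recovery come in.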

While none of these tasks is impossible or requires advanced computer science knowledge, they do require a considerable amount of care and planning. Firstly, the mechanism to detect whether the visualizer is a Python module should be robust and accept user input, which means that there has to be error checking and recovery. There also needs to be a stable interface between the main program and the modules that are loaded. There should be clear communication between the parts, but the modules should not be able to interfere with the main program.

Allowing third party (or even second party) code to run inside your framework is not something to be considered lightly. Malicious, or even sloppily written, code can have very dangerous effects on your own code. In my case, I’ll be directly translating my program’s output to API calls on the Python visualizer objects. Calling non-existent methods would throw exceptions and at the very least I need to make sure these exceptions are caught. I also need to make decisions regarding just how much information the visualizers should get. Luckily for me, I won’t be allowing the modules to send back any information, so that makes my job easier.
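
As a rough illustration of catching those exceptions, here is a minimal sketch of a guarded call. The helper name `safe_call` is hypothetical, and logging a warning and carrying on is just one possible policy (an assumption, not a settled design). It only protects against missing methods and exceptions raised by the visualizer; it does nothing about genuinely malicious code.

```python
import logging

log = logging.getLogger(__name__)


def safe_call(visualizer, method_name, *args):
    """Call visualizer.<method_name>(*args) without letting failures escape.

    Returns True if the call succeeded, False if the method is missing
    or raised an exception.
    """
    method = getattr(visualizer, method_name, None)
    if not callable(method):
        log.warning("visualizer has no method %r", method_name)
        return False
    try:
        method(*args)
    except Exception:
        # Third-party code failed; record it and keep the main program alive.
        log.exception("visualizer method %r raised an error", method_name)
        return False
    return True
```

Since nothing flows back from the visualizer, the helper deliberately discards the method’s return value and reports only success or failure.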

Writing an extensible program like the one I’m working on now is an interesting experience. I’ve been interested in software engineering for quite a while now and though I’ve written large programs before, this is the first time that I’ve made one specifically geared towards extensibility. Extensibility brings to the forefront a number of issues that other types of development can sweep under the carpet. Modularity, security, a clean API, interface design: all of these are necessary for making a properly extensible system. Furthermore, having an extensible system means that you are never quite sure what is going to happen or how your software will be used. This being the first time that I’m making such a system, I’m going to be very careful. I’m putting more time into the design phase because I don’t want to do a rewrite partway through the project. Let’s hope that this experience proves to be a good one.