Python as a glue language

I’ve spent the better part of the past few weeks redoing much of the architecture for my main research project. As part of a programming languages research group I also have frequent discussions on the relative merits of various languages. Personally I’ve always liked Python and used it extensively for both professional and personal projects. I’ve used Python for both standalone programs and for tying other programs together. My roommate likes Bash for his scripting needs but I think Python is a better glue language.

My scripting days started with Perl about 6 years ago but I quickly gave up Perl in favor of Python. I don’t entirely remember why, but I do remember getting the distinct impression that Python was much cleaner (and all-round nicer) than Perl. Python has a lot of things going for it as a scripting language – a large “batteries included” standard library, lots of handy string functions for data mangling and great interfaces to the underlying OS.
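As a small illustration of the kind of data mangling I mean, here's a throwaway script in the glue-language style (the log format is made up for the example):

```python
# Parse a made-up log format and count messages by level -- the sort
# of one-off data mangling Python's string methods make easy.
lines = [
    "2010-11-26 12:01:03 ERROR disk full",
    "2010-11-26 12:01:09 INFO retrying",
    "2010-11-26 12:02:47 ERROR disk full",
]

counts = {}
for line in lines:
    # Fields are whitespace-separated: date, time, level, then the message.
    date, time, level = line.split()[:3]
    counts[level] = counts.get(level, 0) + 1

print(counts)
```

A few string methods and a dictionary and you're done; no regex golf required.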

Python also has decent features for being a general purpose language. It has a good implementation of object-orientation and classes (though the bolts are showing), first class functions, an acceptable module system and a clean, if somewhat straitjacketed syntax. There’s a thriving ecosystem with a centralized repository and a wide variety of libraries. I wish there were optional static types and declarative data types, but I guess you can’t have everything in life.

Where Python shines is at the intersection of quick scripting and full-fledged applications. I’ve found that it’s delightfully easy to go from a bunch of scripts lashed together to a more cohesive application or framework.

As an example I moved my research infrastructure from a bunch of Python and shell scripts to a framework for interacting with virtual machines and virtual networks. We’re using a network virtualizer called Mininet which is built in Python. Mininet is a well-engineered piece of research infrastructure with a clean and Pythonic interface as well as understandable internals. Previously I would start by writing a Python script to instantiate a Mininet virtual network. Then I would run a bunch of scripts by hand to start up virtual machines connected to said network. These scripts would use the standard Linux tools to configure virtual network devices and start Qemu virtual machines. There were three different scripts, each of which took a bunch of different parameters. Getting a full setup going involved a good amount of jumping around terminals and typing commands in the right order. Not terribly tedious, but enough to get annoying after a while. And besides, I wouldn’t be much of a programmer if I wasn’t automating as much as possible.

So I went about unifying all this scripting into a single Python framework. I subclassed some of the Mininet classes so that I could get rid of boilerplate involved in setting up a network. I wrapped the shell scripts in a thin layer of Python so I could run them programmatically. I could have replaced the shell scripts with Python equivalents directly but there was no pressing need to do that. Finally I used Python’s dictionaries to configure the VMs declaratively. While I would have liked algebraic data types and a strong type system, I hand-rolled a verifier without much difficulty. OCaml and Haskell have definitely spoiled me.
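The declarative configuration looked roughly like this. The field names and the verifier below are invented for illustration, not our actual schema, but they capture the shape of the approach: plain dictionaries checked by a small hand-rolled validator.

```python
# Hypothetical declarative VM configuration: plain dictionaries
# describing each machine, checked by a small hand-rolled verifier.
VM_CONFIG = {
    "server1": {"memory_mb": 512, "image": "debian.qcow2", "network": "net0"},
    "client1": {"memory_mb": 256, "image": "debian.qcow2", "network": "net0"},
}

REQUIRED_FIELDS = {"memory_mb": int, "image": str, "network": str}

def verify(config):
    """Check that every VM entry has the required fields with the right types."""
    errors = []
    for name, vm in config.items():
        for field, ftype in REQUIRED_FIELDS.items():
            if field not in vm:
                errors.append("%s: missing field %r" % (name, field))
            elif not isinstance(vm[field], ftype):
                errors.append("%s: field %r should be %s"
                              % (name, field, ftype.__name__))
    return errors

errors = verify(VM_CONFIG)
print(errors)  # an empty list means the configuration checks out
```

It's no substitute for real algebraic data types, but it catches the common mistakes before a VM ever boots.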

How is this beneficial? Besides the pure automation, we now have a programmable, object-oriented interface to our deployment setup. The whole shebang – networks, VMs and test programs – can be set up and run from a single Python script. Since I extended Mininet and followed its conventions, anyone familiar with Mininet can get started using our setup quickly. Instead of having configuration floating around in different Python files and shell scripts, it’s all in one place, making it easy to change and keep consistent. By layering over and isolating the command-line interface to Qemu we can potentially move to other virtualizers (like VirtualBox) without massive changes to setup scripts. There’s less boilerplate, fewer little details and fewer opaque incantations that need to be uttered in just the right order. All in all, it’s a much better engineered system.

Though these were all pretty significant changes it took me less than a week to get everything done. This includes walking a teammate through both the old and new versions and troubleshooting. Using Python made the transition easier because a lot of the script code and boilerplate could be tucked into the new classes and methods with a few modifications. Most of the time was spent in figuring out what the interfaces should look like and how they should be integrated.

In conclusion, Python is a great glue language. It’s easy to get up and running with quick scripts that tie together existing programs and do some data mangling. But when your needs grow beyond scripts you can build a well-structured program or library without rewriting from scratch. In particular you can reuse large parts of the script code and spend time on the design and organization of your new application. On a related note, this is also one of the reasons why Python is a great beginner’s language. It’s easy to start off with small scripts that do routine tasks or work with multimedia and then move on to writing full-fledged programs and learning proper computer science concepts.

As a disclaimer, I haven’t worked with Ruby or Perl enough to make a similar claim. If Rubyists or Perl hackers would like to share similar experiences I’d love to hear.

We’ll always have native code

A few days ago my friend Greg Earle pointed me to an article with the provocative title of “Hail the return of native code and the resurgence of C++”. While the article provides a decent survey of the state of the popular programming language landscape, it misses the point when it comes to why VM-based or interpreted languages have become popular (and will continue to be so).

The reason why languages running on managed runtimes are and should continue to be popular is not quite as simple as relief from manual memory management or “syntactic comforts” as the author puts it. While both are fair reasons in and of themselves (and personally I think syntax is more important than is generally considered) the benefits of managed runtimes are more far reaching.

As Steve Yegge put it, “virtual machines are obvious”. For starters, most virtual machines these days come with just-in-time compilers that generate native machine code instead of running everything through an interpreter. When you have complete control over your language’s environment there are a lot of interesting and powerful things you can do, especially with the JIT. No matter how much you optimize your statically generated machine code, the runtime knows far more about how your program is actually behaving. All that information allows it to make much smarter decisions about what kind of code to generate.

For example, trace-based virtual machines like Mozilla’s SpiderMonkey look for “hotspots” in your code: sections that get run over and over again. The VM can then generate custom code tuned to be fast for those cases and run that instead of the original correct-but-naive code. Since JavaScript is dynamically typed and allows for very loose, dynamic object creation, naive code can be very inefficient. That makes runtime profiling all the more important, and it can lead to massive speed improvements.
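The hotspot idea can be caricatured in a few lines of ordinary Python. This is a toy sketch of the control flow only; real trace-based JITs record executed traces and emit machine code, which this does not:

```python
# Toy "hotspot" detection: count invocations and, past a threshold,
# swap in a specialized fast path. Real JITs compile machine code
# from recorded traces; this only mimics the dispatch logic.
HOT_THRESHOLD = 100

def make_counting(slow, fast):
    calls = {"n": 0}
    def wrapper(*args):
        calls["n"] += 1
        if calls["n"] > HOT_THRESHOLD:
            return fast(*args)   # "compiled", specialized path
        return slow(*args)       # naive, generic path
    return wrapper

def slow_add(a, b):
    # Stand-in for generic interpreter code that checks types every time.
    return a + b

def fast_add(a, b):
    # Stand-in for code specialized to the integer case seen in the trace.
    return a + b

add = make_counting(slow_add, fast_add)
for _ in range(200):
    add(1, 2)  # after 100 calls this function is "hot"
```

The interesting part is that the decision to specialize is driven entirely by what actually happened at runtime, which a static compiler never gets to see.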

If you’re running a purely native-compiled language you miss all the improvements that you could get by reacting to what you see at runtime. If you’re writing an application that is meant to run for long periods of time (say a webserver for a large, high-traffic service) it makes all the more sense to use a VM-based language. There is a lot of information that the VM can use to do optimizations and a lot of time for those optimizations to kick in and have a definite performance benefit. In robust virtual machines with years of development behind them you can get performance that is quite comparable to native code. That is, unless you’re pulling all sorts of dirty tricks to carefully profile and optimize your handwritten native code, in which case all bets are off. But honestly, would you rather spend your days hand-optimizing native code or writing code that actually does interesting things?

Personally, I hardly think a new C++ standard is going to usher in a widespread revival of native code. C++ has problems of its own, not the least of which is that it is simply a very big language that very few people fully understand (and adding features and constructs doesn’t help that). One consequence of being a complex language without a runtime is that writing debugging or analysis tools for it is hard. In Coders at Work both Jamie Zawinski and Brendan Eich denounce C++ to some extent, but Eich adds that his particular grievance is the state of debugging tools, which have been practically stagnant for the last 20-odd years. Being fast, “native” or having a long feature list is not nearly sufficient to make a good programming language.

All that being said, there is definitely a place for native code. But given the expressive power of modern languages and the performance of their runtimes I expect that niche to keep dwindling. Even Google’s new Go programming language, which is ostensibly for systems programming and generates native code, has some distinctly high-level features (automated memory management, for one). But if you’re building embedded systems or writing the lowest levels of operating system kernels you probably still want to reach for a fast, dangerous language like C. Even then I wouldn’t be surprised if we see a move towards low-level unmanaged kernels fused with managed runtimes for most code in some of these places. Microsoft’s Singularity project comes to mind.

We’ll always have native code (unless we go back to making hardware Lisp machines, or Haskell machines) but that doesn’t mean that we’re going to (or should) see a widespread revival in low-level programming. I want my languages to keep getting closer to my brain and if that means getting away from the machine, then so be it. I want powerful machinery seamlessly analyzing and transforming source code and generating fast native code. I want to build towers on top of towers. You could argue that there was a time when our industry needed a language like C++. But I would argue that time has passed. And now I shall get back to my Haskell and JavaScript.

Enforcing coding conventions

It’s Black Friday, which means that in most of the United States people are out shopping, taking advantage of supposedly great deals. I’m not indulging because I try to only buy things when I absolutely need them instead of getting something just because it’s cheap. However, the one thing that I did buy was the Kindle edition of the Joel Spolsky-edited “The Best Software Writing”. It’s a collection of talks and articles about software. I could probably have gotten all the material for free online, but I decided to pay the $9.99 to actually get the Kindle edition. It’s the first Kindle book I’ve actually bought and I spent a good hour today reading it.

The very first article in the collection is “Style is Substance” by Ken Arnold. It is a suggestion to do away with differing coding formats and conventions in programming languages and instead have a single style for each language, written into the language’s grammar. Thus if you write a program that violates the format rules, it’s not just ugly or bad form — it’s actually an incorrect program that will not compile. The suggestion is probably not a very popular one. In particular, programmers tend to be rather defensive about these sorts of personal preferences — coding style, editor, version control tool have all been known to trigger religious wars (and still often do).

Personally, I can understand the lure of having a single, undisputed code format that is enforced by the compiler. No more spending time figuring out where one block ends and another begins. No more spending precious mental cycles figuring out someone else’s conventions. One of the reasons I like Python is that it’s such a clean, yet flexible language. But on the flip side, I’ve generally been a fan of letting programmers use whatever tools they like as long as they get the job done. From that perspective, coding convention is just another tool and people should be allowed to use whatever one makes them work better.

However, the argument that code formats are just another personal preference doesn’t really hold up. Unlike editor color schemes and other such preferences, code is something that is meant to be shared with other people. One of the greatest lessons I’ve learned from SICP is that programs must be written primarily for people to read and only incidentally for computers to execute. Since code will be shared, it makes sense to take measures that ensure it will be easily readable and comprehensible by other people. Having different coding conventions is less like everyone using a different editor and more like everyone using a different XML schema to exchange documents. OK, it’s not quite so extreme, but it can be close.

If you’re someone writing a compiler or a language and you decide to enforce a single format, the next question is: which one? I would say for current popular languages like C, C++ or Java trying to answer such a question is a futile effort. It’s not that you couldn’t change a compiler to implement a particular style. The problem is rather that there are already too many conventions in place and you’d have a civil war if anyone tried to enforce a particular format at the compiler level. If we are to take the idea of a single correct format seriously, it has to be implemented in a new, or at least not-completely-mainstream language. Haskell already has whitespace indentation built into the language. Instead of using curly braces to enclose statements in a function definition, you can use indentation in a manner similar to Python. This is built into the syntax and violating the indentation rules will cause compilation to fail.
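Python makes the same point in a smaller way: layout is part of the grammar, and violating it is a compile error, not a style complaint. A quick demonstration:

```python
# In Python, inconsistent indentation isn't merely ugly -- it's a
# syntax error, caught before the program ever runs.
bad_source = "def f():\nx = 1\n"  # function body not indented at all

try:
    compile(bad_source, "<example>", "exec")
except IndentationError as e:
    print("rejected:", e.msg)
```

The compiler refuses the program outright, which is exactly the enforcement Arnold is asking for, applied to one narrow slice of style.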

Go takes a different approach. Instead of writing format rules into the grammar and compiler, there is a program called gofmt which will reformat any syntactically correct Go program into the standard Go format. It can also be used as a syntax translator, meaning that as the Go language changes, gofmt can be used to automatically upgrade programs written in an older version of Go to a newer one, as long as the changes can be described as syntax transformation rules. gofmt is a powerful program and sets a high standard for language- and developer-oriented tools.
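The parse-and-reprint idea behind gofmt can be sketched with Python’s own standard library (ast.unparse, available from Python 3.9 onward): parse the messy source into a syntax tree, then print the tree back out in one canonical style.

```python
import ast

# A gofmt-style normalizer in miniature: parse the source into an AST,
# then print it back out in a single canonical layout. The original
# spacing, semicolons and line breaks are discarded entirely.
messy = "x=1;y  =x+ 2\nif(y>2):print( y )"

tree = ast.parse(messy)
print(ast.unparse(tree))
```

Since only syntactically correct input parses at all, the formatter can never change what a program means, only how it looks, which is what makes this design so safe.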

So will the languages of the future have strict formatting that minimizes the amount of time we waste mentally translating between formats? Maybe. But I doubt it will be considered a standard part of mainstream languages any time soon. As long as we have curly-brace languages, people will continue to expect that the various conventions of the current curly-brace languages will also be applicable. Stylistically different languages like Haskell (and to some extent Go) will probably lead the way in changing the way we think of conventions and programming style. It’s interesting and a bit intimidating to see how far the bar for new programming languages has been raised in recent years, but that’s a topic for another blog post.

Revamping the ByteBaker series

Not too long ago I started writing post series on The ByteBaker. I started two of them: Powerful Python and Sunday Selections.

Powerful Python was a series of posts about the Python programming language and how its features make it easier for programmers to write code. As it stands now there are four posts in the series.

Python is the language that I’m most familiar with and have written the most code in. Over the last month or so I’ve been writing Python day in, day out and really exercising my Python chops (as well as getting acquainted with features like generators and decorators). Over the next few weeks I’m going to be writing more posts exploring Python and adding them to the Powerful Python series. If you regularly write code in Python or just have a passing interest, this is something you’re going to like.

The second series that I had was Sunday Selections. I try to post two to three times a week, but I didn’t want to leave the weekends completely bare. I also wanted to spend my weekends doing other things (preferably away from the computer). So I started a series where every Sunday I would post links (with brief intros) to interesting things that I had found the week before. I’ll admit that I haven’t been very consistent with the posting schedule, partly because I kept forgetting or losing what I had found and really didn’t want to go hunting around the intertubes for whatever it was that I liked.

Over the past few months I’ve become much better at holding onto things I find online. Using Diigo for bookmarks and Tumblr for “scrapbooking” the web I’ve been managing to keep a good record of all the wonderful stuff I’ve found (and there is a lot of it). So I’m bringing back Sunday Selections as well (starting this Sunday) so stay tuned for a steady flow of Internet-y goodness.

I’m really looking forward to writing series posts again. I feel like my writing can sometimes get either monotonous or spread all over the place without any focus. I’m hoping that the series (especially the Powerful Python series) will provide a good path for me to write articles that are coherent and progress along a definite line. Stay tuned.

Computing is still in the dark ages

Despite all the talk of Web 2.0 and the shiny multicore machines with their gigabytes of RAM and billions of cycles per second, I sometimes can’t help feeling that we are still very much in the dark ages of computing. This time around my dark, gloomy feelings have been brought about by this message to a mailing list, which in turn was sparked off by the announcement of the Go Programming Language. As a computer user and a programmer I feel that the actual use of computers is far below their potential.

As the years go by, it seems like we keep on piling layer on top of layer while the results aren’t proportional to what we have to learn to get things done. Now, I’m not proposing that we all start writing down-to-the-metal code or force everyone to become a programmer, but things are starting to look like a mess. Web programming is an interesting development, but it adds yet another layer on top of the existing kernel, operating system, libraries and GUI toolkits. Add to that the fact that all browsers are still a bit different from each other and you can start to understand why I’ve yet to make a serious foray into web programming.

But even without the web and the many formats and barely interoperating systems out there, there’s enough on the desktop to get you depressed. Start with the fact that there are currently three major operating systems out there, and if you want to write a program that runs on all three of them, you don’t have an easy task. You either embrace three different toolkits and programming methodologies and maintain three very different codebases, or you use something like Java which works on all three, but screams non-native on each one. Even though there are languages like Python that run on all three, it really puts me off that there is still no top-notch multiplatform GUI library. wxWidgets tries pretty hard, but if you look at the screenshots you can pretty easily see that they don’t look quite right. It’s not very surprising that lots of smart developers are flocking to the web, where things in comparison are a lot smoother.

There is also the fact that some programming languages, like all other pieces of major software, suck more than others. I still stand by what I said in my last post, that it’s an exciting time for language enthusiasts, but I also feel that there are some lessons we really need to learn. I’m starting to have concerns that there may not be any true general-purpose language, simply because there are so many different types of problems to be solved. I think we need to start creating broader categories: a set of systems languages similar to C going in the direction of D and Go. A set of hyper-optimizing VM-based languages designed for long-running, parallel server applications (the current JVM is a good example). A set of languages for writing end-user apps that are significantly high-level, but are still compiled to pretty fast native code (maybe not C or even optimized-VM fast, but better than today’s Python or Ruby). I’m thinking Python in its Unladen Swallow incarnation might fill this gap.

As a programmer, the state of the tools that we have to use is really quite depressing. Tools like Emacs and Vi are powerful and all, but let’s face it: we could really have much more powerful IDE technology. We should have full-blown incremental compilation with autocompletion and support for rendering documentation for every major language out there. We should also have seamless version control with granularity down to the undo level. Every change I make should be saved and I should be able to visually browse all these changes, see what they are and restore to an older state (or commit them if I want to). We have the raw computing power needed to do all this, yet we remain stuck doing mostly batch-style edit-compile-debug cycles and mucking around in plain text. Eclipse with its incremental compiler makes things much easier, but there’s so much more we could be using our machines for.
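The undo-granularity version control I’m wishing for could start from something as simple as a content-addressed snapshot store. This is a toy sketch, nothing like a production implementation:

```python
import hashlib

# Toy undo-level version control: every change to a buffer is
# snapshotted under the hash of its contents, and any earlier state
# can be restored -- no explicit "save" or "commit" step required.
class SnapshotStore:
    def __init__(self):
        self.blobs = {}     # content hash -> text
        self.history = []   # ordered list of hashes, one per change

    def record(self, text):
        digest = hashlib.sha1(text.encode("utf-8")).hexdigest()
        self.blobs[digest] = text
        self.history.append(digest)
        return digest

    def restore(self, digest):
        return self.blobs[digest]

store = SnapshotStore()
first = store.record("Hello")          # the editor records each edit...
store.record("Hello, world")
print(store.restore(first))            # ...and any state can come back
```

Identical states deduplicate for free since they hash to the same key, which is the same trick real content-addressed systems use to keep fine-grained history cheap.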

As a user, what irritates me is the amount of manual labor we still have to do on a daily basis. We still have to carefully name and place files so that we can find them later. I have to manually hit the save button (see the version control bit above). Even with the Internet, collaboration is a mess, with most people throwing around emails with increasingly large attachments. Add to that the fact that most email clients are pretty dumb pieces of software. Google Wave is a step in the right direction, if enough people get around to actually using it (and if it can integrate, to some extent at least, with the desktop). Also I think the web and the desktop need to be brought closer together. Ideally I would be able to sit down at any computer with a live Internet connection and have my full custom work environment (or at least the most important parts of it).

I’m fully aware that none of the things I’ve mentioned are trivial. In fact, they’re probably very hard projects that will take expert teams a good few years to complete. One day I would like to seriously work on some of the programmer-related issues, especially the IDE part. I love Emacs, but there are some parts of Eclipse I really like too. For the time being I’m going to have to make do with what I have, but I’ll be sure to keep an eye for interesting things and movements in the right direction.