Python as a glue language

I’ve spent the better part of the past few weeks redoing much of the architecture for my main research project. As part of a programming languages research group I also have frequent discussions on the relative merits of various languages. Personally I’ve always liked Python and used it extensively for both professional and personal projects. I’ve used Python for both standalone programs and for tying other programs together. My roommate likes Bash for his scripting needs but I think Python is a better glue language.

My scripting days started with Perl about 6 years ago but I quickly gave up Perl in favor of Python. I don’t entirely remember why, but I do remember getting the distinct impression that Python was much cleaner (and all-round nicer) than Perl. Python has a lot of things going for it as a scripting language – a large “batteries included” standard library, lots of handy string functions for data mangling and great interfaces to the underlying OS.

Python also has decent features for being a general purpose language. It has a good implementation of object-orientation and classes (though the bolts are showing), first class functions, an acceptable module system and a clean, if somewhat straitjacketed syntax. There’s a thriving ecosystem with a centralized repository and a wide variety of libraries. I wish there were optional static types and declarative data types, but I guess you can’t have everything in life.

Where Python shines is at the intersection of quick scripting and full-fledged applications. I’ve found that it’s delightfully easy to go from a bunch of scripts lashed together to a more cohesive application or framework.

As an example I moved my research infrastructure from a bunch of Python and shell scripts to a framework for interacting with virtual machines and virtual networks. We’re using a network virtualizer called Mininet which is built in Python. Mininet is a well engineered piece of research infrastructure with a clean and Pythonic interface as well as understandable internals. Previously I would start by writing a Python script to instantiate a Mininet virtual network. Then I would run a bunch of scripts by hand to start up virtual machines connected to said network. These scripts would use the standard Linux tools to configure virtual network devices and start Qemu virtual machines. There were three different scripts each of which took a bunch of different parameters. Getting a full setup going involved a good amount of jumping around terminals and typing commands in the right order. Not very tedious, but enough to get annoying after a while. And besides I wouldn’t be much of a programmer if I wasn’t automating as much as possible.

So I went about unifying all this scripting into a single Python framework. I subclassed some of the Mininet classes so that I could get rid of boilerplate involved in setting up a network. I wrapped the shell scripts in a thin layer of Python so I could run them programmatically. I could have replaced the shell scripts with Python equivalents directly but there was no pressing need to do that. Finally I used Python’s dictionaries to configure the VMs declaratively. While I would have liked algebraic data types and a strong type system, I hand-rolled a verifier without much difficulty. OCaml and Haskell have definitely spoiled me.

How is this beneficial? Besides just the pure automation we now have a programmable, object-oriented interface to our deployment setup. The whole shebang – networks, VMs and test programs – can be set up and run from a single Python script. Since I extended Mininet and followed its conventions anyone familiar with Mininet can get started using our setup quickly. Instead of having configurations floating around in different Python files and shell scripts it’s in one place making it easy to change and remain consistent. By layering over and isolating the command-line interface to Qemu we can potentially move to other virtualizers (like VirtualBox) without massive changes to setup scripts. There’s less boilerplate, fewer little details and fewer opaque incantations that need to be uttered in just the right order. All in all, it’s a much better engineered system.

Though these were all pretty significant changes it took me less than a week to get everything done. This includes walking a teammate through both the old and new versions and troubleshooting. Using Python made the transition easier because a lot of the script code and boilerplate could be tucked into the new classes and methods with a few modifications. Most of the time was spent in figuring out what the interfaces should look like and how they should be integrated.

In conclusion, Python is a great glue language. It’s easy to get up and running with quick scripts that tie together existing programs and do some data mangling. But when your needs grow beyond scripts you can build a well-structured program or library without rewriting from scratch. In particular you can reuse large parts of the script code and spend time on the design and organization of your new applcation. On a related note, this is also one of the reasons why Python is a great beginner’s language. It’s easy to start off with small scripts that do routine tasks or work with multimedia and then move on to writing full-fledged programs and learning proper computer science concepts.

As a disclaimer, I haven’t worked with Ruby or Perl enough to make a similar claim. If Rubyists or Perl hackers would like to share similar experiences I’d love to hear.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s