My confession: I’m a data hoarder

There’s this new show on A&E TV called Hoarders. From the show’s website, each episode “is a fascinating look inside the lives of two different people whose inability to part with their belongings is so out of control that they are on the verge of a personal crisis”. It’s an interesting show about people who, quite simply, have too much stuff. I’ve watched a few episodes; it’s somewhat repetitive, and strangely addictive in the way that only these things can be. Though I never gave the show much thought after I finished watching an episode, a few days ago I had a strange epiphany: I might be a data hoarder.

Here’s the gist of it: I’m afraid of losing data. It’s not that I have a ton of important stuff which I use regularly. In fact, much of what I have on my hard drive (besides my music and pictures) is stuff I will probably never actively use again. What I’m actually afraid of is that someday I’m going to want some file (or some specific version of some file) and I won’t be able to find it. Now even if I do have the file, I might not find it due to poor organization and data retrieval systems, but that’s a matter for another blog post. What I’m afraid of is pure, simple data loss: I start working on a project, which I only have one copy of, and something happens to that one copy, whether it be a hard drive crash, or just human error and accidental deletion. And then I have to start all over again, with no real idea of what I did the first time.

Now, thanks to technology I’ve been able to deal with my hoarding instincts, without having dozens of different versions littering my hard drive and doing manual backups every week. At the heart of my system is Git, which lets me keep everything that’s important to me in strict version control. It also lets me easily keep files in sync between different machines, which is a problem I still haven’t completely solved (especially for public machines). By keeping things in sync between three different machines, I have backups in three completely different (as in physically separate) places.

The second thing that keeps my data in control is Amazon’s S3, with JungleDisk. Once a week, this ships all my Git repositories, music, pictures and various software installers to Amazon’s massively distributed storage servers for less than $5 a month. The choice was between this, paying as I go, and buying a terabyte hard disk. Personally I think I made the right choice, since my backups are not only safe and secure in a faraway place, but I also didn’t have to shell out a lump sum in one go.

Now all this was fine, but lately I’ve been having this urge to record everything. And I mean everything. There are all my tweets and dents which go out into the ether of cyberspace, which I might someday want to have on record. There are all the websites I visit and most recently all the music I listen to and the movies I see. In a perfect world, I would have all my tweets saved to a Git repository and all the DVDs I watch and music I listen to would be instantly ripped and placed in cold storage in an Amazon bucket (or a terabyte disk). And this may not be a good thing, since I wouldn’t watch most of the DVDs a second time and I have no idea why I would want to save my tweets (or ever look them up).

In the past week I’ve been sorely tempted to actually buy a terabyte hard drive and start manually ripping all the DVDs that I watch. I even went so far as to install Handbrake on my Mac Mini. I’ve been trying very hard to override the temptation with my logic (and laziness). It’s been hard but I’ve been successful so far. Underneath this is perhaps a more important issue: how much data is enough and how safe is safe enough? Keeping my own created data completely backed up in multiple places I think is perfectly acceptable, but I think that ripping all the DVDs is borderline obsessive. It would be an interesting thing too, and might be worth something in terms of street geek cred, but it’s not something that I can seriously see myself doing (and it’s possibly illegal too).

So there you have it, I’m a data hoarder, or at least I have data hoarding tendencies. No, I don’t need an intervention yet, and I don’t need treatment. In fact, I think I’m at the point where I’m reliably saving and backing up everything that I create (that’s more than 140 characters) but not randomly saving everything that I come into contact with. Maybe in another place and time I will actually be saving all my movies as well, but that will probably be in terms of actually buying DVDs and having a properly organized collection, instead of borrowing them from the library. For the time being, I trust my digital jewels to Git, three computers and Amazon S3.

Why Unladen Swallow is important to Python’s future

Unladen Swallow is a Google-led (but not Google owned) project that began earlier this year with the purpose of providing a compatible Python implementation that is at least 5 times faster than the current CPython implementation. Though it hasn’t really gotten a whole lot of attention (and is probably less well known than projects like Jython or IronRuby), it is still a very important project in its own right.

The CPython implementation currently works by compiling the Python source code to bytecode for a custom virtual machine, which then interprets it. Unladen Swallow’s basic approach is to replace the custom virtual machine and interpreter with a just-in-time compilation strategy using the LLVM compiler infrastructure. The Python source code is compiled to bytecode which is then transformed to the LLVM intermediate representation. Any ‘hotspots’ in the code (functions called more than 1000 times) are then compiled by the LLVM JIT into native machine code and executed. This strategy makes execution faster than with the bare custom VM approach. The 2009-Q2 release emits correct native machine code via LLVM, but is not yet optimized for performance. As the 2nd quarter benchmarks show, significant progress has already been made, and as the JIT is further exploited and adapted more improvements are likely.
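To make the two ideas above a bit more concrete, here is a minimal sketch in plain Python. The first part uses the standard `dis` module to look at the bytecode CPython produces for a tiny function. The second part is a toy call counter that flags a function as "hot" once it has been called more than 1000 times. The names `HOT_THRESHOLD` and `HotCounter` are made up for this illustration, and nothing is actually compiled to machine code; this is not how Unladen Swallow is implemented, just the shape of the idea.

```python
import dis

# Call count taken from the text above; the constant name is hypothetical.
HOT_THRESHOLD = 1000


def add(a, b):
    return a + b


# CPython compiles source to bytecode before interpreting it.
# dis.get_instructions lets us inspect that bytecode.
opnames = [instruction.opname for instruction in dis.get_instructions(add)]


class HotCounter:
    """Toy stand-in for a JIT's hotness tracking; nothing is compiled here."""

    def __init__(self, func, threshold=HOT_THRESHOLD):
        self.func = func
        self.threshold = threshold
        self.calls = 0
        self.hot = False

    def __call__(self, *args, **kwargs):
        self.calls += 1
        if not self.hot and self.calls > self.threshold:
            # A real JIT would hand the function to its native code
            # generator at this point; we only record the fact.
            self.hot = True
        return self.func(*args, **kwargs)
```

Wrapping a function as `counted = HotCounter(add)` and calling it in a loop flips `counted.hot` on the 1001st call, which is the moment a JIT along these lines would switch from interpreting to running native code.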

Even if the goal of Unladen Swallow were limited to just performance, that would be a big plus. Programs running on the Java Virtual Machine exploit its state-of-the-art JIT technology to allow long-running programs like servers and databases to gain considerable performance improvements. The various Python web frameworks like Twisted and Django would benefit from a JIT-based speedup. This in turn would make Python an even more viable alternative in the web sphere. I personally think Python is a vastly superior alternative to Java and would love to see it be taken up for more performance-intensive projects.

However, the goal of Unladen Swallow isn’t just raw speed. If you look at the rest of the Project Plan you’ll see references to making more far-reaching changes to the Python internals. Simply using an LLVM-based backend opens up the possibility of having better inter-language cooperation, with added benefits of a native code generator. That would make it easier for programmers to combine their favorite programming languages to write a program while being able to ship a single, coherent binary.

Perhaps an even more exciting possibility is that of being able to do away with the infamous Global Interpreter Lock. The GIL is probably Python’s most well known necessary evil (with the possible exception of the whitespace thing, depending on who you ask). It enforces thread safety, but at the cost of limiting the amount of concurrency that can be taken advantage of. As the number of cores per chip keeps increasing, being able to use them efficiently will become a necessary part of any popular programming language. The aim for Unladen Swallow is to eventually remove the GIL entirely and implement a better garbage collection system based on work done at IBM.

Unladen Swallow is still a very experimental effort, but it is clearly a very serious project moving at a good pace with some very good people working on it. Eventually parts of it may be folded back into the core CPython implementation. Google is already heavily invested in Python and as a result will certainly be making sure that the project comes to fruition and is sustained into the future. When Unladen Swallow does become production ready (I would say in a year or so at the most) it’s going to place Python as a powerful competitor in the industrial/infrastructure computing scene. As the changes become available to the general public, whether in the form of prebuilt binaries or source merges with CPython, it will make life better for Python programmers everywhere. Having a dynamic, flexible programming language which offers great performance to boot will make it easier for programmers to leave behind languages like C, C++ and even Java for a platform that makes application development smoother and faster. Here’s looking forward to the flight of the Unladen Swallow.

The UNIX Philosophy and the common man

This is the Unix philosophy: Write programs that do one thing and do it well. Write programs to work together. Write programs to handle text streams, because that is a universal interface.

The UNIX philosophy according to Doug McIlroy, inventor of UNIX pipes

The UNIX philosophy is the classic computer technology example of keeping it simple. It’s the central idea behind the UNIX operating system, which is probably the oldest operating system to still be in regular use (in some form or the other). Conceptually, it’s a wonderful idea: instead of writing programs to do a bunch of different things at some mediocre level, write each program to do one thing well. The input and output to these programs should be in the form of simple text (in the form of flat files, user interactions or input/output for other programs). These programs can then be tied together using pipes and simple shell scripts to achieve flexible but complex functionality.
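The same idea can be sketched outside the shell. Below is a rough Python analogue of a pipeline, where each stage does one small thing to a stream of text lines and a helper chains them together. The stage names `grep`, `sort_lines` and `uniq` are only loosely modeled on the familiar utilities, and `pipe` is a made-up helper for this sketch; none of this is a real library API.

```python
from functools import reduce


def grep(needle):
    """Build a stage that keeps only the lines containing `needle`."""
    def stage(lines):
        return (line for line in lines if needle in line)
    return stage


def sort_lines(lines):
    """Stage that emits its input lines in sorted order."""
    return iter(sorted(lines))


def uniq(lines):
    """Stage that drops adjacent duplicate lines."""
    previous = object()  # sentinel that never equals a real line
    for line in lines:
        if line != previous:
            yield line
        previous = line


def pipe(source, *stages):
    """Feed `source` through each stage in turn, like a chain of pipes."""
    return reduce(lambda stream, stage: stage(stream), stages, source)


log = ["error: disk full", "ok", "error: disk full", "error: timeout"]
result = list(pipe(log, grep("error"), sort_lines, uniq))
```

Each stage knows nothing about the others; the only contract between them is "a stream of text lines in, a stream of text lines out", which is what makes them easy to recombine.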

As a regular terminal user, I employ the UNIX philosophy every time I want to quickly search a file or some command’s output without having to see all of it. I use it to make my life easier by automating lots of tiny little tasks that I do over and over. However, at the same time, I use and love GNU Emacs, which in many ways is the antithesis of the UNIX philosophy. Emacs encourages you to make the editor your home and the Elisp embedded language lets you create all sorts of applications right inside the editor. The classic joke goes that Emacs is really an operating system squeezed into a text editor.

Leaving behind the power user for a moment, much of software seems directly opposed to the UNIX philosophy. A word processor lets you edit text, format it, view it, create simple graphics, insert footnotes, endnotes, bibliographies and print to either electronic or paper formats. With the explosion of the web and especially social networks, things have become even more complicated. Facebook is a giant combination of email, IM, bulletin board, forum, image gallery and general purpose application platform. Considering that Facebook is immensely more popular than the UNIX command line, I have to wonder, is there something about the UNIX philosophy that makes it unattractive to the common user?

As is the case with most similar questions, the answer is both yes and no. One of the ideals behind the UNIX philosophy is uniformity: having one standardized way to do or present something is much better than having dozens of different ad hoc methods. Look closely enough and you’ll see that a lot of popular software adheres to this rule. People like Apple and OS X because everything is neatly tied together. Office suites are useful because they make it easy to share data from one application to the other without having to worry about compatibility and such. Facebook approaches this in a different way: the interface, especially the mini-feed, gives you all the information you need without having to look for it. In both cases, it’s a question of reducing the user’s mental overhead in accomplishing a specific task.

Even if the UNIX philosophy and good modern software might share some core characteristics, there is still a lot that sets them apart. In particular, the UNIX philosophy is meant to expose power and responsibility to the user. Need to rename dozens of files according to some pattern? There’s a UNIX utility to look up files in a directory, there’s one to rename a file and there are ones to recognize patterns. It’s up to you as a user with at least a passable understanding of shell scripting to whip up something that will get the job done (in less time than it would take to do manually). The people who feel most at home doing things like this would even go a bit out of their way to learn scripting, if they knew it would save time in the long run. However, this isn’t how most users work. For better or for worse, most users would rather do one of two things: either manually rename the files (something they’re comfortable with) or look for some third party utility which does what they want. While old school tools require some active learning on the users’ part, modern software works hard to lower the bar: it’s dead easy to copy a graph from Excel to Word.
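For the curious, here is roughly what the "whip up something" step looks like. This is a small sketch in Python rather than shell, using only the standard `re` and `pathlib` modules; the function name `rename_matching` and its `dry_run` flag are my own invention for the example. It lists a directory, matches file names against a pattern, and renames the ones that match.

```python
import re
from pathlib import Path


def rename_matching(directory, pattern, replacement, dry_run=True):
    """Rename files in `directory` whose names match the regex `pattern`.

    Returns a list of (old_name, new_name) pairs. With dry_run=True
    nothing on disk is touched, so the plan can be checked first.
    """
    regex = re.compile(pattern)
    planned = []
    # sorted() reads the whole listing up front, so renaming files
    # inside the loop does not disturb the iteration.
    for path in sorted(Path(directory).iterdir()):
        if not path.is_file():
            continue
        new_name, count = regex.subn(replacement, path.name)
        if count == 0 or new_name == path.name:
            continue  # pattern did not match, or nothing would change
        target = path.with_name(new_name)
        if target.exists():
            continue  # never overwrite an existing file
        planned.append((path.name, new_name))
        if not dry_run:
            path.rename(target)
    return planned
```

Calling `rename_matching("photos", r"^IMG_(\d+)\.jpg$", r"holiday-\1.jpg")` (the directory and file names here are placeholders) returns the planned renames without changing anything; passing `dry_run=False` carries them out. It is about fifteen lines of real work, which is exactly the kind of investment most users would rather not make.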

Perhaps part of the problem is that ‘common man’ is an extremely vague term, especially when applied to computer users. There are people who tinker for the sake of tinkering, those who tinker to get the job done, those who simply want to get things over with, those who couldn’t care less, those who would rather be using pen and paper; the list could go on and on. Drawing an arbitrary line between power users and everyone else is a gross oversimplification (the number of holy wars even in the power user camp should make that clear). However, there are some lessons that can be learned, particularly with regard to good software. Good software offers few surprises and lowers the mental overhead needed to get the job done. Something that has a clear, consistent interface is always better than something that is complicated and convoluted. There is also a delicate balance between exposing functionality and confusing the user. Good software provides just enough functionality upfront but makes it easy to get to more advanced features. The command line achieves this by requiring the user to have read the manual to some extent before getting started. Good GUIs strike this balance well, but it’s very easy to mess it up completely.

While this article might have seemed somewhat esoteric, I think it’s important to keep these lessons in mind whenever you’re writing software to be used by a human audience. Crafting good interfaces and choosing the right features for a software product isn’t easy, especially for beginners, but it is something that can be learned mainly by keeping open eyes and an open mind. Happy Hacking.

6 interesting keyboard designs

I make no secret of the fact I’m kinda picky about keyboards, since I spend a large part of my day typing away. I also think that there should be mandatory college level typing classes, but that’s just me. My personal favorite right now is a simple Dell multimedia keyboard, partly because I like the feel of the keys and partly because I got it free.

Though I need to have a keyboard that I can type comfortably on, I’m not really one for the more interesting keyboard designs out there. Day to day I stick to that Dell or my laptop’s keyboard. The ones featured below are keyboard designs which are readily available and which I’m sure some people swear by, but ones that I can’t see myself using in the near future.

The Das Ultimate

This might just be the most sane design of the ones I’m going to look at. It claims to be the best keyboard on the planet, but what grabs your attention right off the bat is the fact that the keys are completely blank. Needless to say, this keyboard isn’t for anyone but the most experienced typist. I’m a fairly competent typist, but I’m not that good yet. At $130 a pop, you’d better be doing a whole lot of typing.

The Das Ultimate Keyboard

From a technical standpoint the Ultimate’s claim to fame is that it doesn’t use a membrane key system like most modern keyboards, but rather has gold-plated mechanical keyswitches, in the tradition of the legendary Model M. I don’t think I’ve ever actually used a mechanical keyboard so I can’t really comment on it, but if you browse the web, you’ll soon find many testaments to its superiority.

The Logitech diNovo

This is actually a keyboard that I considered buying at one point. It’s a feature-packed wireless keyboard with a very sleek design. It sports a trackpad, so you can go without a mouse if you really want to, and it also makes for easy scrolling. Unlike many wireless keyboards, it’s rechargeable so you don’t have to worry about replacing batteries. It’s more expensive than the Das Ultimate at about $180 ($135 at Amazon) and I have no idea about the typing experience. I looked into getting one earlier this year, but the high price and its large size (which makes it unwieldy at a small dorm room desk) meant that it wasn’t really a justifiable purchase.

Logitech DiNovo

The True-Touch roll-up keyboard

If I remember correctly, this keyboard got a few seconds of screen time in Die Hard 4. At under $25 it’s certainly the cheapest keyboard on the list, but I seriously doubt I’d have an enjoyable time typing on it. Though it’s a nice idea, I can’t see myself ever actually needing to use one. If I’m on the go, I’ll either have access to a full computer (with keyboard and mouse) or be carrying a portable computer. I suppose if you had an iPhone-like device with a USB port you might be able to hook this up to it and get a better typing experience on the go, but that seems just a little too nerdy even for me.

True Touch Rollup Keyboard

Optimus Maximus

Most keyboards are QWERTY layouts, but it’s not too hard to change your keyboard layout in software to something like Dvorak if you want to. It’s a bit harder to get your hands on a keyboard that physically has the non-standard layout you want. The Optimus Maximus keyboard solves this problem by having each key be a fully programmable tiny OLED screen. So you can assign basically any character to any key and change your layout to better suit your needs. You can also change to more esoteric layouts like Arabic, Greek or Hiragana. There’s a small panel of extra multimedia keys that can be similarly customized to be custom app launchers. I think it’s a good concept, but at approximately $1600 (from their website) I’d rather get a new MacBook with that money.

The Optimus Maximus

The Maltron Keyboard

All the keyboards I’ve listed above have been kinda quirky, but still look like normal keyboards. The Maltron, however, is an ergonomic keyboard with a special split-bowl design. I’ve personally never suffered from any sort of RSI (I make it a point to take regular breaks) but I know that there are people who swear by the effectiveness of bowl designs. It’ll certainly take some getting used to, but it might just be worth it if it means saving your wrists.

Maltron Ergonomic 3D keyboard

The Virtual Keyboard

The strangest keyboard out there isn’t really a keyboard at all. It’s a laser projector the size of a cigarette lighter that projects the image of a keyboard onto any flat surface in front of it. It certainly packs some serious geek cred, but I’m skeptical of its utility as a general purpose typing device. I really think keyboards need to have some semblance of tactile feedback even if it’s minimized like with the newer Apple keyboards. It’ll be interesting to see if something like this takes off and how it’s put to use.

Laser Virtual Keyboard

All of the above are fully functioning, working keyboards. They may not be what you’re used to, but they’re not toys either. With the possible exception of the Optimus Maximus, none of them are really too highly priced. There are a number of other designs that I didn’t cover, especially the various split and ergonomic designs. If you have real experience using any of these or have a personal favorite design that I haven’t listed, do let me know in the comments.

Emacs and the art of Lightsaber Combat

Did you know that the Wookieepedia entry on Lightsaber Combat is considerably larger than the Wikipedia entry on modern warfare? In fact, I remember there being quite a fuss a few years ago about the fact that the Lightsaber Combat entry (then on Wikipedia proper) dwarfed the modern warfare entry (which at the time was just about a page long). It might seem rather frivolous to have collected so much information on a purely fictional weapon, but I think the reason that the entry is so big is in part due to the fact that the idea of the lightsaber is a very real and powerful one.

Jedi Master Mace Windu

The core idea behind the lightsaber is that though learning to use one requires a large amount of dedication and time commitment, in properly trained hands it can be extremely powerful. Conversely, if you try to use one without the proper tutelage (and Force potential) you’re most likely to maim or kill yourself. The steep learning curve and the skill you have to acquire before being able to use one properly ensure that by the time you do master it, the lightsaber is much more powerful than something like a blaster, which is easier to use at the cost of being less flexible than a lightsaber in skilled hands. After all, Mace Windu totally owned Jango Fett.

Coming back to more mundane topics, this same concept holds for my favorite programming tool: the Emacs text editor. To demonstrate what I mean, here is a scientifically accurate graph of the learning curves of some popular text editors:

Text editor learning curves

Though this is a bit of a joke, there’s some amount of truth behind it. For simple editors like Notepad and Nano, there isn’t really all that much to learn and you quickly reach a stable plateau in terms of your editing capabilities. They’re not bad tools, but they’re not powerful either. You can interpret the other curves in a similar way, though since I haven’t used Visual Studio or Vi extensively, I can’t comment on how accurate they are. Emacs, however, is kinda weird. It’s probably not as weird as the picture shows, but it’s not too far off either.

When starting off with Emacs, things can be a bit confusing. The keybindings aren’t what most people are used to and it doesn’t really have the point and click interface that you’d find in a modern IDE. Just as the Lightsaber isn’t like a normal sword, Emacs isn’t like a normal text editor. Sure, you’re not going to physically maim yourself with Emacs, but it’s possible to quickly get lost if you start hitting keystrokes that you’re used to from other programs. If you’re a little careful and just skim the manuals, it’s not really all that hard to get on your feet. I would venture that even with a lightsaber you can avoid cutting your limbs off with some careful attention.

It’s after that point that things get interesting. This is when you hit the funny spiral part of the graph. It’s possible to use Emacs just like a plain simple text editor if you find the curve is too steep, but if you do that you’ll probably lose out on the best reasons to use Emacs. You could use a Lightsaber just like you’d use a normal knife or sword, but then you’ll miss out on all the cool acrobatic maneuvers and Force tricks that Jedi knights can pull off. I’ll venture to say that what the Force is to Lightsaber combat, Emacs Lisp is to the Emacs editor.

Emacs Lisp is Emacs’ built-in programming language. It may not be the easiest to use or the most powerful programming language out there, but it is what gives Emacs most of its power. It allows you to extend and customize Emacs almost as much as you’d like. It’s allowed the creation of state-of-the-art XML editing modes, code browsers, and even file browsers and email programs right inside Emacs. If you want to master Emacs and unlock its potential, you must gain knowledge of Emacs Lisp. Again, just like Force training and advanced lightsaber techniques, it’s a skill that takes sustained training and that not many developers actually ever master. There are lots of programmers, but not all of them make the effort or have the inclination to learn tools as powerful as Emacs. Just as a trained Jedi master can use a lightsaber to quickly clear his path of obstacles, whether they be droids, humans, aliens or giant metal bulkheads, an experienced Emacs hacker can write Lisp code to create customized functions (or lash together existing ones) to help solve the trickiest of problems.

Emacs has an uncanny ability to bend itself to the will of its user. By virtue of being a programmable platform, Emacs actively encourages users to write their own little utilities and make modifications to make Emacs their very own editing powerhouse. At the same time there is a very active open source community devoted to building tools of various sizes on top of Emacs. Just as young novice Jedi can learn directly from the great masters, the aspiring Emacs user can easily tap into the collective consciousness of the Emacs community via such excellent resources as the Emacs Wiki. I’m probably not far off the mark when I say that most users’ journeys to Emacs nirvana start by picking out bits and pieces of other people’s configurations and extensions. This may not be quite the same as the one-on-one apprenticeship a Jedi enjoys, but there’s certainly no lack of opportunities to learn.

Again, ultimate editing power comes at the cost of significant investment of time and mental resources. Learning Emacs may not require monastic discipline and rigorous training from an early age, but it does require paying more attention to your tools and editing habits than most programmers would care to invest. Emacs comes from a time when writing your own software was par for the course for any computer user. There is an implicit assumption that the user will have an exploratory frame of mind and not be turned off at the thought of peeking under the hood and getting their hands dirty. Like the lightsaber, it requires a fearlessness and bravery that is tempered by experience and the will to continually observe and improve on one’s habits and abilities.

Anakin Skywalker as a trained Jedi

I could go on for quite a bit making increasingly poetic comparisons, but I think now is a good point to stop. Though all this talk of discipline and Emacs Lisp and maiming might sound intimidating, behind it all is the simple idea that no matter what tools you use, to get the most out of them it’s worth putting in some time and effort to learn to use them well. Whether your weapon of choice be a lightsaber sheathed in Force energy or a text editor wrapped in layers of custom code, the efficiency and power boost you get from some study and practice will pay back the investment many times over.