Amazon’s Digital Wonderland

A few weeks ago I found myself in Seattle, WA. Contrary to popular belief, the few days I spent there were rather bright and sunny (if somewhat chilly). Here’s an obligatory picture of the Space Needle.

Space Needle

Anyway, on my first day there I fought a mostly losing battle against travel-induced tiredness (I had been up since 4:30 in the morning) and wandered around downtown for a while, somewhat zombie-like. I spent most of the next day in one of Amazon’s new buildings attending their first-ever PhD Symposium. I got to meet Amazon employees like Swami Sivasubramanian, one of the creators of Amazon’s Dynamo database, as well as fellow graduate students like Rahul Potharaju. The day was full of interesting presentations, and the breaks in between were packed with lots of cool conversations. I presented my current project, Merlin (excuse the visuals), and got some good feedback. All in all it was a great day. I had a wonderful time, and I hope Amazon keeps hosting more of these research symposia.

But that’s not what this post is about. Personally, I think of Amazon as a retailer first and a technology company second. In fact, I’ve even written a post about their exemplary customer service. Even though I’ve known about EC2 for years and have used both S3 and Glacier for personal backup, the idea of Amazon as a technology company has always stayed at the back of my mind. It was only while attending the symposium that I really thought about the full weight of Amazon as a technology services company.

After coming home I looked up the keynote from Amazon’s recent re:Invent conference. The keynote shows off some of their more interesting recent technology (including new EC2 instances) as well as what customers such as Netflix and Vimeo have built on top of it. I also stumbled across Dave Winer’s post on Amazon’s support for static JavaScript applications and why that’s so interesting and important.

The more I think about it, the more I like Amazon. They make incredible technology, employ lots of really smart people and have a refreshingly honest and direct business model in an industry dominated by advertising and harvesting user data. From computation, to storage, to scalable DNS, Amazon offers a suite of services that’s just about stunning in its breadth. Though I’ve had little use for their services personally (apart from Glacier for backup), I can see myself using their systems and technology extensively if I were building any type of scalable, distributed service.

Even as I write this, I’m trying to come up with excuses for trying out more of their technology. What would I build? I honestly don’t know. But looking at the range of Amazon technologies and thinking about the possibilities reminds me of the feelings I got when I first started programming and learning about computers.

In many ways, the world has changed since I started writing code about 12 years ago. I had a lot of fun writing LOGO and BASIC programs and then hacking together little Perl scripts. Today I find myself wondering what the loosely coupled services and technologies offered by Amazon and other cloud computing providers make possible. I wonder if today’s new programmers, still learning on primarily single-threaded, single-box computing platforms, should be encouraged to move on to the brave new world of instantly accessible, practically unlimited computing power. I wonder what we could achieve if we were to take distributed, connected computation as the starting point, rather than the state of the art.

As a closing note, let’s think about Microsoft. It’s become standard to talk about Google as today’s Microsoft, but I’m starting to wonder if that title doesn’t rightfully belong to Amazon. I’m not talking about monopolistic activities or questionable business practices, but rather the similar roles the two companies have played in making computing more widespread. Microsoft’s goal (ostensibly) was to put a computer in every household. Amazon, for its part, has commoditized high-powered computing and distributed systems and made them available to people with modest budgets. I suppose the more things change, the more they stay the same.

Two weeks backing up to Amazon S3

It’s been two weeks since I started using Amazon’s Simple Storage Service in combination with Jungle Disk as a synced backup solution for all my files. So far it’s been working great. Both S3 and Jungle Disk are great tools, and together they make a killer combination.

Amazon S3 lets users create ‘buckets’, which are storage containers for their files. The best metaphor is to think of each bucket as an individual drive with its own independent directory structure. In fact, Jungle Disk lets you mount buckets as network drives. I set up three buckets: one for my personal Subversion repository (which contains my day-to-day files), one for my photos and music, and one as a dump for large files like backups of software installers.

On my daily work Mac Mini, I have all three buckets mounted as network drives. The media bucket is set to automatically scan and back up all my music and photos every Monday morning, because that is usually when I’m in class and my computer isn’t doing anything. My main Subversion repository lives on an older G4 that is currently in my college’s student server room. Only the repository bucket is mounted on that machine, and it backs up automatically every Sunday at midnight. The third bucket is mounted only on my Mini. It is set up purely as a network drive with no scheduled backup, since I only occasionally have large files that need to be dumped. Jungle Disk handles the automatic backups and makes sure that for large files, only the changed parts are actually uploaded (which matters because S3 charges for data transfer as well as storage).
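I don’t know how Jungle Disk detects changed parts internally, so here is only a minimal sketch of the general idea, assuming fixed-size blocks: split the file into blocks, hash each one, and re-upload only the blocks whose hash differs from the previous version. The block size and the function names `block_hashes` and `changed_blocks` are hypothetical.

```python
import hashlib

# Hypothetical block size, kept tiny so the example is easy to follow.
# A real backup tool would use blocks of many kilobytes or more.
BLOCK_SIZE = 4


def block_hashes(data: bytes, block_size: int = BLOCK_SIZE) -> list[str]:
    """Split data into fixed-size blocks and return the SHA-256 digest of each block."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def changed_blocks(old: bytes, new: bytes, block_size: int = BLOCK_SIZE) -> list[int]:
    """Return indices of blocks in `new` that differ from `old` (or are new),
    i.e. the only blocks that would need to be uploaded again."""
    old_hashes = block_hashes(old, block_size)
    new_hashes = block_hashes(new, block_size)
    return [
        i for i, h in enumerate(new_hashes)
        if i >= len(old_hashes) or old_hashes[i] != h
    ]


# Block 1 was modified and block 3 was appended; blocks 0 and 2 are unchanged.
print(changed_blocks(b"AAAABBBBCCCC", b"AAAAXXXXCCCCDD"))  # [1, 3]
```

The weakness of fixed-size blocks is that inserting a single byte near the start of a file shifts every later block and makes them all look changed. Tools like rsync avoid this with rolling checksums, at the cost of a more involved algorithm.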

On the S3 side, things are mostly smooth. I haven’t been billed yet, but I just checked my usage activity. My data upload is recorded at just over 23GB, which is roughly the amount I’ve backed up. What I’m slightly confused about is that the total storage usage only amounts to about 6.8GB, and I’m being billed accordingly. I don’t have an explanation for why this is the case (perhaps the storage amount used for billing is the amount of storage I used on certain days in the month), and a quick Google search didn’t turn up anything similar. I’ll keep an eye on this and will notify Amazon at the end of the month if I think that something unusual is going on.

Update: As JungleDave pointed out in the first comment below, Amazon charges based on both how much you’ve stored and for how long (which makes perfect sense). So by the end of the month, I should be getting a full bill. Thanks again to JungleDave.
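A simplified model shows how a large upload can still produce a small storage figure. It assumes storage is sampled once a day and averaged over a 30-day month. That daily sampling is a hypothetical simplification, since S3’s actual metering is finer-grained. The usage numbers below are made up for illustration and are not taken from my bill.

```python
def average_stored_gb(daily_gb: list[float], days_in_month: int = 30) -> float:
    """Approximate the month's billable storage (in GB-months) by summing each
    day's stored GB and dividing by the number of days in the month.
    Assumption: one sample per day; real S3 metering is finer-grained."""
    if len(daily_gb) > days_in_month:
        raise ValueError("more daily samples than days in the month")
    return sum(daily_gb) / days_in_month


# Hypothetical month: nothing stored for 21 days, then 23 GB for the last 9 days.
usage = [0.0] * 21 + [23.0] * 9
print(round(average_stored_gb(usage), 2))  # 6.9
```

Under this model, 23GB that has only been stored for part of the month counts as a fraction of 23 GB-months. Only data kept for the whole month is billed at its full size.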

I’ve had a good experience so far. You know an automated backup system is doing its job if you don’t give it a second thought after you’ve set it up. This was certainly the case for me: I hadn’t thought about my backups until a reader commented asking how it was going. If you’re looking for a simple and cheap remote backup solution, I would certainly recommend the Amazon S3 + Jungle Disk combo. The one-time $20 price of Jungle Disk is certainly worth it.