Aiming for the Cloudtop

In my day-to-day work I end up using a number of physical machines and all three major operating systems. I do most of my work on Linux, but I use Windows machines for all my electrical engineering work (mostly MATLAB and a few design programs). I use my Mac Mini for my music and videos, and if I need to use a computer at the library I prefer using their iMacs. I often find myself needing to transfer files between machines (especially if I need to print something). Even though the school gives students a gigabyte of space on a network drive, I never got it to work on Linux. In the past I’d use a combination of email and USB drives to move stuff around, but a few weeks ago I started using Dropbox and I’m quite happy with it.

I haven’t been able to quite pin down what makes Dropbox successful when other similar applications haven’t done so well. I think a large part of the reason is that Dropbox seamlessly melds the cloud and the desktop. They have desktop apps for Windows, OS X and Linux that all actually work. The way I use it, Dropbox acts as a set of simple folders on my local machines that are immediately synced with the corresponding folders on all the other machines. And whenever I’m at a computer where I can’t install Dropbox, I can just use their web interface (which is well done and nearly frictionless). It also helps that Dropbox gives me 2GB completely for free. I have friends who are pushing that limit already, but since I just put in stuff like homework I need to print off, that should last me a while.

Part of the reason Dropbox feels so easy to use (and is becoming very popular) is that it fits seamlessly into the way you work. Dropbox doesn’t sell itself as a backup or some kind of complex, high-powered auto-syncing solution. It does one thing well: it keeps a folder exactly the same on all your machines. You don’t have to manually upload files to a web service or specify which folders you want to sync and which you don’t. You just put everything in one place (your Dropbox) and rest assured that it will be the same on whatever computer you’re on.

As Anil Dash says, the key to apps like Dropbox and Evernote (which I don’t use myself) is that they inhabit a sort of “in-between” space that exists across both the web and the desktop. They don’t try to deny the presence of the desktop by offering an all-powerful web UI. Instead they embrace the idea that you’ll be using multiple heterogeneous platforms. The web is just another interface. They also offer an API, meaning that developers can extend them for purposes that the original service provider doesn’t support. Another aspect of these apps that I find refreshing (as compared to Delicious, for example) is that while they allow for sharing and a certain social environment, it isn’t central to the service’s operation.

I’m hoping that these sorts of “cloudtop” services get more traction as time progresses. In particular, I’d love to see things like user preferences be synced as well as folders and data. On a parallel note, I’d like to see the export services already present in applications get streamlined as well. As an example, iPhoto allows for export plugins so that you can directly upload your photos to places like Flickr, Picasa or Facebook. Unfortunately the upload process generally blocks the whole app instead of happening seamlessly in the background. I think we’re getting closer to a future where all our data is available seamlessly everywhere. I hope there isn’t too much fragmentation in the area, as it would be a pain to have to use half a dozen different apps to keep my data in sync (especially if they’re all using a different way to do it). This market is still in its infancy, but apps like Dropbox are leading the charge and they promise to make computing much easier for all involved.

Two weeks backing up to Amazon S3

It’s been two weeks since I started using Amazon’s Simple Storage Service in connection with Jungle Disk as a synced backup solution for all my files. So far it’s been working great. Both S3 and Jungle Disk are great tools and they make a killer combination.

Amazon S3 lets users make ‘buckets’, which are storage units for all your files. The best metaphor is to think of them as being individual drives with their own independent directory structures. In fact, JungleDisk lets you mount them as Network Drives. I set up 3 buckets: one for my personal Subversion repository (which contains my day-to-day use files), one for my photos and music, and one as a dump for large files like backups of software installers. On my daily work Mac Mini, I have all three buckets mounted as network drives. The media bucket is set to automatically scan and back up all my music and photos every Monday morning. I set it to Monday morning because that is usually when I’m in class and my computer isn’t doing anything. My main Subversion repository lives on an older G4 that is currently in my college’s student server room. I only have the repository bucket mounted on that machine, which automatically backs up every Sunday at midnight. The third bucket is mounted only on my Mini. It is only set up as a Network Drive, since I only occasionally have large files that need to be dumped. JungleDisk handles the automatic backups and makes sure that for large files, only the changed parts are actually uploaded (because S3 charges for data transfer as well as storage).
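I don’t know how JungleDisk actually detects the changed parts of a large file, so the following is only a rough sketch of one common approach: split the file into fixed-size chunks, hash each chunk, and re-upload only the chunks whose hashes differ. The names `chunk_hashes` and `changed_chunks` and the tiny chunk size are my own illustrative assumptions, not anything taken from JungleDisk or S3.

```python
import hashlib

CHUNK_SIZE = 4  # hypothetical and unrealistically small, just for the demo


def chunk_hashes(data, chunk_size=CHUNK_SIZE):
    """Return a SHA-256 hex digest for each fixed-size chunk of data."""
    return [
        hashlib.sha256(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    ]


def changed_chunks(old, new, chunk_size=CHUNK_SIZE):
    """Return the indices of chunks in `new` that are different or brand new."""
    old_hashes = chunk_hashes(old, chunk_size)
    new_hashes = chunk_hashes(new, chunk_size)
    return [
        i
        for i, digest in enumerate(new_hashes)
        if i >= len(old_hashes) or old_hashes[i] != digest
    ]


old_version = b"aaaabbbbcccc"
new_version = b"aaaaBBBBccccdddd"
print(changed_chunks(old_version, new_version))  # [1, 3]
```

Only chunks 1 and 3 would need to be transferred here. Fixed-size chunking is the simplest variant; it handles in-place edits and appends well, but an insertion near the start of a file shifts every later chunk, which is why real tools often use something more elaborate.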

On the S3 side, things are mostly smooth. I haven’t been billed yet, but I just checked my activity usage. My data upload is recorded at just over 23GB, which is roughly the amount I’ve backed up. What I’m slightly confused about is that the total storage usage only amounts to about 6.8GB and I’m being billed accordingly. I don’t have an explanation for why this is the case (perhaps the storage amount used for billing is the amount of storage I used on certain days in the month) and a quick Google search didn’t reveal anything similar. I’ll keep an eye on this and will notify Amazon at the end of the month if I think that something unusual is going on.

Update: As JungleDave said in the first comment below, Amazon charges both on the basis of how much you’ve stored and for how long (which makes perfect sense). So by the end of the month, I should be getting a full bill. Thanks again to JungleDave.
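To make the “how much and for how long” idea concrete, here is a minimal sketch of prorated storage billing. It assumes whole-day granularity and a 30-day month, which is a simplification of however Amazon actually meters usage, and it uses the $0.15 per GB per month rate I quoted when planning the backup. The function name `billed_gb_months` and the day counts are hypothetical.

```python
STORAGE_PER_GB_MONTH = 0.15  # USD, the storage rate quoted in the planning post


def billed_gb_months(stored_gb, days_stored, days_in_month=30):
    """Scale the stored amount by the fraction of the month it was held."""
    return stored_gb * days_stored / days_in_month


# Hypothetical durations for the same 23 GB backup.
for days in (9, 15, 30):
    usage = billed_gb_months(23, days)
    cost = usage * STORAGE_PER_GB_MONTH
    print(f"{days:2d} days: {usage:.1f} GB-months, about ${cost:.2f}")
```

Under this model, data that has only been sitting in S3 for part of the month shows up as a smaller storage figure than the amount uploaded, and the figure climbs toward the full 23GB as the month goes on.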

I’ve had a good experience so far. You know an automated backup system is doing its job if you don’t give it a second thought after you’ve set it up. This was certainly the case with me. I hadn’t thought about my backups until I got a reader’s comment asking about how it was going. If you’re looking for a simple and cheap remote backup solution, I would certainly recommend an Amazon S3 + JungleDisk combo. The initial $20 price of JungleDisk is certainly worth it.

Amazon S3 as a personal backup service

I’ve been interested in online backups of my data for a long time. At first I started by simply keeping my data on a free FTP server. With the rise of easy-to-use Web 2.0 online storage, I tried a number of solutions including Box.net. I also made a backup to my Gmail account once using Gspace. When I tried web office suites from Zoho and Google, I also uploaded most of my documents. But time goes by, and I never really managed to stick to a single solution. I’ve also made DVD backups of my data occasionally but never with any regularity. A few months ago, I decided to drop using third party solutions and instead set up a personal Subversion server on an older Mac that I had. This system has been pretty effective at keeping my day-to-day files backed up and synced across multiple machines.

While the home Subversion system does a good job of keeping backups of my most important documents, I don’t put everything in my SVN repository. Mainly, my music and my photos are mostly on a single machine (my Mac Mini). I’ve recently started looking at an efficient way to create large scale backups of all my files at less regular intervals than my SVN commits. I considered using an external hard drive, but even if I backed up all my files, I would really be using only a small amount of the space available. Plus, being a college student, I didn’t want another piece of hardware to lug around. Also, I wanted a backup that I could access no matter where I was — something online.

I would look at the Web 2.0 solutions, but none of them are cost-effective for the large amounts of data I plan on backing up. My music weighs in at around 16GB and will keep growing. My photos are about 2GB and will also keep growing. I would also like to back up copies of software discs I have paid for. There aren’t many of them, but still a few gigabytes’ worth. Add to that all my regular documents and other files. All told, I’m looking at something in the range of about 25GB for a first-time backup, growing over time. Considering that I’m a starving college student, the cheaper the better.

Enter Amazon Simple Storage Service

Amazon S3 is an industry standard storage solution that can easily handle many terabytes of storage and bandwidth. What’s also important is that it is very cheap. For the amount of storage I’ll be using, I’ll be spending 15 cents per GB per month and 10 cents to upload each gigabyte. Here’s a rough calculation for what my costs will be like:

Initial Backup: ($0.15 per GB * 25 GB) + ($0.10 per GB * 25 GB) = $3.75 + $2.50 = $6.25

Monthly Running Cost: ($0.15 per GB * 25 GB) + ($0.10 per GB * 2 GB) + ($0.17 per GB * 1 GB) = $3.75 + $0.20 + $0.17 = $4.12

For the running cost I estimated an upload of 2GB a month and a download of 1GB. Though this is probably more than what I actually will be using (especially the download), it should be a fairly good estimate since the amount stored will be gradually going up. So for less than the price of a regular lunch I’ll be able to keep all my important files safely backed up in a secure online location.
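The arithmetic above is easy to rerun with different numbers. Here is a small sketch that does so, using only the rates from the two formulas ($0.15 per GB stored per month, $0.10 per GB uploaded, $0.17 per GB downloaded). The function name `monthly_cost` is my own, and current S3 pricing may well differ from these figures.

```python
# Rates in USD, taken from the estimate above.
STORAGE_PER_GB_MONTH = 0.15
UPLOAD_PER_GB = 0.10
DOWNLOAD_PER_GB = 0.17


def monthly_cost(stored_gb, uploaded_gb, downloaded_gb=0.0):
    """Estimate one month's bill in dollars, rounded to the nearest cent."""
    total = (
        stored_gb * STORAGE_PER_GB_MONTH
        + uploaded_gb * UPLOAD_PER_GB
        + downloaded_gb * DOWNLOAD_PER_GB
    )
    return round(total, 2)


initial = monthly_cost(stored_gb=25, uploaded_gb=25)
running = monthly_cost(stored_gb=25, uploaded_gb=2, downloaded_gb=1)
print(f"Initial backup: ${initial:.2f}")        # Initial backup: $6.25
print(f"Monthly running cost: ${running:.2f}")  # Monthly running cost: $4.12
```

Changing `stored_gb` as the music and photo collections grow gives a quick feel for how the bill scales: at these rates every additional 10GB stored adds $1.50 a month.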

The Catch

The catch is that Amazon S3 is not meant to be a storage solution for general users. It’s an enterprise-quality system made to plug directly into a high-performance online service. As a result S3 offers a fully functional API to write programs around it, but there’s no easy-to-use interface for users to manage their uploads. Luckily there are a number of third party tools available that fill the need. Here’s a somewhat outdated list of some available tools. The client that I’ll be using is called JungleDisk. It’s a wonderful cross-platform tool that maps your S3 storage as a storage drive on your computer. This means that you can use it as you would a storage disk attached to your machine, and you can also run scripts that automatically back up data to your S3 storage from other parts of your computer. JungleDisk also provides its own automation facilities to regularly back up your data. No more having to remember to back up once a month.

JungleDisk costs $20.00, but I think that’s an acceptable price, considering that you can install it on as many machines as you like (including Mac, Windows and Linux systems) and you get free lifetime updates, meaning you never pay for the software again. For a dollar a month you can get the JungleDisk Plus service, which lets you access your files via a web interface, allows resuming uploads if they are interrupted and lets you upload only the changed parts of large files (hence saving upload costs). At this point, I don’t think I’ll have a need for Plus, but it’s a good choice if you travel a lot and plan on using S3 as your primary syncing mechanism.

Starting next month…

I’ll be backing up to S3 regularly via Jungle Disk. I plan on making the initial transfer over this weekend (while recovering from Halloween parties). Before that I need to get my files organized and decide on what I will back up and what I won’t. I’ll post a followup once I’ve been using the service for a while to see if it really is worth the cost.