Save everything

The hard drive in my first computer was a mere 20GB. When we had to replace it three years later, we got one with a 40GB capacity. My current laptop (which was the first computer I bought on my own) has a 160GB hard drive. Even my pocket-sized iPod classic sports 80GB. Today you can get terabyte hard drives for a few hundred dollars. So the question is: with all this massive storage easily available, why is the user still prompted to save their work? Why not just save everything?

In a way, even though we have 21st century storage capacity, we still have 20th century storage techniques. Today’s filesystems, while certainly efficient at doing their job, don’t quite do the right job. After all, the bulk of today’s personal data isn’t in the form of documents that can be neatly sorted into hierarchical categories. Instead, most of our data is in the form of pictures, music and video, for which the easiest interface is some sort of multi-property search rather than directory-style storage. As our data grows, metadata will become increasingly important.
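To make "multi-property search" a little more concrete, here is a minimal Python sketch. Everything in it is made up for illustration: the `MediaFile` class, the `search` function, the property names and the sample library are assumptions, not part of any real filesystem or tool.

```python
from dataclasses import dataclass, field


@dataclass
class MediaFile:
    """A file plus free-form metadata (hypothetical structure)."""
    path: str
    properties: dict = field(default_factory=dict)


def search(files, **criteria):
    """Return the files whose metadata matches every given property."""
    return [
        f for f in files
        if all(f.properties.get(key) == value for key, value in criteria.items())
    ]


# Made-up sample data, only here to exercise the search.
library = [
    MediaFile("track01.mp3", {"kind": "music", "artist": "Artist A", "year": 2007}),
    MediaFile("track02.mp3", {"kind": "music", "artist": "Artist B", "year": 2008}),
    MediaFile("beach.jpg", {"kind": "picture", "year": 2008}),
]

# Find by what the file is, not by where it lives.
from_2008 = search(library, year=2008)
music_2008 = search(library, kind="music", year=2008)
```

Notice that the query never mentions a directory. A real system would need indexing to stay fast over terabytes, which this linear scan ignores.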

But even if we have rich metadata, that’s still only going to take up a small amount of space. What I would really like to see is ubiquitous versioning. Any changes that I make to any files (including moving and renaming) should be easily viewable, and I should be able to roll back to previous versions without any difficulty. Software developers have been using robust versioning systems for decades, but I would like to see versioning become an inherent property of the file-storage system itself. Versioning goes hand in hand with backups, and while Apple’s Time Machine is a step in the right direction, it’s still got a way to go.
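As a rough idea of the behavior I have in mind, here is a small Python sketch that keeps every saved version of a file and can roll back to any of them. The `VersionedFile` class and its `.versions` sibling directory are my own invention for illustration; a real versioning filesystem would do this below the application layer and would also track moves and renames, which this sketch does not.

```python
import shutil
from pathlib import Path


class VersionedFile:
    """Keeps a numbered copy of every save in a sibling directory."""

    def __init__(self, path):
        self.path = Path(path)
        # Hypothetical layout: essay.txt -> essay.txt.versions/1, 2, 3, ...
        self.history = self.path.with_name(self.path.name + ".versions")
        self.history.mkdir(parents=True, exist_ok=True)

    def versions(self):
        """Return the saved version numbers, oldest first."""
        return sorted(int(p.name) for p in self.history.iterdir())

    def save(self, text):
        """Write the file and keep a copy as the next version."""
        self.path.write_text(text)
        number = len(self.versions()) + 1
        shutil.copy2(self.path, self.history / str(number))
        return number

    def rollback(self, number):
        """Restore the working file from an earlier version.

        The history itself is left untouched, so nothing is ever lost.
        """
        shutil.copy2(self.history / str(number), self.path)
```

Storing whole copies is wasteful compared to storing differences, but with terabyte drives that is exactly the kind of waste I am arguing we can afford.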

Another twist in the storage tale is that though local storage in the form of hard drives and flash drives is becoming dirt cheap, online bulk storage is cheaper still (and in some cases free). Unfortunately, it often takes quite some work to get reliable online storage working seamlessly with your local machine. As with versioning, the technology is already out there; it just needs to be packaged into a convenient, always-available form.

So where do we start? I think Google Docs has shown a good starting point: instead of making the user explicitly save something, applications should just go ahead and save it anyway. If the user decides to actually keep it, she can then rename it to something meaningful and move it somewhere else. Perhaps there should be some sort of garbage collection where files that were autosaved and then left untouched are deleted after a certain amount of time (after asking the user, of course). Or you could just save everything forever and only run garbage collection if disk space gets dangerously low.
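The garbage collection idea can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the function names, the 30-day default and the use of modification time as the meaning of "untouched" are all choices I made up for the example.

```python
import time
from pathlib import Path


def stale_autosaves(autosave_dir, max_age_days=30, now=None):
    """List files in autosave_dir not modified within max_age_days."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 24 * 60 * 60
    return sorted(
        p for p in Path(autosave_dir).iterdir()
        if p.is_file() and p.stat().st_mtime < cutoff
    )


def collect_garbage(autosave_dir, confirm, max_age_days=30, now=None):
    """Delete stale autosaves, but only those the user agrees to lose.

    confirm is a callback that takes a path and returns True or False;
    a real application would show a dialog here.
    """
    deleted = []
    for path in stale_autosaves(autosave_dir, max_age_days, now):
        if confirm(path):
            path.unlink()
            deleted.append(path)
    return deleted
```

For the "save everything forever" variant, the same function could be called only when free space runs low, for example by checking `shutil.disk_usage` first.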

Once you have a basic save-everything system, you could add versioning on top of that. I was hoping to find a versioning filesystem already in existence, but the closest things to a fully operational one that I could find were Wayback and CopyFS, neither of which is quite what I’m talking about. ZFS shows some promise with its copy-on-write and snapshot features. Hopefully it will only be a few more years before one of the major OS makers (or an open-source initiative) decides to bake version control into the filesystem (or at least tie the two together closely).

Once we have the capability to store such massive amounts of versioned data seamlessly, we need a way to find it all. WinFS would have gone a long way toward solving this problem, if it had ever been finished. I’ve personally come to see the shelving of WinFS as one of the greatest tragedies our industry has faced in recent times. The hierarchical file structure is being pushed to its limit, and WinFS would have offered a good way forward. As personal data gets into the terabyte range, we will absolutely need filesystems that can work with rich metadata. Hopefully WinFS will be pulled out of mothballs, or Apple will come up with a working solution after Snow Leopard.

Right now I’m stuck with my large hard drives hopelessly underutilized. I’ve started trying some home-grown solutions, such as putting all my documents under version control. Over the next semester at college I’ll also be experimenting with S3 and trying to run a personal backup server. Hopefully I’ll be able to put all those gigabytes to work.

Published by

Shrutarshi Basu

Programmer, writer and engineer, currently working out of Cornell University in Ithaca, New York.

One thought on “Save everything”

  1. Hi Shrutarshi
    Good article. I was writing on a similar topic when I came across yours 🙂
    Personally I am facing similar issues. One way I solved personal doc management (version controlled) was by using StarVault (http://www.coriolis.co.in/starvault.php). It is freely downloadable and works well for me and some of my friends.
    I believe version-controlled file systems are on their way. Also, we are working on an interesting product that offers version control of virtual machine images (ref: http://www.coriolis.co.in/VmRevisionControl.php).
    Feel free to get back to me for details on any of the above.
    Thanks
    Barnali
