There’s this new show on A&E TV called Hoarders. From the show’s website, each episode “is a fascinating look inside the lives of two different people whose inability to part with their belongings is so out of control that they are on the verge of a personal crisis”. It’s an interesting show about people who, quite simply, have too much stuff. I’ve watched a few episodes. It’s somewhat repetitive, and strangely addictive in the way that only these things can be. Though I never gave the show much thought after I finished watching an episode, a few days ago I had a strange epiphany: I might be a data hoarder.
Here’s the gist of it: I’m afraid of losing data. It’s not that I have a ton of important stuff which I use regularly. In fact, much of what I have on my hard drive (besides my music and pictures) is stuff I will probably never actively use again. What I’m actually afraid of is that someday I’m going to want some file (or some specific version of some file) and I won’t be able to find it. Now, even if I do have the file, I might not find it due to poor organization and data retrieval systems, but that’s a matter for another blog post. What I’m afraid of is pure, simple data loss: I start working on a project, which I only have one copy of, and something happens to that one copy, whether it be a hard drive crash or just human error and accidental deletion. And then I have to start all over again, with no real idea of what I did the first time.
Now, thanks to technology, I’ve been able to deal with my hoarding instincts without having dozens of different versions littering my hard drive or doing manual backups every week. At the heart of my system is Git, which lets me keep everything that’s important to me in strict version control. It also lets me easily keep files in sync between different machines, which is a problem I still haven’t completely solved (especially for public machines). By keeping things in sync between three different machines, I have backups in three completely different (as in physically separate) places.
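A minimal sketch of the multi-machine idea, assuming each machine is reachable as a Git remote. The remote names (`laptop`, `desktop`, `server`), the repository path, and both helper functions are hypothetical and only illustrate pushing one branch to several places; this is not a description of any particular setup.

```python
import subprocess
from typing import List, Sequence


def build_push_commands(
    repo_path: str, remotes: Sequence[str], branch: str = "master"
) -> List[List[str]]:
    """Return one `git push` command (as an argv list) per remote."""
    # `git -C <path>` runs the command as if started inside <path>.
    return [["git", "-C", repo_path, "push", remote, branch] for remote in remotes]


def sync(repo_path: str, remotes: Sequence[str], branch: str = "master") -> None:
    """Push the branch to every remote, stopping at the first failure."""
    for command in build_push_commands(repo_path, remotes, branch):
        # check=True raises CalledProcessError if git exits non-zero.
        subprocess.run(command, check=True)
```

Calling something like `sync("/path/to/repo", ["laptop", "desktop", "server"])` would then leave a copy of the branch on each machine, provided those remotes are already configured in the repository.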
The second thing that keeps my data in control is Amazon’s S3, with JungleDisk. Once a week, this ships all my Git repositories, music, pictures and various software installers to Amazon’s massively distributed storage servers for less than $5 a month. The choice was between this, paying as I go, and buying a terabyte hard disk. Personally, I think I made the right choice, since my backups are not only safe and secure in a faraway place, but I also didn’t have to shell out a lump sum in one go.
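The pay-as-you-go versus lump-sum trade-off comes down to simple arithmetic. Here is a rough sketch, where the $100 disk price is purely an assumed figure for illustration (the only number from above is the roughly $5 a month), and the function name is made up. It ignores everything else that matters, such as the disk failing or sitting in the same room as the computer.

```python
import math


def break_even_months(disk_price: float, monthly_fee: float) -> int:
    """Number of whole months of fees needed to match the one-time disk price."""
    if monthly_fee <= 0:
        raise ValueError("monthly_fee must be positive")
    return math.ceil(disk_price / monthly_fee)


# Assumed disk price of $100; the $5 monthly fee is an upper bound on the S3 bill.
months = break_even_months(disk_price=100.0, monthly_fee=5.0)
print(months)  # 20
```

Since the actual bill is under $5, the real break-even point under that assumed disk price would be later than 20 months.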
Now all this was fine, but lately I’ve been having this urge to record everything. And I mean everything. There are all my tweets and dents which go out into the ether of cyberspace, which I might someday want to have on record. There are all the websites I visit and, most recently, all the music I listen to and the movies I see. In a perfect world, I would have all my tweets saved to a Git repository, and all the DVDs I watch and music I listen to would be instantly ripped and placed in cold storage in an Amazon bucket (or on a terabyte disk). And this may not be a good thing, because I wouldn’t watch most of the DVDs a second time and I have no idea why I would want to save my tweets (or ever look them up).
In the past week I’ve been sorely tempted to actually buy a terabyte hard drive and start manually ripping all the DVDs that I watch. I even went so far as to install Handbrake on my Mac Mini. I’ve been trying very hard to override the temptation with my logic (and laziness). It’s been hard, but I’ve been successful so far. Underneath this is perhaps a more important issue: how much data is enough, and how safe is safe enough? I think keeping the data I create myself completely backed up in multiple places is perfectly acceptable, but ripping all the DVDs is borderline obsessive. It would be an interesting thing to do, and might be worth something in terms of geek street cred, but it’s not something that I can seriously see myself doing (and it’s possibly illegal too).
So there you have it: I’m a data hoarder, or at least I have data hoarding tendencies. No, I don’t need an intervention yet, and I don’t need treatment. In fact, I think I’m at the point where I’m reliably saving and backing up everything that I create (that’s more than 140 characters) but not randomly saving everything that I come into contact with. Maybe in another place and time I will actually be saving all my movies as well, but that will probably be in terms of actually buying DVDs and having a properly organized collection, instead of borrowing them from the library. For the time being, I trust my digital jewels to Git, three computers and Amazon S3.