The Big Problem: My mother has finished a good few weeks of heavy-duty research at her lab and now wants to write up her results as a publishable paper. Not only does she have to bring together all her raw lab data, there are also lots of references to previous work. Her raw data is mostly in the form of pictures and Word files containing lab results. The references are mostly PDFs and the occasional saved webpage.
The Solution: Like any other good scientist, she decides to break down her work into a number of steps. First she has to collect all her data. Then she has to write everything in a massive Word document, and finally put together references and send it off to an editor. Luckily, my mother does not use a simple hierarchical filesystem, but rather an improved next-gen filesystem which I will be describing throughout this article.
Problem 1: Gathering the data. As I’ve already mentioned, the data is a mix of images, Word files, PDFs and saved web pages. The problem is that all this data is saved across well over a dozen folders, many with subfolders going down two or three levels. Also, most of the data has been collected over the past few weeks, but some of it goes back almost a year. It would take my mother hours to go through her jumble of files to find the things she needs. Some of the files are badly named, meaning that searches based on filename would often miss the mark.
The solution: Metadata. This is data about data. The most common example would be the tags that many web services now use to help users classify their content, without restricting them to classic file/folder hierarchies. My mother’s filesystem uses metadata extensively. Whenever a file is created, the filesystem records certain information about the file. There are mundane things like filename, size, type, etc. These things describe what the file is. But there are other pieces of information (let’s call them tags) that describe what the file is about. For example, a certain PDF file has tags that say that it is about lizards, temperature, humidity and diet. Some of these tags were entered by my mother when she saved the file. But some of them were obtained from the file itself. The filesystem took the liberty of searching the file’s contents and adding repeated words and phrases, as well as words in titles, headers, and image captions, to the file’s list of tags.
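To make the automatic tagging a bit more concrete, here is a rough sketch in Python of how it might work. None of this is a real filesystem API: the function name `extract_tags`, the tiny stop-word list and the "repeated at least twice" rule are all my own assumptions, and it only handles plain text (a real system would first have to pull text out of PDFs, Word files and captions).

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real system would need a much larger one.
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "on", "for", "at", "with"}

def words(text):
    """Lower-case the text and return its meaningful words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]

def extract_tags(title, body, user_tags=(), min_count=2):
    """Combine user-entered tags with tags mined from the title and body."""
    tags = {t.lower() for t in user_tags}
    # Every meaningful word in the title becomes a tag.
    tags.update(words(title))
    # Words repeated at least `min_count` times in the body become tags too.
    counts = Counter(words(body))
    tags.update(w for w, n in counts.items() if n >= min_count)
    return sorted(tags)

tags = extract_tags(
    title="Diet of desert lizards",
    body="Humidity affects lizards. Temperature and humidity were logged; "
         "temperature peaked at noon.",
    user_tags=["lizards"],
)
print(tags)  # ['desert', 'diet', 'humidity', 'lizards', 'temperature']
```

The point is only that the user's own tags and the mined ones end up in the same list, so my mother never has to care which was which.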
But having the filesystem store metadata is only half the solution. The other half is that my mother is actually encouraged to find her files based on metadata. Looking through her files based on metadata isn’t relegated to a small entry in the start menu; she doesn’t need a third-party program to use the feature, and she doesn’t have to open a terminal and enter arcane commands like ‘grep’. Whenever she opens the file manager, she sees the typical directory structure, but a part of the interface shows her common metadata on her hard drive and metadata that she recently looked for, and prompts her to enter words or phrases related to the content that she is looking for. All she has to do is type in “Maximum temperature of lizards” and she gets a list of PDFs referencing the required work, image files containing graphs of temperature tolerances, etc.
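A search like that could be sketched as follows. This is just an illustration under my own assumptions: the index is a plain dictionary from file name to tags, the file names are made up, and results are ranked by how many query words match a file's tags. A real implementation would need stemming, phrase matching and a far better ranking.

```python
import re

# Words ignored in queries (hypothetical, deliberately tiny list).
STOP_WORDS = {"of", "the", "a", "an", "and", "in"}

# Hypothetical metadata index: file name -> set of tags.
INDEX = {
    "scan_0042.pdf": {"lizards", "temperature", "humidity", "diet"},
    "graph3.png": {"lizards", "temperature", "maximum", "tolerance"},
    "budget.doc": {"grant", "budget"},
}

def search(query, index):
    """Return file names whose tags match the query, best match first."""
    terms = {w for w in re.findall(r"[a-z]+", query.lower()) if w not in STOP_WORDS}
    scored = [(len(terms & tags), name) for name, tags in index.items()]
    # Highest score first; ties are broken alphabetically by file name.
    ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [name for score, name in ranked if score > 0]

results = search("Maximum temperature of lizards", INDEX)
print(results)  # ['graph3.png', 'scan_0042.pdf']
```

Notice that the badly named `scan_0042.pdf` still turns up, which is exactly what a filename search would have missed.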
Problem 2: My mother has been using the metadata capabilities of her filesystem to find the information that she needs. The problem is, she’s afraid that she’ll forget where all these files are and will have to search for them all over again the next time she needs them. She could copy them to a temporary folder, but if she makes any changes to them, she’ll have to remember to sync them with the originals. If she moves them, she’ll have to remember to put them back when she’s done.
Solution: This solution has been around for a while, known as shortcuts in Windows and symlinks in Linux. But this new filesystem gives it a few improvements. My mother creates a special type of “temporary folder”. Anything that she places in this folder is actually just a link to the original somewhere else. Any changes that she makes here are mirrored to the original and vice versa, although she can turn this off in case she doesn’t want changes to be mirrored. The links are also smart. If the original file to which the link refers is renamed or moved, the link is automatically updated. Besides linking to ordinary files, she can also save lists of files which match certain metadata, similar to saved search folders.
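One way such smart links could work is for a link to remember a stable file ID instead of a path, loosely the way a hard link points at an inode rather than a name. The toy model below is entirely in memory and every name in it (`FileStore`, `LinkFolder`, the `mirror` switch, the paths) is hypothetical; it is a sketch of the idea, not of any existing filesystem.

```python
import itertools

class FileStore:
    """Toy in-memory store in which every file has a stable ID."""
    def __init__(self):
        self._ids = itertools.count(1)
        self.paths = {}     # file ID -> current path
        self.contents = {}  # file ID -> current contents

    def create(self, path, content):
        file_id = next(self._ids)
        self.paths[file_id] = path
        self.contents[file_id] = content
        return file_id

    def move(self, file_id, new_path):
        # Only the path changes; the ID that links refer to stays the same.
        self.paths[file_id] = new_path

class LinkFolder:
    """A 'temporary folder' that holds links (file IDs), not copies."""
    def __init__(self, store, mirror=True):
        self.store = store
        self.mirror = mirror
        self.links = set()
        self.local = {}  # private copies, used only when mirroring is off

    def add(self, file_id):
        self.links.add(file_id)

    def listing(self):
        # Paths are looked up on demand, so renames and moves show up for free.
        return sorted(self.store.paths[i] for i in self.links)

    def read(self, file_id):
        return self.local.get(file_id, self.store.contents[file_id])

    def write(self, file_id, content):
        if self.mirror:
            self.store.contents[file_id] = content  # change reaches the original
        else:
            self.local[file_id] = content           # original is left untouched

store = FileStore()
fid = store.create("/data/2009/run7.doc", "raw results")
folder = LinkFolder(store)
folder.add(fid)
store.move(fid, "/archive/run7.doc")    # original is moved behind our back
folder.write(fid, "cleaned results")    # edit made through the link
print(folder.listing())                 # ['/archive/run7.doc']
print(store.contents[fid])              # cleaned results
```

Creating the folder with `mirror=False` instead keeps edits in the folder's private copy, which corresponds to my mother turning mirroring off.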
Problem 3: Now that my mother has found all her data and got it in a temporary folder, it’s time to start writing her paper. This is going to be a big paper, over a dozen pages, and she’s going to be working in many sittings and doing numerous edits. At times she’ll probably want to roll back changes, or just look at what she had written earlier. But she doesn’t want to have a dozen different files each with a different draft of her paper.
Solution: Version Control. What she needs is a robust versioning system. Seasoned programmers will testify to how useful these systems are for maintaining code. But my mother does not have the time or the patience to set up and learn to use a version control system. Luckily for her, version control is built right into the filesystem and file manager. First off, her filesystem uses journalling so that data losses due to sudden shutdowns or reboots are kept to a minimum. She doesn’t need to keep multiple drafts of her paper stored separately; she has just one file. When she opens it, she gets the latest version.
But the filesystem keeps track of what has changed every few minutes, or every time she makes a change. By right-clicking on the file in the file manager, she gets a menu listing every time a change was recorded, and she can easily open up a previous version of the document. More advanced options allow her to choose how far back the versioning goes, how often changes are recorded and whether copying or moving the file causes the whole version history or just the latest version to be moved. Finally, if she has some sort of online or external backup, then everything is synced automatically at periodic intervals.
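Stripped of the file manager, the versioning behaviour boils down to something like the sketch below. It is deliberately naive and all of it is assumed for illustration: the class name `VersionedFile`, whole-file snapshots instead of diffs, and `max_versions` standing in for the "how far back the versioning goes" option.

```python
class VersionedFile:
    """One file, many snapshots; opening it returns the latest by default."""
    def __init__(self, content="", max_versions=50):
        if max_versions < 1:
            raise ValueError("max_versions must be at least 1")
        self.max_versions = max_versions
        self.history = [content]  # oldest first, latest last

    def save(self, content):
        # Record a snapshot only if something actually changed.
        if content != self.history[-1]:
            self.history.append(content)
            # Drop the oldest snapshots beyond the configured limit.
            del self.history[:-self.max_versions]

    def open(self, version=-1):
        """Return a snapshot; -1 is the latest, -2 the one before it, etc."""
        return self.history[version]

    def roll_back(self, steps=1):
        # A rollback is stored as a new snapshot, so nothing is ever lost.
        self.save(self.history[-1 - steps])

paper = VersionedFile("draft 1", max_versions=3)
for text in ("draft 2", "draft 3", "draft 4"):
    paper.save(text)
paper.roll_back()
print(paper.open())    # draft 3
print(paper.history)   # ['draft 3', 'draft 4', 'draft 3']
```

Recording a rollback as just another version is a design choice on my part; it means my mother can always undo an undo.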
Problem 4: The paper in its final form is written and then emailed off to a journal editor for publication. He likes the paper, but there is a small problem: my mother had quoted from an earlier article but had forgotten to add it to the list of references. The editor asks her to make the correction and send him the corrected version. She’s glad to do so, except she can’t quite remember which article she quoted from and she doesn’t fancy the prospect of having to look through all the possible articles.
Solution: Data Relationships. Unlike most modern filesystems, which only remember things about individual files, this filesystem also remembers relations between different files. This is a slightly more advanced feature and not one my mother regularly uses. But being the avid geek that I am, I help her out. After opening a few menus I get to the required dialog box, which tells me that she saved the file 27 times over the space of a week, that she rolled back changes thrice and that she copied the file 5 times. It also tells me that she copied information into the file 8 times and copied something out of it 6 times. Another few clicks give me the files from which data was copied into the file, as well as the type of data that was copied. By looking only at the times text was copied in from a PDF, I help my mother narrow down the possibilities to just 3 sources, and it takes her about a minute to find the required reference.
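Under the hood, I imagine this as an event log that the filesystem appends to whenever something happens to a file, plus a few simple queries over it. The sketch below assumes exactly that; the `Event` fields, the action names and the file names in the sample log are all invented for the example and are not the actual history of my mother's paper.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    action: str       # e.g. "save", "rollback", "copy_in", "copy_out"
    target: str       # the file the event happened to
    source: str = ""  # the other file involved, if any
    kind: str = ""    # type of data moved, e.g. "text" or "image"

def copy_sources(log, target, kind, suffix):
    """Files with the given suffix from which `kind` data was copied into `target`."""
    return sorted({
        e.source
        for e in log
        if e.action == "copy_in"
        and e.target == target
        and e.kind == kind
        and e.source.endswith(suffix)
    })

# Hypothetical sample log.
log = [
    Event("save", "paper.doc"),
    Event("copy_in", "paper.doc", "smith_thermal.pdf", "text"),
    Event("copy_in", "paper.doc", "graph3.png", "image"),
    Event("copy_in", "paper.doc", "notes.doc", "text"),
    Event("copy_in", "paper.doc", "smith_thermal.pdf", "text"),
    Event("copy_in", "paper.doc", "jones_diet.pdf", "text"),
    Event("copy_out", "paper.doc", "abstract.doc", "text"),
]

candidates = copy_sources(log, "paper.doc", kind="text", suffix=".pdf")
print(candidates)  # ['jones_diet.pdf', 'smith_thermal.pdf']
```

Filtering on "text copied in from a PDF" is the same narrowing-down step I did through the dialog box, just written out as a query.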
Conclusion: A collection of new, “smart” features means that my mother no longer has to worry about the details of how and where her data is stored or how to properly manage it. The filesystem helps her concentrate on what she wants to do with her data, not how to go about doing it. For the average user, it means that working with the computer is closer to how we are used to dealing with non-computerized data, and that the user needn’t be bothered with obtaining and learning to use various third-party tools. For power users, there is simply a lot more power in their hands. Of course, not everything described above can be handled by the filesystem alone, but it is at the core of it. Next time we look at the current state of affairs: how far away we are from my mother’s dream filesystem and what other tools besides an augmented filesystem we will need.