My summer research job involves a fair amount of reading. It also involves searching for papers on databases such as the ACM’s Digital Library. The only problem is that once you get to more than a few papers, it becomes really hard to figure out which paper is which. The papers often have long titles, so you really don’t want the filenames to be the paper titles. There’s also a lot of metadata associated with each paper (title, authors, where and when it was published and so on). It quickly became apparent to me that I really needed a better way to organize all the papers I was downloading and reading. The problem became even worse when I decided to start making detailed outlines and notes for the more important papers, because now I had to have some way to connect each paper to its notes.
I looked at a number of really good tools out there before making a choice. The very first program I remember looking at was Papers for the Mac. Papers is a really slick application. Not only can you collect and organize your papers, you can also read them right in Papers, take notes and send a copy to fellow researchers. It also lets you search popular databases and download papers without needing to step out into a browser. It’s priced at a very reasonable $42 with a 40% discount for undergraduate students. Not a bad deal at all. If I were a full-time Mac user, I would almost certainly be using Papers. But since most of my more scholarly work is done on my Linux laptop, it’s not really an option.

The Papers interface (from the website)
Since I do most of my scholarly work on my Linux laptop, I needed something that worked well on Linux. There is a program called gPapers for Linux which is similar to Papers. I didn’t actually check this out myself. From the screenshots it seems like a good tool, but development on it seems to have stopped for a while and I wasn’t sure if it was complete enough for daily use. My next choice was the Firefox plugin Zotero. Since it lives right in your browser, it makes it very easy to collect papers that you read on the Internet. If you get a Zotero account you can even sync your papers and notes between multiple machines. Once again, Zotero is a great tool. Not only can you add any web page to its library, you can also attach notes to any item in your library. The most awesome feature is that for PDFs it will automatically retrieve bibliographic information from Google Scholar. You can also export bibliographic information in a variety of formats. Zotero makes a lot of things very easy. To be honest, I haven’t entirely ruled it out yet. Perhaps the only real reason that I’m not using it right now is that the interface seems a bit cramped on my not-too-big laptop screen.
After trying all these out I took a look at one online reference management tool: CiteULike. I was very interested in this as it would make sense for whenever I worked from library computers. It’s a really good resource and works very well with popular databases in terms of automatically gathering metadata. The only problem I have is that it doesn’t work if I point it directly to the PDF of the article. There is an option to store the PDF online, but this requires an upload from the local computer and not from a URL. I can understand why this is needed because many of the journals require subscriptions, but it is a bit of a hassle. I’m not using it now because I really don’t need an online service at the moment. I like keeping all my papers organized locally. However, if I find myself moving about a lot once school starts again, I might seriously consider using CiteULike more regularly. There are similar online services such as Connotea, but I haven’t tried them out myself.
So what do I use? There were a number of factors that affected my choice. Firstly, I wanted to have copies of the PDFs on my disk. I needed to know where they were and they had to be named according to some common scheme so that if I had to switch tools I could do so easily. Secondly, I had to be able to easily extract reference information as a BibTeX file. I use LaTeX to write my own papers, so there was no compromising on that. Thirdly, there had to be a good way to edit and view notes for each PDF. Zotero was certainly the tool that came closest to meeting my needs (except perhaps for Papers). In fact, in the process of writing this review I’ve been sorely tempted to actually use it full time. And I would too, if it weren’t for one almost unrelated piece of software: Org-mode for Emacs.
Org-mode is a package for Emacs that turns it into a powerful note-taking and organizational tool. It offers some really good features, including very easy-to-use (and smart) show/hide that lets you concentrate on parts of your notes. It also allows you to attach tags to parts of your org files and then search and re-organize based on them. You can insert links to URLs or other files (which can be opened in Emacs or in other programs). Over the past few days I invested a few hours to make a homegrown solution based around Org-mode and Emacs.
I combine some low-tech organization with some simple scripting to make my system work. Here’s how: each project for which I need to do research has a papers and a notes directory. The name of any file that goes into either directory is of the format <publication><year>-<first author> with the appropriate extension. I also have a BibTeX file that contains bibliographic information. The key for each entry is in the same format as the filename. I then have a quick Python script that matches each PDF with its notes file and bibliographic data and combines them into an easy-to-read org file with appropriate links. Since I have Emacs open all the time, whenever I need to look up a paper all I need to do is open up the org file and I get organized links to both the paper and its notes. I also attach tags to the papers’ names to make locating them easier. Here’s a screenshot of what an example org file looks like after I’ve added some tags:
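For anyone who wants to try something similar, here is a minimal sketch of how a script along these lines could be written. It is an illustration of the idea rather than the exact script: the file name refs.bib, the .org extension for notes, the very rough regular expressions for BibTeX entries and the function names read_titles and build_org are all assumptions made for the example.

```python
import re
from pathlib import Path

# Assumed layout (hypothetical names):
#   <project>/papers/<key>.pdf, <project>/notes/<key>.org, <project>/refs.bib
# where <key> looks like <publication><year>-<first author>, e.g. chi2008-smith.

# Very rough BibTeX matching; a real .bib parser would be more robust.
ENTRY_RE = re.compile(r"@\w+\s*\{\s*([^,\s]+)\s*,")
TITLE_RE = re.compile(
    r"^\s*title\s*=\s*[{\"](.+?)[}\"]\s*,?\s*$",
    re.IGNORECASE | re.MULTILINE,
)


def read_titles(bib_text):
    """Map each BibTeX key to its title, falling back to the key itself."""
    titles = {}
    matches = list(ENTRY_RE.finditer(bib_text))
    for i, m in enumerate(matches):
        # The entry body runs until the next entry starts (or end of file).
        end = matches[i + 1].start() if i + 1 < len(matches) else len(bib_text)
        title = TITLE_RE.search(bib_text[m.end():end])
        titles[m.group(1)] = title.group(1) if title else m.group(1)
    return titles


def build_org(project_dir):
    """Return org-mode text with one heading per PDF and links to its notes."""
    project = Path(project_dir).resolve()
    bib = project / "refs.bib"
    titles = read_titles(bib.read_text()) if bib.exists() else {}
    lines = ["#+TITLE: Papers for " + project.name, ""]
    for pdf in sorted((project / "papers").glob("*.pdf")):
        key = pdf.stem  # filename without extension doubles as the BibTeX key
        lines.append("* " + titles.get(key, key))
        lines.append("  - [[file:papers/%s][paper]]" % pdf.name)
        note = project / "notes" / (key + ".org")
        if note.exists():
            lines.append("  - [[file:notes/%s][notes]]" % note.name)
    return "\n".join(lines) + "\n"
```

Writing the result of build_org(".") to a file such as papers.org in the project directory gives one heading per paper, with links that Org-mode can follow from inside Emacs. Tags are not generated in this sketch; they would be added by hand to the headings afterwards.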
I’m currently using this system on a project with about 15 papers and growing. It works well so far and I’m really comfortable in it. The only part of it that I don’t like is having to manually get the BibTeX data. Since gathering metadata is a feature that Zotero has, I’m considering using Zotero as the ‘front-end’ of my system, i.e. I use Zotero to download and store the PDFs and their data, but I then use my Emacs-based system as the actual interface. This is something I still have to explore, but I think it could work. I should also note that I use Delicious to quickly bookmark interesting articles that pop up in searches, before going back and doing a preliminary quick scan prior to actually downloading the PDF. As usual I would love to hear any comments or suggestions you might have.


