Sunday Selection 2009-05-03

After a few weeks off for various reasons, Sunday Selection is back. Here goes:


Needle in a haystack: efficient storage of billions of photos This is a very informative piece from Facebook’s engineers about how they put together the infrastructure needed to support Facebook’s massive photo storage system. It’s pretty long and gets quite technical, but it’s definitely worth reading if you’re interested in storage.

From BFS to ZFS: past, present and future of filesystems While we’re on the topic of filesystems, this is an article from about a year ago that gives a very comprehensive (and not too technical) overview of the history of filesystems, right from the early days. It also offers a glimpse of what the filesystems of the future might look like.


Interview with John McCarthy, the father of Lisp Lisp is one of the most powerful programming languages in existence and I have a strong interest in it. This interview with the creator of the language looks at the inspiration and history behind the creation of Lisp.


Adobe AIR This new technology from Adobe is designed to bring the web out of the browser and onto the desktop. It’s meant to help developers write Rich Internet Applications, which combine the information available on the Internet with all the benefits of a desktop application. I haven’t used it much myself and I’m not a big fan of proprietary formats, but it is still an interesting piece of technology.

Wolfram Alpha Web Seminar report

I just finished participating in a Web seminar by Stephen Wolfram about his upcoming Wolfram|Alpha search engine. Wolfram Alpha has shaken me to my intellectual core like almost nothing else I have ever seen before. Alpha isn’t really a search engine in the commonly understood sense of the word. The website calls it a “computational knowledge engine” and that is a very accurate, if not very intuitive, description of what it is. You really have to see it in action to understand what it does. Its defining quality is that instead of just searching the internet for information relevant to a query, it actually attempts to compute the answers. I’ll give some examples later on to show what I mean, but for now I’ll walk through the seminar while it’s still fresh in my head.

The very first thing that Wolfram said (and reiterated a number of times) was that Alpha was based on Mathematica and his New Kind of Science. While this probably isn’t surprising, it’s certainly not something that most software companies or even computer science researchers would actively think about doing. However, both of them are very powerful tools and Alpha is a testament to that. Later on I had the chance to ask if cellular automata (the foundation of NKS) were a core part of Alpha. Wolfram was very emphatic that they were. He went on to say that he had always wondered what the first killer app for CA would be and that Alpha was it, even if it was a ‘prosaic’ application of NKS. He also said that NKS methodology was used heavily in the construction of Alpha. He hoped that Alpha would help people to actively create new types of science and scientific models by exploiting computational analysis.

Wolfram showed a few examples of how Alpha could be used. He used it to look up data on Springfield, MA and showed how Alpha could understand queries, compute relevant data and display it intelligently. For example, searching for a chemical compound showed its structure and information about its physical and chemical properties, as well as how to create it. Given a specific amount of a compound (4 molar sulphuric acid in this case), Alpha gave precise amounts of the other chemicals needed to create it. Another interesting example was when he typed in a DNA sequence and Alpha showed a possible human gene that matched it, as well as the amino acids it encoded. That example almost blew me away.
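The amino-acid part of that DNA example rests on a well-known step: reading the sequence three bases (one codon) at a time and looking each codon up in the standard genetic code. Here is a minimal Python sketch of that step only. It is not how Alpha does it (no implementation details were shown), and it does not attempt the gene-matching part. It assumes the input is a coding sequence that starts in frame at position 0.

```python
# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
# Codon index = 16 * first + 4 * second + third; "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate an in-frame DNA coding sequence into one-letter amino acids,
    stopping at the first stop codon. Trailing bases that don't form a full codon are ignored."""
    dna = dna.upper()
    protein = []
    for pos in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[pos:pos + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"))  # MAIVMGR
```

Matching a sequence against known human genes is a much harder search problem on top of this and is not sketched here.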

Alpha has 4 major components:

  1. Data curation: Alpha doesn’t feed off the entire web but rather works off a managed database and certain trustworthy sources (Alexa and US Census info being among them). Data which does not change is managed and categorized whereas the sources are polled regularly for relevant, up-to-date information.
  2. Computation: 5-6 million lines of Mathematica spread across lots of parallel processors (10,000 in the production version) make up the heart of Alpha. They collectively encode a large segment of the algorithms and computer models known to man. They can be applied to theoretical problems (e.g., integration, series expansion, airflow simulation) or to specific data (weather prediction, tide forecasts, etc.).
  3. Linguistic components: The demonstration made it clear that there is a very powerful (though far from perfect) natural language processing system at work. This freeform linguistic analysis is essential to Alpha because without it, a manual for making proper use of Alpha would be thousands of pages long (according to Wolfram).
  4. Presentation: Alpha is very pleasing to look at. The information is shown in a way that makes it easy to get a good grasp of what’s being displayed without being overwhelming. Though there is a standard overall format (individual data segments are arranged into ‘pods’ on the page), the actual display is tailored to the specific query. It is simple enough for a child to use.
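To make the division of labour concrete, here is a toy Python sketch of how three of those components (linguistic interpretation, computation and pod-based presentation) might fit together for one narrow kind of query. Everything in it (`interpret`, `prime_factors`, `answer`, the `Pod` class and the recognised phrasings) is hypothetical and illustrative only; it is not Alpha's design, and data curation is left out entirely.

```python
import re
from dataclasses import dataclass


@dataclass
class Pod:
    """One titled block of output, loosely modelled on Alpha's 'pods' (hypothetical)."""
    title: str
    content: str


def interpret(query: str) -> tuple[str, int]:
    """Linguistic step (toy): recognise a few phrasings of a factoring request."""
    match = re.search(r"(?:factor|factorize|prime factors of)\s+(\d+)", query.lower())
    if not match:
        raise ValueError(f"cannot interpret query: {query!r}")
    return "factor", int(match.group(1))


def prime_factors(n: int) -> list[int]:
    """Computation step: prime factorization by trial division."""
    factors, d = [], 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return factors


def answer(query: str) -> list[Pod]:
    """Run a query through interpretation, computation and presentation."""
    _, n = interpret(query)
    factors = prime_factors(n)
    return [
        Pod("Input interpretation", f"prime factorization of {n}"),
        Pod("Result", " * ".join(map(str, factors))),
        Pod("Number of prime factors", str(len(factors))),
    ]


for pod in answer("What are the prime factors of 360?"):
    print(f"{pod.title}: {pod.content}")
```

The point of the separation is that the same computation can sit behind many phrasings, and the presentation layer decides which pods are worth showing for that kind of query.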

Wolfram has very clear and powerful ideas about what Alpha should achieve once it goes live. His main recurring theme is that it should open up computation and data analysis to everyone. Over human history we have learned to calculate, compute and algorithmically manipulate across a very wide range of topics and data. However, gaining access to these powerful tools requires considerable training and resources. Wolfram wants Alpha to let everyone become a personal scientist (that’s close to the actual words he used) just like search engines allowed everyone to have a personal reference librarian.

Alpha focuses on questions that have definite answers or answers that can be computed directly. Where there is confusion or dispute, or Alpha cannot compute a sufficient answer, there will be the option of sidebar links to additional resources (like Wikipedia). Speaking of Wikipedia, Alpha won’t be open for everyone to contribute to; however, Wolfram said that there would be a smooth process for experts to contribute to Alpha’s knowledge base.

Talking about Alpha’s actual deployment, Wolfram said the free version would be open to everyone and would allow some amount of customization (like defining specific fields to perform specific operations). Alpha would have a set of APIs allowing data retrieval at multiple levels: whole pages of results or just certain sections could be obtained, as well as the underlying data and mathematical abstractions used to obtain those results. There would also be APIs to leverage the language processing infrastructure. Commercial offerings will be available whereby the knowledge base could be augmented with a company’s internal information and Alpha would then apply its computational analysis to that knowledge base. I think this is going to be very useful for companies large and small alike.
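The APIs weren't public at the time of the seminar, so the following is only a hypothetical Python sketch of what "retrieval at multiple levels" could look like from a client's point of view: the whole page, selected sections (pods), or the raw data behind one pod. The `Result` and `Pod` classes, their method names and the sample values are all made up for illustration and do not reflect any real Wolfram|Alpha API.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Pod:
    title: str
    text: str          # human-readable rendering shown on the page
    data: Any = None   # underlying machine-readable value (hypothetical)


@dataclass
class Result:
    query: str
    pods: list[Pod] = field(default_factory=list)

    def page(self) -> str:
        """Level 1: the whole page of results as plain text."""
        return "\n\n".join(f"{p.title}\n{p.text}" for p in self.pods)

    def sections(self, *titles: str) -> list[Pod]:
        """Level 2: only the named pods, in page order."""
        wanted = set(titles)
        return [p for p in self.pods if p.title in wanted]

    def raw(self, title: str) -> Any:
        """Level 3: the underlying data behind a single pod."""
        for p in self.pods:
            if p.title == title:
                return p.data
        raise KeyError(title)


# Illustrative result: prime factorization of 360 = 2^3 * 3^2 * 5, which has 24 divisors.
r = Result("prime factors of 360", [
    Pod("Result", "2^3 * 3^2 * 5", {2: 3, 3: 2, 5: 1}),
    Pod("Number of divisors", "24", 24),
])
print(r.page())
print(r.raw("Result"))
```

Keeping the display text and the raw data side by side in each pod is what would let the same result serve both a human reader and another program.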

Throughout the webinar, Wolfram showed lots of examples of what Alpha could do, some of which were just plain neat and others which were awe-inspiring. Wolfram himself seemed very interested in finding out the limits of the system and would get somewhat distracted by bugs popping up when they weren’t supposed to. I suppose that’s a good thing considering how important and successful Alpha could be. He was always very good about answering questions.

My personal thoughts about Alpha are a bit hard to describe. I think it is a wonderful piece of technology that goes a long way toward making computation meaningful in people’s lives. Most people use computers like glorified typewriters and record players, but Alpha might just change that. From a computer scientist’s point of view, it is certainly a very interesting application of computer technology. The natural language processing on display seems considerably more capable than what is seen in most search technologies. I hope that as Alpha launches, more details of its implementation come to light. As an engineer, I’d love to know more about how Alpha’s computational and data-management systems are structured and how the massive parallelism is handled. Most importantly, I hope Alpha causes a fundamental change in how computers are used and what people expect of software. Make no mistake, Alpha is important. I won’t say it’s the best thing since sliced bread, but it could be. A lot depends on how people actually use Alpha and how open Wolfram makes it. If enough data is made available (or if there is an easy way for people to supply their own data), I can see it becoming a powerful tool for real scientific endeavor. Here’s wishing Alpha and Wolfram the best of luck for the future.

NCUR 2009 Day 2

Due to a number of unforeseen circumstances (mostly involving lack of a wireless connection) I haven’t been able to live blog the conference as I had intended. Here’s a rather delayed roundup of the second day of the conference.

I didn’t actually make it to the conference until lunch. There was one session of presentations and posters in the morning, but I didn’t see anything of interest there. I went to one poster session in the afternoon and saw a number of interesting posters. There was one study of using Lego Mindstorms to encourage children to study math, and the researchers had found it quite effective. I’m currently looking into building a Mindstorms-style interface for my own research project, but I hadn’t been sure it was the right thing to use. Now I think it’s going to be a good bet.

The most interesting poster (and presenter) was Thomas Levine’s poster about how people position their keyboards. The work he had done was interesting, but it was the conversation I had with him that I found more useful. We talked a lot about different types of keyboards, seating positions, keyboard layouts and such things. Since I’m interested in keyboards too, I think I might stay in contact with him in the future.

There were other cool posters, including one about how the perception of death changes people’s attitudes towards loyalty, fairness, duty and other such moral qualities. Though I’m not very interested in psychology, I think it’s worth knowing what things my thought processes respond to and how. Coming back to technology, there was interesting work on processing documents and structuring the extracted information into machine-processable data trees. Natural language processing isn’t one of my areas of interest, but it’s an important field with lots of open questions and I’m glad to see that there are smart people working on it.

After the poster session it was time for me to make my own presentation. I talked about how we had applied formal grammars to studying how complex systems evolve over time and how they can be controlled. I had to rush towards the end since I spent a bit too much time on introductory material, but it went off well and, judging by the questions I received, there was a good amount of interest. It helped that the people in the room were very tech-savvy. The presenters following me almost blew me away with the work they were doing. The next presenter, Abdulmajed Dakkak, showed how he had used a variety of tools and languages to create a powerful way to geographically track BitTorrent usage by looking at the IPs of people connected to a swarm. He also had a great-looking presentation in Flash instead of PowerPoint, which I thought was really attractive. What amazed me even more was that he had done all his work on a netbook. I’ve been looking to get my feet wet in networks and parallel programming, and I might consider duplicating some of his work using my college’s clusters instead of a netbook.
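For readers unfamiliar with the idea, a classic example of a formal grammar describing a system that evolves over time is a Lindenmayer system (L-system), where every symbol is rewritten in parallel at each step. The Python sketch below shows that generic technique on Lindenmayer's textbook algae model. It is not the specific grammar or model from our presentation; `rewrite` and `ALGAE` are names chosen just for this illustration.

```python
def rewrite(axiom: str, rules: dict[str, str], steps: int) -> list[str]:
    """Apply the rules to every symbol in parallel, once per step (an L-system).
    Symbols with no rule are copied unchanged. Returns every generation, starting with the axiom."""
    history = [axiom]
    current = axiom
    for _ in range(steps):
        current = "".join(rules.get(symbol, symbol) for symbol in current)
        history.append(current)
    return history


# Lindenmayer's algae model: A -> AB, B -> A.
ALGAE = {"A": "AB", "B": "A"}

for generation in rewrite("A", ALGAE, 4):
    print(generation)
# A, AB, ABA, ABAAB, ABAABABA (string lengths follow the Fibonacci numbers)
```

One simple way to think about "control" in this setting is swapping in a different rule set at a chosen step, which changes how the system grows from that point on.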

The last presentation was about a tool for authors called Story Signs, developed by John Murray. It is designed to help authors better structure their stories by attaching tag-like information to different parts of a piece of text: which characters are involved, what the scene is about, what sort of scene it is and so on. Being interested in writing myself, I thought this was a really interesting tool, and I would like to see the different ways in which this tag data could be used. Some sort of social networking built on top of it might be interesting (think of it as a literature version of Flickr). Furthermore, the user interface was built in Flash and was really good-looking, very different from a run-of-the-mill desktop app. It’s not something I had considered before, but Flash might be worth looking into for implementing desktop UIs in the near future.
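As a rough idea of what can be done once passages carry tags, here is a small Python sketch: finding every scene a character appears in, counting scene types, and counting which characters share scenes. The `Passage` structure and the sample characters are hypothetical and are not Story Signs' actual data model, which I didn't see in detail.

```python
from collections import Counter
from dataclasses import dataclass, field
from itertools import combinations


@dataclass
class Passage:
    """A chunk of story text plus its tags (hypothetical structure)."""
    text: str
    characters: set[str] = field(default_factory=set)
    scene_type: str = ""


def scenes_with(passages: list[Passage], character: str) -> list[int]:
    """Indices of the passages in which a character appears."""
    return [i for i, p in enumerate(passages) if character in p.characters]


def scene_type_counts(passages: list[Passage]) -> Counter:
    """How many passages of each scene type the story has."""
    return Counter(p.scene_type for p in passages if p.scene_type)


def co_appearances(passages: list[Passage]) -> Counter:
    """How often each pair of characters shares a passage."""
    pairs = Counter()
    for p in passages:
        for a, b in combinations(sorted(p.characters), 2):
            pairs[(a, b)] += 1
    return pairs


story = [
    Passage("...", {"Alice", "Bob"}, "dialogue"),
    Passage("...", {"Alice"}, "action"),
    Passage("...", {"Alice", "Bob", "Carol"}, "dialogue"),
]
print(scenes_with(story, "Bob"))   # [0, 2]
print(co_appearances(story))
```

The co-appearance counts are essentially a small character network, which is one kind of structure a shared, Flickr-like site could aggregate across many stories.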

That was the last session for the day. I spent the evening at a social event organized for the attendees, which actually turned out to be a lot of fun. There was one more session on Saturday morning, but since we were leaving that morning, I decided to just sleep in. The trip back was uneventful. I tried to get onto wifi at Minneapolis airport but couldn’t; if I had, this post would have been up a day earlier. I have a lot of school work to catch up on at the moment, but once I’ve caught up a bit, I’ll post about the lessons I learned at NCUR (which I’ll really try to follow at my next conference).

NCUR 2009 Day 1 afternoon

The afternoon held an interesting group of sessions. It turns out that the organizers had grouped similar presentations into sessions. However, multiple sessions were running in parallel, meaning that several interesting things were happening at the same time. In my case, a robotics session and a computer networks session were running at the same time. I chose the robotics session, with a vague plan of leaving after the first two presentations and running halfway across the La Crosse campus to get to the networks session. I actually did do that, but the second robotics presentation ended early, letting me walk at a brisk pace to the Computer Science building.

The robotics presentations I attended were both by West Point students who were developing robotic equipment for use on the battlefield. The first was about an unmanned, remote-controlled vehicle that would lead a convoy and was equipped with sensors to detect obstacles and possible explosive devices in its path. The on-board systems would automatically stop the vehicle and wait for an override from the operator. The system is really interesting and I was impressed by the combination of electrical engineering and computer science that had to come together to make it happen. The fact that they wrote their own device drivers for some of their electronics makes it even more awesome. The second presentation was about smaller autonomous robots that could find their way through mapped rooms. A user would only need to select a point on a map and the robot would find its own way to the destination, avoiding obstacles by moving around them. The concept was interesting, but the presentation could have been better.
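The "pick a point on the map and the robot finds its own way" behaviour can be illustrated with one standard approach: breadth-first search over a grid map where some cells are marked as obstacles. This Python sketch assumes the room is a known occupancy grid with 4-way movement. The presenters didn't go into their algorithm, so this is a generic technique rather than their implementation, and a real robot would also have to cope with sensor noise and obstacles that aren't on the map.

```python
from collections import deque


def shortest_path(grid: list[str], start: tuple[int, int], goal: tuple[int, int]):
    """Breadth-first search on a 4-connected grid; '#' cells are obstacles.
    Returns the list of cells from start to goal, or None if the goal is unreachable."""
    rows, cols = len(grid), len(grid[0])
    previous = {start: None}          # also serves as the visited set
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            # Walk back through the predecessors to rebuild the path.
            path = []
            while cell is not None:
                path.append(cell)
                cell = previous[cell]
            return path[::-1]
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] != "#" and (nr, nc) not in previous:
                previous[(nr, nc)] = cell
                queue.append((nr, nc))
    return None


# Hypothetical room map: '.' is free floor, '#' is an obstacle.
ROOM = [
    "....",
    ".##.",
    "....",
]
print(shortest_path(ROOM, (0, 0), (2, 3)))
```

Because BFS explores cells in order of distance from the start, the first time it reaches the goal it has found a path with the fewest moves.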

After that, a quick walk took me to the Wing Technology building where the computer science presentations were taking place. I had missed the first two, but was in time for the next two. The first was about a simple music-production system called COMPOSE, which lets users select notes and beat patterns by clicking on icons and buttons. The user interface is simple enough for kids and non-musicians to use, and the software looks like it would be fun to use. Eventually it is supposed to become part of AI research into how humans identify pleasing sounds and note sequences. The next presentation was about using neural networks to predict battery status in solar-powered vehicles. Once again, this is at the intersection of computer science and electrical engineering, so it is very interesting to me.

I decided to stick around for the next set of presentations. The next 4 presentations covered a variety of subjects. The first was about automating genome research by building a set of extensible online tools; the second showed a way to classify medical images from different sources (MRI, CAT, X-ray, etc.) by analyzing the images themselves. The third was about using simple image color analysis to look for anomalies. It’s meant to be used by rescue agencies to quickly analyze aerial video footage for lost people. The final presentation was about a digitization project to make available all the data collected by the Freedmen’s Bureau. The Bureau was set up under Lincoln to help the slaves who were being freed, and its records contain a very large amount of information about those people and their lives. I think digitization projects are important because they help preserve a large amount of history and make it available for other interesting research projects.
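A very simple form of the color-anomaly idea is to flag pixels whose color is statistically far from the frame's average color, on the assumption that the background (forest, field, water) dominates the frame and a person's clothing stands out. The Python sketch below does exactly that with a per-channel z-score; the threshold of 3 and the sample colors are arbitrary choices for illustration, and the presenters' actual method may well differ.

```python
import math
from statistics import mean, pstdev


def color_anomalies(pixels: list[tuple[int, int, int]], threshold: float = 3.0) -> list[int]:
    """Return indices of pixels whose color is far from the frame's average color.
    Each channel's deviation is measured in units of that channel's standard deviation,
    and the three deviations are combined as a Euclidean distance."""
    channels = list(zip(*pixels))
    means = [mean(ch) for ch in channels]
    stds = [pstdev(ch) or 1.0 for ch in channels]  # avoid dividing by zero on a flat channel
    flagged = []
    for i, px in enumerate(pixels):
        z = math.sqrt(sum(((v - m) / s) ** 2 for v, m, s in zip(px, means, stds)))
        if z > threshold:
            flagged.append(i)
    return flagged


# Illustrative frame: mostly green pixels with one bright red pixel at index 42.
frame = [(34, 139, 34)] * 99
frame.insert(42, (255, 0, 0))
print(color_anomalies(frame))  # [42]
```

A global average like this works best when the background is fairly uniform; scanning varied terrain would more likely need statistics computed over local windows.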

That was the end of the presentations I could attend. I went on a river cruise which I spent by mostly talking with one of my professors. I’m now back in my room, watching old House episodes. I have to touch up my presentation a little for tomorrow. I don’t have any plans for tomorrow and I’ll just make up my plans as I go along. I’ll try to keep posting as often as I can.

NCUR 2009 Day 1 morning

NCUR 2009 kicked off today and I’ve already had a good time. I didn’t get up early enough to be at the start of things, but I did wake up in time to go get breakfast. After some banana bread and coffee I headed to the hall where all the posters were on display. My main intention there was to give my friend some support. His poster was on building solar power into small vehicles like golf carts and he tells me he got a fair amount of attention. I looked around at the rest, but didn’t find anything of particular interest to me. However, I did spend some time talking to a girl who had explored possible applications of Hamiltonian systems that obey conservation laws and have symmetry. Her preliminary research was interesting, but what I found more interesting was that she planned to apply her ideas to the design of airplanes and jet engines, and I hope her work yields some interesting new technologies.

The main attraction of the day was paleontologist Jack Horner, who might just be the most famous dinosaur hunter in the world. His talk was really interesting and covered everything from how he got interested in dinosaurs to how he never really got a formal college education. He also talked about dinosaurs in general, Jurassic Park and new genetic technologies that are being used to explore the relationship between dinosaurs and birds. What I really liked is that he stressed that scientists should communicate their work and make it accessible to the general public. I very much believe in this and I try to do it in my own small way by writing this blog. There was a Q&A session at the end, which also brought up some interesting points, including T. rex vs. Spinosaurus, dino-chickens, velociraptor intelligence and a very short detour into creationism (to which Horner’s answer was “I’m glad my God doesn’t trick me”). I hope this was being recorded and gets put on YouTube soon.

I just finished lunch and I’m currently waiting for the robotics session to get started. I’ll have another update later.