Sunday Selection 2009-06-28

It’s been an eventful week with Michael Jackson and Farrah Fawcett passing away and all news of the Iran election situation being knocked off Twitter as a result. But life must go on and it’s Sunday again.

Reading

Music Mind and Meaning This article from one of the legends of AI, Marvin Minsky, takes a look at music and how it could affect the way we think about the mind (and vice versa). Not directly about AI, but certainly worth reading if you have an hour’s time on your hands.

Media

Merlin Mann on time and attention Too much to do and too little time? Can’t get yourself to focus on the important things in life? If that’s the case, then this video doesn’t have all the answers, but it can point you in the right direction.

Software

Neuroarena I’m not much of a gamer, but that doesn’t mean that I can’t appreciate games (or the cool technology behind them). This is a fast-paced action strategy game that should give you strategy fans something interesting to do on hot, lazy weekend afternoons. What’s more, the game backends are written in a combination of Common Lisp and Erlang. Not bad at all.

Too many formats

For most of last week, I’ve been evaluating various options for starting a public, personal wiki. I’ve looked at a number of solutions, both large-scale by wiki providers and homegrown using open source tools. I still haven’t made my decision and to tell the truth I am getting a little frustrated at this point. However, if there’s anything that I learned, it’s that there are way too many formats out there.

I’m not talking about large-scale industry standard formats like HD DVD vs Blu-ray or something on the level of paper vs digital. I’m talking about much simpler things like the variety of publishing formats and ad-hoc text formats floating around in cyberspace. The lingua franca of the internet is still HTML, at least if you want a simple website as opposed to something running on a content management system. However, if you’ve done any sort of web development, then you’ll know that building websites from scratch using HTML is not fun. It’s a close cousin of XML after all, and no one should have to write XML by hand.

Even without straight HTML, there’s still a ton of formats to choose from. If you want others to read something you’ve written, what do you choose? You could use a word processor format like .doc or the newer OOXML (.docx) or, if you’re more of an open source fan, OpenDocument might be your thing. But it’s a bit harder to spread an office file like that. You can’t just drop it onto a webpage unless you convert it to HTML first. You could email it to people, but that doesn’t scale and most people might not care enough to actually open it. The same arguments go for PDFs.

If you really want a lot of people to read what you’ve written, you want to provide it in a form that’s accessible via a plain browser, hopefully without plugins. You could use something like Scribd’s iPaper, but I think that’s more useful if you have a complicated PDF that you want to show without people having to actually download and open the PDF. It’s a bit overkill for normal text. But we already decided that writing HTML is bad and so, like the hackers we are, we’re going to find a way around it.

You could go the CMS route and use something like WordPress (for blogs) or MediaWiki (for wikis) or Drupal (for something more versatile). These have the advantage of having a pretty WYSIWYG interface and a whole list of administrative features. But they require you to have your own server or web space, or to find a free host like WordPress.com. If you’re someone like me, you want a simpler solution that’s more under your direct control and that you can add on to later as your needs increase. The good news and bad news is that there are a bunch of simpler, human-readable (and writeable) markup formats that can be translated to HTML with fairly good results.

The good news is that these formats are simple and the HTML conversion tools are mostly open source and of good quality. The bad news is that there is a multitude of such markups, all of which are mutually incompatible. The most popular seem to be Markdown, Textile and reStructuredText. All of them have their strengths and weaknesses and I don’t really like any of them. If you can pick one and use it right, you can have a good experience. But you can just as easily push against the boundaries of what they offer and end up being frustrated by their limitations.

The above are all markups that are meant to be translated to HTML at some point, but can be read by people directly. Though they do allow some form of structuring (in the form of HTML headings) and allow inline HTML too, they’re not really all that good for highly structured text, like when you’re trying to make outlines for scholarly papers. There are other text-based tools to do that and my personal favorite is Org-mode for Emacs. It’s a package for Emacs that turns it into a powerful outlining, note-taking and organization tool. It lets you create an outline as a series of headings and subheadings (nested many levels deep) along with plain text and normal lists. The different levels can then be hidden arbitrarily, letting you take a bird’s eye view or just focus on one part. Many people use it for GTD or some other productivity system. I prefer Google Docs and Tasks for that, but Org-mode is a note-taking tool unparalleled in its simplicity and ease of use.

Org-mode uses a simple, custom text format to actually store any notes you make. It’s also human readable, so you can easily copy/paste it into an email and share it with others. But without actually using Org-mode, it’s hard to exploit the format to its full potential. It becomes even less ideal when it’s exported to some other format like HTML. The Org-mode concept of headers doesn’t neatly map onto HTML headings. Org-mode encourages header nesting, which looks terrible in HTML unless you carefully lay out a CSS stylesheet for it. You can use the headers more judiciously, preferring to use plain lists, but that defeats the purpose of Org-mode to some extent. It seems to me that you can’t have the best of both worlds.
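
To make the heading mismatch concrete, here is a minimal sketch in Python. It assumes the usual Org-mode convention of one asterisk per outline level, and the function name org_headings_to_html is a hypothetical helper of my own, not anything from Org-mode’s actual exporter. HTML only defines six heading levels, so anything nested deeper has to be flattened somehow; this sketch simply clamps it to the deepest level.

```python
import re
from html import escape

# Hypothetical helper for illustration; not part of Org-mode or its exporter.
def org_headings_to_html(text: str) -> list:
    """Map Org-style '*' headings to HTML heading tags, one tag per heading."""
    out = []
    for line in text.splitlines():
        match = re.match(r"^(\*+) (.*)$", line)
        if not match:
            continue  # body text and plain lists are ignored in this sketch
        depth = len(match.group(1))
        # HTML only defines <h1> through <h6>, so deeper levels are clamped.
        level = min(depth, 6)
        out.append(f"<h{level}>{escape(match.group(2))}</h{level}>")
    return out

outline = "* Paper\n** Introduction\n*** Motivation\n******** A very deep note"
for tag in org_headings_to_html(outline):
    print(tag)
```

The eight-level note ends up as an h6, indistinguishable from a genuine sixth-level heading, which is roughly the information loss described above.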

At this point any self-respecting hacker will point out that I could just stop whining and start writing converters from one format to another. After all, they all convert to HTML, so it can’t be that hard to convert between them, right? It’s probably not, and the Pandoc system does it to some extent, but the problem isn’t with the formats themselves as much as with the tools that work on them. HTML is good for publishing normal documents, but if you have something with many levels of nesting, such as an Org-mode document, you really need something that doesn’t show all the information at once. Unfortunately there isn’t an easy way to do this without resorting to JavaScript or something similar. HTML in its raw form is still a very static format, presentation-wise, and it doesn’t always scale well to complex sets of information.
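
For a sense of how small the easy part of such a converter is, here is a rough Python sketch that rewrites Markdown’s ‘#’ headings as Textile’s ‘h1.’ style headings. The function name is my own invention, and it deliberately ignores everything else (emphasis, links, lists, and Textile’s expectation of a blank line after each block), which is where the real work would be.

```python
import re

# Hypothetical one-feature converter; a real tool such as Pandoc covers far more.
def markdown_headings_to_textile(text: str) -> str:
    converted = []
    for line in text.splitlines():
        match = re.match(r"^(#{1,6}) +(.+)$", line)
        if match:
            level = len(match.group(1))  # number of '#' characters
            converted.append(f"h{level}. {match.group(2)}")
        else:
            converted.append(line)  # everything else passes through untouched
    return "\n".join(converted)

print(markdown_headings_to_textile("# Title\nSome text\n## Section"))
```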

I’m going to take a break for a moment and think about what an ideal system would be like. It would be based on a simple text format that was explicit about what things meant, so you didn’t need a reference. One of my complaints about Markdown is that anything indented 4 spaces gets treated as a block of preformatted text placed within <pre><code> tags. There’s no way you would guess this by looking at the plain text and it also makes list nesting and using indentation to present your text very awkward. But I digress. While you could write this text in a plain editor, the preferred way would be in an editor that supported folding and searching and all the other editing niceties. The editor would actually double as the presentation so that there wouldn’t be any export or rendering step. And it will probably be on the web. The editor will be in a browser and the actual data will be on a remote server. However, unlike many web services today, export and import to a local form will be a core strength. This way you can pull the bare text out of the editor and send it to others who don’t use the same tools as you without having to go through some silly registration/sign-up step.
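
The Markdown indentation rule mentioned above is easy to show in a few lines of Python. This is only a simplified sketch of the behaviour, with a made-up function name: real Markdown converters also deal with lists, tabs, blank lines inside code and paragraphs that continue onto an indented line, none of which is handled here.

```python
from html import escape

# Hypothetical illustration of Markdown's 4-space rule, not a real converter.
def render_indented_code(text: str) -> str:
    html_parts = []
    code_lines = []

    def flush_code():
        # Emit any pending indented lines as one <pre><code> block.
        if code_lines:
            body = escape("\n".join(code_lines))
            html_parts.append(f"<pre><code>{body}\n</code></pre>")
            code_lines.clear()

    for line in text.splitlines():
        if line.startswith("    "):
            code_lines.append(line[4:])  # drop the 4-space marker
        else:
            flush_code()
            if line.strip():
                html_parts.append(f"<p>{escape(line)}</p>")
    flush_code()
    return "\n".join(html_parts)

print(render_indented_code("A note to self\n\n    indented for emphasis"))
```

Nothing in the second line of that input says ‘code’ to a human reader, yet it comes out wrapped in <pre><code> tags rather than as a paragraph.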

I started this post thinking it would be about the markup, but now at the end it’s turned out to be about the tools as well. HTML is a great medium because it’s so easy to render and produce. However, our data needs are going beyond what a simple document-oriented format can easily support. I don’t think Org-mode or the like will ever become the de facto standard. But the fact that a lot of very smart people choose it over more ‘modern’ Web 2.0 stuff is a testament to the power of simple, easily editable formats. I hope we can standardize around a dual markup system that has a simple human-readable form for quick writing and a more snazzy display form. I doubt that’s going to happen any time soon, so till then I’m looking for the perfect set of text tools to store my data and ultimately show it to the world. All this ties in directly to my efforts to have a wiki that’s easy to create, easy to back up and good to look at. More on that later.

Sunday Selection 2009-06-21

It’s a warm Sunday afternoon in Southern Virginia and I need to do laundry and other such mundane stuff. But before that it’s time for another installment of Sunday Selection.

Reading

The Benefits of a Classical Education Nothing to do with computers but certainly worth reading for anyone who does work that involves the intellect in any way (and I do hope that your work with the computer involves your intellect).

Media

CouchDB and me Once again not entirely about computers. It’s more about what to do with your life if you really love computer technology and want to work on your terms rather than slave away on what someone else thinks is important.

Software

Opera 10 beta This includes the much hyped and talked about Unite web server. As I’ve said before, I have my doubts as to whether or not it will have any noticeable impact on the web, but you might think differently. If you find some cool, interesting way to use it, do let me know.

Opera Unite won’t really change the web

Today Opera announced the release of their new ‘Unite’ product. The basic concept behind Unite is something that has been around ever since the beginning of the internet: users aren’t just consumers, but producers as well. Unite will turn your browser into a mini server, allowing you to connect to other people and share things directly from your computer. It sounds like a good idea, but the implementation is not something that I’m very comfortable with.

The idea

Don’t get me wrong, I think the idea is a great one. Being able to share your own material without having to depend on a third party and risk them stealing your stuff (or just locking it up) is a great boon. It would be wonderful if we all just had our own private servers, keyed to our personal identities in some uniquely identifiable way, and exerted total control over what we put online. However, the truth is that the implementation details of doing something like that are very complicated.

For example, if we all started directly publishing our own content, we’d all need massive bandwidth connections and have to pay for them. We’d need to install hardware and software and keep it all up-to-date. We’d need to deal with all the potential security issues related to allowing other people to access our computers. It would also be difficult to maintain any sense of uniformity across the web. Sure, we could agree to some common protocol, but that protocol would have to be set in stone because it’s going to be very hard to get millions of people around the world to all update to a new protocol. The idea is a very good and powerful one, but it’s useless without proper implementation.

The implementation

That being said, I think Opera has done a lot to alleviate some of these problems. In particular, Unite is easy enough for just about anyone to use. They’ve taken a large part of the maintenance headaches out of the equation, at least for the software component. They also seem to have found a way around the issue of keeping everyone on the same page and playing by the same rules: producers use Opera’s custom system, but consumers can use a plain web browser. But while this strategy means that it’s easy for users to start becoming producers, it also means that people will be locked into using Opera’s product and account system. It’s this part of the bargain that I’m somewhat uncomfortable with.

Unite requires an Opera account

It seems to me that Opera may have solved one problem by replacing it with another one. It’s now easy for anyone to distribute their content from their own computers, as long as they buy into Opera’s system. I don’t use ‘buy’ in the monetary sense of the term, but in the ‘free as in freedom’ sense. Opera claims that Unite will allow “sharing data and services without the need for any third-party Web sites/applications to be involved at all”. Problem is, Opera is the third party. Sure, my content is still physically on my own computer, but Opera is the gatekeeper. I feel that’s even less of a deal than uploading my data to Facebook or YouTube. Not only do I now have to pay for all the bandwidth and space I use, I also have to play on Opera’s terms. I don’t see much of a bargain in that. Perhaps I would if I was really more concerned about people ‘stealing my content’, but I honestly think that you shouldn’t put stuff on the Internet if you don’t want people to share it and spread it around.

Unite isn’t for me, is it for you?

Opera Unite is really quite an interesting piece of technology. It’s one of those ideas that no one really thinks of, but once you hear about it, it seems either ridiculous or very obvious. It’s a great idea to let users directly share their own content, but I’m confused as to who Opera is targeting here. Let’s start with the fact that Opera’s market share is really quite tiny. Using Unite means that people have to go and download yet another browser. Secondly, how many people will really want to pay for the bandwidth they need in order to really share their own media? Third, even if you do start using it, you’ll need to have your computer on all the time and connected, something that’s not an option for people on the move with laptops or netbooks. Finally, the market of people who will actually use this seems rather small to me. If you’re really interested in becoming an internet content producer, you’re going to want your own domain name, be always on and outsource the technical details to people with more reliable services. If you’re the average internet user who just wants to share your photos with your friends, chances are you’re already on Facebook or MySpace and it works well enough. And if you’re savvy enough to be worried about people stealing your content, you know your way around the web and probably have your own server in the basement already.

I feel that Unite is one of those things that unfortunately just missed the proper timing window. Had Opera released this before social networks and YouTube made media sharing easy, they might have had a fighting chance to make something out of it. But with Facebook and the like deeply entrenched and sharing tools like Google Wave promising a more open model for those who care, Opera seems to be outmatched and outgunned.

Feel free to use the comments to disagree with me.