Moving to org2blog for publishing posts

For most of the last few years I’ve been using the WordPress online editor for writing posts. Part of this was because I moved between computers a lot and wanted to be able to get at my posts and drafts from wherever I was. But since I’m now using one machine for most of my writing (and all of my blogging) I’ve been able to finally move to centralizing all my writing under Emacs. Luckily I found a great Emacs mode that makes posting to WordPress a snap. org2blog is made to be used with org-mode files but by and large you can ignore the org-mode part (if you want to).

Org-mode is a helpful plain text mode for organizing notes, todos, agendas and even writing in general. I use it for taking notes about academic papers and meetings I go to. org2blog mainly uses the plain-text org format for setting up the metadata for the post: title, date, tags etc. But org-mode also makes inserting links easy and I’m much faster writing with all my Emacs editing shortcuts than I am in a text box in a browser. org2blog then posts the org file as a draft (or published post) with a single command. I personally just save as drafts and then look at the preview before hitting publish. By writing in org-mode on a single machine I can also keep local backups of all my posts. Currently each post is just saved to a ByteBaker folder as a separate plain text file but I might put it all under version control at some point.
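To make the metadata part concrete, here is a minimal Python sketch of reading "#+KEY: value" lines from the top of an org file. It is only an illustration of the plain-text format, not org2blog's own code (which is Emacs Lisp). The function name, the sample post and the exact keyword names (TITLE, DATE, TAGS) are assumptions on my part.

```python
import re

# Matches Org in-buffer keyword lines such as "#+TITLE: My post".
KEYWORD_RE = re.compile(r"^#\+(\w+):\s*(.*)$")


def read_post_metadata(org_text):
    """Collect the leading #+KEY: value lines of an Org file into a dict."""
    metadata = {}
    for line in org_text.splitlines():
        match = KEYWORD_RE.match(line.strip())
        if match:
            metadata[match.group(1).upper()] = match.group(2).strip()
        elif line.strip():
            # First non-blank, non-keyword line: the post body starts here.
            break
    return metadata


# Hypothetical post; the keyword names are assumed for illustration.
SAMPLE_POST = """\
#+TITLE: Moving to org2blog
#+DATE: [2011-10-02 Sun]
#+TAGS: emacs, org-mode

The body of the post starts here.
"""

print(read_post_metadata(SAMPLE_POST))
```

Everything a publishing tool needs to know about the post sits in a few readable lines at the top of the file, and the rest is just the text.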

I have been toying with the idea of moving this blog off WordPress to a more home-brewed setup, but I haven’t been able to justify the time and effort it would take. Might be a winter project to get through the upstate New York winters. Personally, as long as I have a trustworthy backup of all my code and can add new things easily, I’m fairly indifferent to how the HTML actually gets generated and presented (especially if it’s done by open source software made by people I like). For the time being I’d rather invest in writing the blog than hacking it.

Is HTML finally getting there?

I just finished my first presentation done completely in HTML5. Ever since I made the move to plain text for most of my writing a few years ago I’ve been looking for a way to make presentations without resorting to PowerPoint or Keynote. I knew that I could use PDFs but somehow just throwing up static PDFs onto a screen didn’t really seem the best for a presentation.

Recently a very well done HTML5 presentation demo made the rounds on the intertubes. On the HTML5 Rocks website they had a template for the presentation and so I downloaded and used it to roll my own. It’s not the flashiest thing on the web, but it certainly holds its own against most PowerPoint presentations and is definitely better than any PDF presentation I’ve seen.

Making that presentation brought me around to the view that maybe, just maybe, HTML is getting close to becoming a usable and fairly universal documentation format. The combination of HTML5, CSS3 and faster JavaScript engines becoming commonplace has made things like HTML slideshows not only possible, but actually attractive. I think we’re at the point where we can seriously consider completely ditching proprietary binary formats (I’m looking at you, Word) and go for full-on hyperlinked documents as our main format for sharing information.

That being said, I’m not saying that writing in HTML is the best thing to do. After all, HTML is a close cousin of XML and writing bare XML by hand is just painful. The only case where writing HTML by hand makes sense is when you want really good control of the layout (like on my static webpage). Creating the presentation in pure HTML was a good learning experience, but I definitely want to wrap it in some sort of templating system. What I really want to see are powerful tools that write to HTML and CSS for the styling and content and maybe even auto-generate custom JavaScript for animation and the like.

One of the best systems that generate HTML is the Emacs org-mode. The documentation for my last project was written entirely in lightly marked up plain text and automatically converted to HTML. The only code I really had to write was the CSS for it, which is pretty simple. It might be possible to use org-mode for generating my slides too, but it’s not something that I’ve explored in any detail. It’s certainly possible to create PDF slides (using export to LaTeX) and perhaps some variation of that will work for me.

Even though Google Docs is a pretty decent tool, I feel they’re woefully under-utilizing HTML5’s true potential. In particular the slideshow app is very bare bones when it could easily be much better. Strangely enough, Google has made some really strong inroads in other areas. For example, their font API lets you use a number of really good fonts by just including a few lines of code in the header of your HTML. Google Docs should really be able to plug into their font API and let us use those fonts in docs and presentations.
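For a sense of how little is involved, here is a small Python sketch that builds the kind of stylesheet link the font API expects in a page header. The URL pattern is the original CSS endpoint as I remember it, and the function name and font family are just examples, so treat the details as assumptions rather than a reference.

```python
from urllib.parse import quote_plus


def font_link_tag(family):
    """Build the <link> element that loads one font family from Google's font API."""
    # Assumed endpoint: the family name goes in the query string, spaces as "+".
    href = "https://fonts.googleapis.com/css?family=" + quote_plus(family)
    return '<link rel="stylesheet" href="{}">'.format(href)


# Example family name, chosen only for illustration.
print(font_link_tag("Droid Sans"))
```

Once that tag is in the header, the font can be named in an ordinary CSS font-family rule like any other.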

I really think that we can live in a world where HTML provides all our text documentation needs (and efficiently includes audio and video as needed). I’m going to be starting a little experiment where all the documentation for my honors thesis is done in plain text and automatically exported to clean HTML. I’m also hoping that for my final written thesis I’ll be able to write in some plain-text source format (probably org-mode) and do painless exports to both good-looking HTML and LaTeX for making print PDFs. I’ll also be using HTML5 for all my presentations from now on. Stay tuned over the next few months to see how these experiments turn out.

Too many formats

For most of last week, I’ve been evaluating various options for starting a public, personal wiki. I’ve looked at a number of solutions, both large-scale by wiki providers and homegrown using open source tools. I still haven’t made my decision and to tell the truth I am getting a little frustrated at this point. However, if there’s anything that I learned, it’s that there are way too many formats out there.

I’m not talking about large-scale industry standard formats like HD-DVD vs Blu-Ray or something on the level of paper vs digital. I’m talking about much simpler things like the variety of publishing formats and ad-hoc text formats floating around in cyberspace. The lingua franca of the internet is still HTML, at least if you want a simple website as opposed to something running on a content management system. However, if you’ve done any sort of web development, then you’ll know that building websites from scratch using HTML is not fun. It’s a close cousin of XML after all, and no one should have to write XML by hand.

Even without straight HTML, there’s still a ton of formats to choose from. If you want others to read something you’ve written, what do you choose? You could use a word processor format like .doc or the newer OOXML (.docx) or if you’re more of an open source fan, OpenDocument might be your thing. But it’s a bit harder to spread an office file like that. You can’t just drop it onto a webpage unless you convert it to HTML first. You could email it to people, but that doesn’t scale and most people might not care enough to actually open it. The same arguments go for PDFs.

If you really want a lot of people to read what you’ve written, you want to provide it in a form that’s accessible via a plain browser, hopefully without plugins. You could use something like Scribd’s iPaper, but I think that’s more useful if you have a complicated PDF that you want to show without people having to actually download and open the PDF. It’s a bit overkill for normal text. But we already decided that writing HTML is bad and so like the hackers we are, we’re going to find a way around it.

You could go the CMS-route and use something like WordPress (for blogs) or MediaWiki (for wikis) or Drupal (for something more versatile). These have the advantage of having a pretty WYSIWYG interface and a whole list of administrative features. But they require you to have your own server or web space or find a free host like WordPress.com. But if you’re someone like me, you want a simpler solution that’s more under your direct control and that you can add on to later as your needs increase. The good news and bad news is that there are a bunch of simpler, human-readable (and writeable) markup formats that can be translated to HTML with fairly good results.

The good news is that these formats are simple and the HTML conversion tools are mostly open source and of good quality. The bad news is that there are a multitude of such markups, all of which are mutually incompatible. The most popular ones seem to be Markdown, Textile and reStructuredText. All of them have their strengths and weaknesses and I don’t really like any of them. If you can pick one and use it right, you can have a good experience. But you can just as easily push against the boundaries of what they offer and end up being frustrated by their limitations.

The above are all markups that are meant to be translated to HTML at some point, but can be read by people directly. Though they do allow some form of structuring (in the form of HTML headings) and allow inline HTML too, they’re not really all that good for highly structured text, like when you’re trying to make outlines for scholarly papers. There are other text-based tools to do that and my personal favorite is Org-mode for Emacs. It’s a package for Emacs that turns it into a powerful outlining, note-taking and organization tool. It lets you create an outline as a series of headings and subheadings (nested many levels deep) along with plain text and normal lists. The different levels can then be hidden arbitrarily, letting you take a bird’s eye view or just focus on one part. Many people use it for GTD or some other productivity system. I prefer Google Docs and Tasks for that, but Org-mode is a note-taking tool unparalleled in its simplicity and ease of use.

Org-mode uses a simple, custom text format to actually store any notes you make. It’s also human readable, so you can easily copy/paste it into an email and share with others. But without actually using Org-mode, it’s hard to exploit the format to its full potential. It becomes even less ideal when it’s exported to some other format like HTML. The Org-mode concept of headers doesn’t neatly map onto HTML headings. Org-mode encourages header nesting, which looks terrible in HTML unless you carefully lay out a CSS stylesheet for it. You can use the headers more judiciously, preferring to use plain lists, but that defeats the purpose of Org-mode to some extent. It seems to me that you can’t have the best of both worlds.
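The mismatch is easier to see with a toy converter. The Python sketch below is a deliberately naive mapping from Org headings (one star per level) to HTML heading tags. It is not how Org-mode’s own exporter works, and the function name and sample outline are made up for illustration.

```python
import html
import re

# An Org heading is one or more stars, whitespace, then the heading text.
HEADING_RE = re.compile(r"^(\*+)\s+(.*)$")


def org_outline_to_html(org_text):
    """Naively turn Org headings into <hN> tags and other lines into <p> tags."""
    out = []
    for line in org_text.splitlines():
        if not line.strip():
            continue
        match = HEADING_RE.match(line)
        if match:
            # HTML only defines h1..h6, so deeper Org levels collapse into h6.
            level = min(len(match.group(1)), 6)
            out.append("<h{0}>{1}</h{0}>".format(level, html.escape(match.group(2))))
        else:
            out.append("<p>{}</p>".format(html.escape(line.strip())))
    return "\n".join(out)


# Hypothetical outline with one heading nested deeper than HTML can express.
SAMPLE_OUTLINE = """\
* Paper notes
** Method
*** Dataset
Only 200 samples.
******* A seventh-level aside
"""

print(org_outline_to_html(SAMPLE_OUTLINE))
```

Two things get lost in the translation. Anything below the sixth level is flattened into h6, and the output is a flat run of tags: nothing in it says that "Dataset" lives inside "Method", which is exactly the information Org-mode uses for folding.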

At this point any self-respecting hacker will point out that I could just stop whining and start writing converters from one format to another. After all, they all convert to HTML, so it can’t be that hard to convert between them, right? It’s probably not and the Pandoc system does it to some extent, but the problem isn’t with the formats themselves as much as with the tools that work on them. HTML is good for publishing normal documents, but if you have something with many levels of nesting such as an Org-mode document, you really need something that doesn’t show all the information at once. Unfortunately there isn’t an easy way to do this without resorting to JavaScript or something similar. HTML in its raw form is still a very static format, presentation-wise, and it doesn’t always scale well to complex sets of information.

I’m going to take a break for a moment and think about what an ideal system would be like. It would be based on a simple text format that was explicit about what things meant, so you didn’t need a reference. One of my complaints about Markdown is that anything indented 4 spaces gets treated as a block of preformatted code placed within <pre><code> tags. There’s no way you would guess this by looking at the plain text and it also makes list nesting and using indentation to present your text very awkward. But I digress. While you could write this text in a plain editor, the preferred way would be in an editor that supported folding and searching and all the other editing niceties. The editor would actually double as the presentation so that there wouldn’t be any export or rendering step. And it would probably be on the web. The editor would be in a browser and the actual data would be on a remote server. However, unlike many web services today, export and import to a local form would be a core strength. This way you could pull the bare text out of the editor and send it to others who don’t use the same tools as you without having to go through some silly registration/sign-up step.
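To show what I mean about the Markdown indentation rule, here is a simplified Python sketch of just that one behavior. It is not a real Markdown implementation: actual converters also look at blank lines, lists and much more. The function name and sample text are my own.

```python
import html


def render_indented_blocks(markdown_text):
    """Simplified sketch: lines indented 4+ spaces become <pre><code>, others <p>."""
    out = []
    code_lines = []

    def flush_code():
        # Emit any pending run of indented lines as one code block.
        if code_lines:
            body = html.escape("\n".join(code_lines))
            out.append("<pre><code>" + body + "\n</code></pre>")
            code_lines.clear()

    for line in markdown_text.splitlines():
        if line.startswith("    "):
            # Strip the 4-space marker; the rest is kept verbatim.
            code_lines.append(line[4:])
        else:
            flush_code()
            if line.strip():
                out.append("<p>" + html.escape(line.strip()) + "</p>")
    flush_code()
    return "\n".join(out)


# Hypothetical note where the indentation was meant as layout, not as code.
SAMPLE_NOTE = "A note to self:\n\n    buy milk\n    call the bank\n"

print(render_indented_blocks(SAMPLE_NOTE))
```

The two indented lines were meant as an indented aside, but they come out as a code block. Nothing in the plain text hints that four spaces carry that meaning.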

I started this post thinking it would be about the markup, but now at the end it’s turned out to be about the tools as well. HTML is a great medium because it’s so easy to render and produce. However, our data needs are going beyond what a simple document-oriented format can easily support. I don’t think Org-mode or the like will ever become the de facto standard. But the fact that a lot of very smart people choose it over more ‘modern’ Web 2.0 stuff is a testament to the power of simple, easily editable formats. I wish we could standardize around a dual markup system that had a simple human-readable form for quick writing and a more snazzy display form. I doubt that’s going to happen any time soon, so till then I’m looking for the perfect set of text tools to store my data and ultimately show it to the world. All this ties in directly to my efforts to have a wiki that’s easy to create, easy to back up and good to look at. More on that later.