I was at a lecture the other day and I heard the speaker say ‘quote … something something … end quote’. I don’t like it when people say ‘quote … end quote’ to show that they are quoting another source. I think it’s a very ugly way of talking. It seems that if you need to actually say ‘quote … end quote’ to let your listener know when a quotation starts or stops, then you might as well start saying your commas and full stops out loud. Human-to-human communication is very interesting, especially when it is live. When two or more people speak, there’s a lot more going on than a simple interchange of meaningful words. The speaker’s tone of voice, inflection, and the presence and duration of pauses all combine to give a lot more information than the words on their own could. Good speakers use all these properties of verbal communication to make their words more effective and meaningful. I think that proper use of vocal tone, rhythm and pausing can make it clear when a quote starts and when it ends. Unfortunately, it seems that most people don’t really think about this. There seems to be a notion that the presence of quotation marks in text demands an equivalent expression in speech. But this idea stems in part from forgetting an important fact: spoken and written language are not the same thing.
Take a look at any piece of text. This post that you are now reading will do perfectly. When a language is written down, a lot more information goes down than is actually spoken. Punctuation marks are the prime example. They’re special symbols that represent the elements of natural speech that don’t correspond to actual words. Periods and commas represent pauses. Apostrophes represent the omissions that are made when two words are merged to make speech easier or faster. They’re efforts to better approximate the non-verbal parts of speech. But of course they don’t go the whole way. Rhythm, speed and tone are hard to put in written form, at least for a language like English. Humans have been communicating with other humans for thousands of years now and we’re still not perfect at it.
This imperfection isn’t just limited to natural speaking or writing, though. Computerized communication and information have their own set of problems. As an example, take the history of markup languages. Wikipedia defines a markup language as “a set of codes that give instructions regarding the structure of a text or how it is to be displayed”. Plain text has always been a sort of ‘natural’ medium for computers to store information. However, simply recording the contents of the text is often not good enough. You want some extra information to tell different parts of the text apart. You can use this information to control how the text is displayed, or to give it some special meaning to the computer. The TeX typesetting system uses a markup language to give very precise instructions to a program (called tex) on how different parts of a document should be arranged when printed. The printing can be to paper, as originally intended, or to some electronic format like PDF.
While TeX is focused on presentation, SGML and its popular derivatives HTML and XML target a different problem: what does the text actually mean? How it looks is handled by stylesheets (CSS for HTML, XSLT for XML), which match the semantic components of the text to a corresponding presentation style. This means that the meaning of the text can be kept separate from its presentation, and that multiple styles can be applied to the same content. And lest anyone tell you otherwise, presentation is important. Really important. The first iterations of HTML lacked any clearly thought-out presentation rules, leading to a proliferation of custom HTML pseudo-tags and table-based layouts. CSS helped the situation, at the cost of layering on yet another language with its own set of standardization requirements. XML is becoming very popular as a data interchange format, but it is not without its share of flaws. Firstly, it’s very verbose. Writing any form of XML by hand is not an enjoyable task. It’s also hard to throw together a quick regular expression to make sense of a large XML structure, though this is mitigated in part by the presence of some really good XML manipulation tools.
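To make the regex-versus-parser tradeoff concrete, here is a minimal sketch in Python. The sample document and tag names are my own invention, not from any real system: a quick regular expression can pull text out of flat markup, but a real XML parser also decodes entities and understands the nesting.

```python
import re
import xml.etree.ElementTree as ET

# A tiny, made-up XML document for illustration.
doc = """<library>
  <book id="1"><title>SGML Handbook</title></book>
  <book id="2"><title>XML &amp; Friends</title></book>
</library>"""

# The regex approach: quick to write, but it sees only flat text --
# it returns the entity '&amp;' verbatim and knows nothing of nesting.
titles_re = re.findall(r'<title>(.*?)</title>', doc)
print(titles_re)   # ['SGML Handbook', 'XML &amp; Friends']

# A proper parser decodes entities and exposes the tree structure.
root = ET.fromstring(doc)
titles = [book.findtext('title') for book in root.findall('book')]
print(titles)      # ['SGML Handbook', 'XML & Friends']
```

For a one-off scrape of a small, predictable file the regex is fine; the moment entities, attributes or nested elements matter, the parser earns its keep.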
So much for the acronyms. Where were we? Information interchange. We still haven’t figured out the best way to do it, or even agreed on a good way to do it. Inert static data is one thing, but changing active data is quite another. Computer programs are, in essence, information. In the Lisp programming language there is no real distinction between code and data: code is represented as tree-like data structures and can be manipulated just as easily as any other data. This gives Lisp a level of expressive power that few other programming languages can match. Going the other way, you can just as easily transform your data into code. If you can find a way to store your information as Lisp-style S-expressions, then you can turn it into self-manipulating programs later. Of course, no one said it would be easy. Steve Yegge has a slightly dated but very appropriate article on just this topic.
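The code-is-data idea can be sketched outside Lisp too. Below is a toy illustration in Python, with nested lists standing in for S-expressions. Everything here (the `s_eval` function, its operator table) is my own invention for the sake of the example, not real Lisp:

```python
import operator

# Operators our toy "Lisp" understands.
OPS = {'+': operator.add, '-': operator.sub, '*': operator.mul}

def s_eval(expr):
    """Evaluate an S-expression represented as nested Python lists."""
    if isinstance(expr, (int, float)):
        return expr                    # atoms evaluate to themselves
    op, *args = expr                   # a list is an application
    vals = [s_eval(a) for a in args]
    result = vals[0]
    for v in vals[1:]:
        result = OPS[op](result, v)
    return result

# Because the program is just a list, we can build and rewrite it
# with ordinary list operations before running it.
program = ['+', 1, ['*', 2, 3]]        # the Lisp form (+ 1 (* 2 3))
print(s_eval(program))                 # 7

# "Data into code": swap the top-level operator, then evaluate again.
program[0] = '*'                       # now (* 1 (* 2 3))
print(s_eval(program))                 # 6
```

The point is not the arithmetic but the last three lines: the program was edited as data and then executed, which is the everyday mode of working in Lisp.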
So how does all this affect you and me? Honestly, I don’t really know. I know that many of the tools we have (both in terms of natural and computer language) are very good, but there certainly isn’t one size that fits all. You really shouldn’t be saying ‘quote … end quote’ when you can change the way your voice sounds to signify the same thing. You shouldn’t be using XML when a simple regexp-parseable format will do. But you shouldn’t avoid XML if the alternative is writing your own deterministic finite automaton that will soon rival an enterprise-strength XML parser in complexity. The problem, as they say, is choice. Make yours carefully.