Compilers and nuclear reactors

This summer I’m doing an internship at a small, research-y software shop called GrammaTech. They do a lot of research-based projects and have a static analysis tool called CodeSonar that is really quite nifty. I’ve been poking around the CodeSonar codebase as part of my work and I’m pretty impressed by the kind of things they do. Most of it is in C and I can easily say I’ve never seen C written this way before. They leverage C’s close-to-the-metal nature and the preprocessor to create some pretty powerful abstractions. It makes me painfully aware of how much I still have to learn. While marveling at what a deep knowledge of C can do I came across “Moron why C is not Assembly” by James Iry – an explanation of why C is not just syntactic sugar for assembly language.

As I learn more about programming in general and programming languages in particular, I’m starting to think that there is something of a Law of Conservation of Power, vaguely akin to the conservation of matter and energy (but not quite as absolute). For example, Iry talks about how C enforces the stack abstraction and hides any parallelization in the hardware (or in the generated code). By moving from the assembly world to the C world you’re trading one form of power for another – you obey the constraints of the stack and function calls get much simpler. But you lose the ability to fine tune using dedicated hardware instructions.

This is true as you continue exploring more languages – give up the looseness of C’s weak type system for something stricter and stronger (ML with its algebraic datatypes, for example) and you can have the machine enforce invariants and guarantee certain properties of your code. You can perform easier and more elegant matches and actions on the type of your data. But you give up the flexibility that comes of raw data and fundamentally unstructured bits. If you choose strict immutability and pure functions (like Haskell or Clojure) you get to interact with your datatypes in a more mathematically precise form, and you get to reap the benefits of concurrency without worrying about data corruption (to some extent). But you lose the ability to quickly stash some metadata into the corner of some variable data structure and pull it out at will.

If we start viewing different languages as tradeoffs in the power available to the programmer then a compiler becomes a very special tool – a power transformer, akin to a nuclear reactor (or a Hawking’s Knot if you don’t mind some futurism). A compiler, at its core, takes a stream of symbols (according to predefined syntactical rules) and transforms them into another stream of symbols (according to predefined semantic rules). Along the way, the compiler is responsible for enforcing the power transforms – ensuring that your C code doesn’t get to the underlying stack, that your Haskell code obeys the type constraints. If our languages are tools with which we build our universes, our compilers enforce the laws of physics (whatever laws they may be). The reason we’re computer scientists and not physicists is that we can create and change these laws at whim, instead of only studying and exploring them.
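To make the "stream of symbols in, stream of symbols out" idea concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the toy language (integers, +, -, *, and parentheses), the instruction names (PUSH, ADD, SUB, MUL) and the function names are all hypothetical, and a real compiler does vastly more. The point is only that the translator decides which programs are allowed to exist at all.

```python
import re

# Hypothetical toy language: integers, +, -, * and parentheses.
TOKEN = re.compile(r"\s*(\d+|[-+*()])")

def tokenize(source):
    """Split the source text into tokens (the first stream of symbols)."""
    source = source.rstrip()
    tokens, pos = [], 0
    while pos < len(source):
        match = TOKEN.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens

def compile_expr(tokens):
    """Translate tokens into stack-machine instructions (the second stream)."""
    code = []
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1

    def factor():
        tok = peek()
        if tok == "(":
            advance()
            expr()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
        elif tok is not None and tok.isdigit():
            code.append(("PUSH", int(tok)))
            advance()
        else:
            raise SyntaxError(f"unexpected token {tok!r}")

    def term():
        factor()
        while peek() == "*":
            advance()
            factor()
            code.append(("MUL",))

    def expr():
        term()
        while peek() in ("+", "-"):
            op = peek()
            advance()
            term()
            code.append(("ADD",) if op == "+" else ("SUB",))

    expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return code

def run(code):
    """Execute the instructions on a tiny stack machine."""
    stack = []
    for instr in code:
        if instr[0] == "PUSH":
            stack.append(instr[1])
        else:
            right, left = stack.pop(), stack.pop()
            if instr[0] == "ADD":
                stack.append(left + right)
            elif instr[0] == "SUB":
                stack.append(left - right)
            else:  # MUL
                stack.append(left * right)
    return stack.pop()
```

With these definitions, run(compile_expr(tokenize("2 + 3 * (4 - 1)"))) evaluates to 11, while an input such as "2 + + 3" is rejected with a SyntaxError before any instruction is emitted. The program that breaks the rules never gets to run, which is the "laws of physics" role in miniature.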

Just as we don’t want to be putting nuclear reactors in SUVs, there isn’t a “best” power tradeoff. Do you need to be really close to the metal on a resource-constrained embedded system? Use C. Do you need a guarantee that a successful compilation will rule out a certain class of runtime errors? Use ML. If you need a fuel-efficient vehicle to take you to work every day, get a Prius. If you need a self-sufficient, water-borne weapons platform that only needs to be refueled every few decades and can rain down vengeance on your enemies should the time come, then invest in an aircraft carrier or a nuclear submarine. Don’t bring a knife to a gunfight.

There are two big elephants in the room: Turing-completeness and Lisp. All Turing-complete languages are strictly equivalent in their computational power, but that misses the point of this discussion. Most programmers are not writing code for machines at all: they are writing programs for programmers (including themselves) to read, understand and use (and only incidentally for machines to execute; thank you, SICP). When you change the rules of the game to be not strict computational power but expressiveness and understandability to another human, this Conservation of Power thing becomes much more important. Choosing the correct set of tradeoffs and balances (and hence the correct language and related toolsets) becomes a decision with far-reaching impact on your team and project. Make the wrong choice and the Turing tarpit will swallow you alive and surrender your fossilized remains to a future advanced insectoid race.

Lisp is another matter entirely. The so-called “programmable programming language” has been used to build everything from operating systems to type systems which are Turing-complete in themselves. As Manuel Simoni puts it, in the Lisp world there is no such thing as too much power. Lisp laughs flippantly at the Law of Conservation of Power by placing in the programmer’s hands the enforcer of the Law – the compiler itself. By virtue of S-expressions and macros Lisp allows and encourages you to play God. The natural way to program in Lisp is to “write code that writes code” – creating your own minilanguages tailored to the task at hand. With great power comes great responsibility, of course, so the Lisp programmer must be particularly careful.
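For readers who haven’t seen “code that writes code”, here is a rough analogue in Python. To be clear about the assumptions: Python has no Lisp-style macros, so this sketch fakes the idea by building source text and handing it to the built-in exec; the helper name make_record and the generated Point class are hypothetical. Real Lisp macros operate on S-expressions rather than strings and are far more robust than this.

```python
def make_record(name, fields):
    """Generate source code for a small class, then compile and load it.

    Assumes `name` and every entry in `fields` are valid Python identifiers
    and that there is at least one field; nothing here checks that.
    """
    args = ", ".join(fields)
    # Builds text like "x={self.x!r}, y={self.y!r}" for the generated __repr__.
    values = ", ".join(f"{f}={{self.{f}!r}}" for f in fields)

    lines = [f"class {name}:",
             f"    def __init__(self, {args}):"]
    lines += [f"        self.{f} = {f}" for f in fields]
    lines += ["    def __repr__(self):",
              f"        return f'{name}({values})'"]
    source = "\n".join(lines)

    namespace = {}
    exec(source, namespace)  # the program compiles the code it just wrote
    return namespace[name], source
```

Calling Point, source = make_record("Point", ["x", "y"]) returns a working class plus the text that defined it, so Point(1, 2) can be used like any hand-written class and source can be printed to see what was generated. The fragility of pasting strings together is exactly the kind of responsibility that comes with this power, and it is a large part of why Lisp’s structured approach is so much more pleasant.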

I haven’t explored Lisp as much as I’d like to and I’m only just starting to look into ML and Haskell. But as a career programmer (or some approximation thereof) I think it’s a good idea to routinely move between various points on the power spectrum. That’s not to imply that it’s a perfectly linear scale, but that’s a matter for another post. As my experience at GrammaTech is showing there are delights, wonders and challenges no matter where you decide to plant your feet.

Languages abound

After almost a month and a half I’m back in a position to write The ByteBaker on a regular basis again. Instead of a lengthy explanation and reintroduction, I’m going to dive right in.

At the end of August I’m going to be starting my PhD program in Computer Science at Cornell University. Over the last few years of college I’ve developed an interest in programming languages and so I’m spending the next few years pursuing that interest (and hopefully writing some good software in the process). Programming languages are an intensely mathematical and logical area of study. In fact, I’ll admit that I am a bit intimidated by the amount of knowledge about the logical foundations of PL I’ll have to gather before being able to make a meaningful contribution. But on a personal level, it’s not really the mathematical rigor or the logical elegance of these systems that I find interesting. For me, programming languages, just like real languages, are a medium of expression.

In the end, what we’re really trying to do is express ourselves. We start by expressing the problems we want to solve. If we do it well (and our languages are expressive enough), the expression of the problem leads to the solution. If we do it not-so-well, or if the problem is particularly complicated, we have to express the solution explicitly. In addition to problems and solutions, we can express ideas, data, relationships between data and complex interacting systems of relationships and data. Again, the greater the expressive power of our languages, the easier our job becomes. We want to express the way information should flow through our system. We want to constrain what relationships and flows are possible. We want to specify the dependencies (and independencies) between parts of our systems, and we’d like to build our systems out of well-defined components that behave according to fixed and automatically enforced rules.

Just as there are an infinity of things we’d like to say, there are a multitude of languages to speak in. And just as we want to choose the right tool for the job we also want the language that gives us the right level of expressiveness. That’s not to say that there is a linear scale of expressiveness. At one level all Turing-complete languages let you say the same things, if you’re willing to bend over backwards to varying extents. Some languages allow us to say more things and with less effort than others. But some languages just let us say different things. And even though you could twist and turn almost any language to say almost anything, I’ve come to feel that all languages have a character: a set of assumptions, principles and decisions at the core that affects how the language and systems built with it work.

Languages don’t stand on their own. And despite my love of languages for themselves, they’re only as important as the systems we build with them. Human languages are important and beautiful in and of themselves, but they’re far more important because of the stories, histories and wisdom they embody and allow to be expressed and recorded. Our computer languages are valuable because they let us express important thoughts, solve important problems and build powerful systems to make our lives better. Again, we have done great things with the most primitive of tools, but if you value your time and energy it’s a good idea to pick the right tool for the job.

Personally, I’ve always been something of a language dabbler, probably a result of being in college and needing to move to a different language for another course every few months. In my time, the languages I’ve done significant amounts of work in are C, C++, Java, Python, Ruby, various assemblers and Verilog (not a programming language per se, but a medium for expression nonetheless). I wouldn’t consider myself an expert in any of them but I can definitely hold my own in C and Python, maybe Ruby too if I had to. Then there are the languages I want to learn: Lisp in general (Scheme in particular) and more recently Haskell and Scala (a newly discovered appreciation of functional programming and strong type systems). They’re all different mediums of expression, each letting you say different things, and there’s certainly a lot I want to say.

As a PhD student in programming languages, my job is not to become an expert in different languages (though I guess that could lead to an awesome consulting gig). My job is eventually to make some contribution to the field and push the boundaries (and that involves convincing the people standing on the boundary that it actually has been pushed). Luckily for me there are definitely lots of ways in which the state of the art for programming languages can be pushed. However, to do that I first need to know where the state of the art currently lies. Over the next few months (years?) I want to get deeper into studying languages and the ideas behind them. To start off, I want to explore functional programming, type systems and macros. And I’m sure those roads will lead to more roads to explore. Yes, you are all invited along for the ride.

Sunday Selection 2011-04-03

Happy April everyone! I hope you all had a fun April Fools and that you took any jokes at your expense in good spirit. Laughter is the best medicine and all that. Without further ado, here’s this week’s Selection.

Around the Internet

Why I Chose Typekit: Businesses, business models and the psychology and ethics behind it all continue to interest me. This is one designer’s description of why he chooses Typekit over the other web-based type delivery services. There aren’t any long charts or big numbers; it’s more personal and honest.

The Holy Trinity: In the process of making plans for actually going to graduate school, I’ve been spending some time thinking about what I want to research and what my motivations and goals are. Apart from the technical things I’m interested in, I’m starting to believe that what we need more than ever is a “philosophy of computation”: ideas and concepts that define computation and our relationship to it at a higher level. Robert Harper’s recent blog post is a milestone on that journey.

This Hack was Not Planned: Another gem from the man, the legend, the hacker _why the lucky stiff. No matter how much we talk about agile processes and software development methodologies, sometimes we just need to sit down and churn out a neat hack. This one is for the knife-edge hacker in all of us.

From the Bookshelf

Rework: When I read and reviewed this book almost exactly a year ago I was perhaps less than charitable. I stick by my point that it is largely the best material pulled from their blog, but after a year I’m seeing it from the eyes of someone who hasn’t recently been drinking the 37signals kool-aid non-stop. There are powerful and useful ideas distilled into a very potent form. If you’re looking to start a business (or even just a new project) but are unsure how to set yourself apart from the Joneses, this book should give you some really good ideas.

Software

Pinboard.in: My reading has gone up a lot in the last few months and I’ve been making a conscious effort to track everything I read. Since most of my reading is online, I’ve been using an excellent bookmarking service called Pinboard. It’s not free and it’s not overflowing with social features, but it stores and organizes your bookmarks and does it well. If you’re someone who reads a lot online and you want to keep track of what you’re reading, the $9.29 signup fee is a small price to pay. (The price goes up based on the number of people who sign up, so hurry. It was a bit over $6 when I joined.)

Computing is still in the dark ages

Despite all the talk of Web 2.0 and the shiny multicore machines with their gigabytes of RAM and billions of cycles per second, I sometimes can’t help feeling that we are still very much in the dark ages of computing. This time around my dark gloomy feelings have been brought about by this message to a mailing list, which in turn was sparked off by the announcement of the Go Programming Language. As a computer user and a programmer I feel that the actual use of computers is far below their potential.

As the years go by, it seems like we keep on piling layer on top of layer while the results aren’t proportional to what we have to learn to get things done. Now, I’m not proposing that we all start writing down-to-the-metal code or force everyone to become a programmer, but things are starting to look like a mess. Web programming is an interesting development, but it adds yet another layer on top of the existing kernel, operating system, libraries and GUI toolkits. Add to that the fact that all browsers are still a bit different from each other and you can start to understand why I’ve yet to make a serious foray into web programming.

But even without the web and the many formats and barely interoperating systems out there, there’s enough on the desktop to get you depressed. Start with the fact that there are currently three major operating systems out there and if you want to write a program that runs on all three of them, you don’t have an easy task. You either embrace three different toolkits and programming methodologies and maintain three very different codebases, or you use something like Java which works on all three, but screams non-native on each one. Even though there are languages like Python that run on all three, it really puts me off that there is still no top-notch multiplatform GUI library. wxWidgets tries pretty hard, but if you look at the screenshots you can pretty easily see that they don’t look quite right. It’s not very surprising that lots of smart developers are flocking to the web, where things in comparison are a lot smoother.

There is also the fact that all programming languages, like all other major pieces of software, suck, some more than others. I still stand by what I said in my last post, that it’s an exciting time for language enthusiasts, but I also feel that there are some lessons we really need to learn. I’m starting to have concerns that there may not be any true general purpose language, simply because there are so many different types of problems to be solved. I think we need to start creating broader categories: a set of systems languages similar to C going in the direction of D and Go. A set of hyper-optimizing VM-based languages designed for long-running, parallel server applications (the current JVM is a good example). A set of languages for writing end-user apps that are significantly high-level, but are still compiled to pretty fast native code (maybe not C or even optimized VM fast, but better than today’s Python or Ruby). I’m thinking Python in its Unladen Swallow incarnation might fill this gap.

As a programmer, I find the state of the tools that we have to use really quite depressing. Tools like Emacs and Vi are powerful and all, but let’s face it: we could have much more powerful IDE technology. We should have full-blown incremental compilation with autocompletion and support for rendering documentation for every major language out there. We should also have seamless version control with granularity down to the undo level. Every change I make should be saved and I should be able to visually browse all these changes, see what they are and restore to an older state (or commit them if I want to). We have the raw computing power needed to do all this, yet we remain stuck doing mostly batch-style edit-compile-debug cycles and mucking around in plain text. Eclipse with its incremental compiler makes things much easier, but there’s so much more we could be using our machines for.

As a user, what irritates me is the amount of manual labor we still have to do on a daily basis. We still have to carefully name and place files so that we can find them later. I have to manually hit the save button (see version control bit above). Even with the Internet, collaboration is a mess, with most people throwing around emails with increasingly large attachments. Add to that the fact that most email clients are pretty dumb pieces of software. Google Wave is a step in the right direction, if enough people get around to actually using it (and if it can integrate to some extent at least with the desktop). Also I think the web and the desktop need to be brought closer together. Ideally I would be able to sit down on any computer with a live Internet connection and have my full custom work environment (or at least the most important parts of it).

I’m fully aware that none of the things I’ve mentioned are trivial. In fact, they’re probably very hard projects that will take expert teams a good few years to complete. One day I would like to seriously work on some of the programmer-related issues, especially the IDE part. I love Emacs, but there are some parts of Eclipse I really like too. For the time being I’m going to have to make do with what I have, but I’ll be sure to keep an eye out for interesting things and movements in the right direction.

It’s a great time to be a language buff

I make no secret of the fact that I have a very strong interest in programming languages. So I was naturally very interested when news of the Go Programming Language hit the intertubes. Go is an interesting language. It pulls together some very powerful features with a familiar, but clean syntax and has lightning fast compile times. It certainly takes a place on my to-learn list along with Haskell and Scala. But even as Go becomes the latest hot piece of language news, it dawned on me that over the past few years we’ve seen a slew of interesting languages offering compelling alternatives to the industry “mainstream”.

I guess it all started with the rise of scripting languages like Python, PHP, Ruby and the poster boy of scripting: Perl. For me personally, these languages, with their dynamic typing, “batteries included” design and interesting syntax, provided a breath of fresh air after the likes of C++ and Java. Not that C++ and Java are necessarily bad languages, but they aren’t the most interesting of modern languages. In the early years of this decade computers were just getting fast enough to write large scale software in scripting languages. Things have changed a lot since then.

Dynamic languages aren’t just reserved for small scripts. Software like Ruby on Rails has proved that you can write really robust back end infrastructure with them. The languages for their part have kept on growing, adding features and making changes that keep them interesting and downright fun to use. Python 3.0 was a brave decision to make a break from backwards compatibility in order to do interesting things and it goes to show that these languages are far from ossifying or degrading.

Then there is JavaScript, which was supposed to die a slow death by attrition as web programmers moved to Flash or Silverlight. But we all know that didn’t happen. JavaScript has stayed in the background since the rise of Netscape, but it’s only recently with advances in browser technology and growing standards support that it has really come into its own. I’ve only played with it a little, but it’s a fun little language which makes me feel a lot of the same emotions I felt when discovering Python for the first time. Thanks to efforts like Rhino, you can even use JavaScript outside the browser for non-web-related programming.

Of course, if you want to do really interesting things with these languages, then performance is not optional. Within the last year or two there’s been a strong push in both academia and industry to find ways to make these languages faster and safer. Google in particular seems to be in the thick of it. Chrome’s V8 JavaScript engine is probably the fastest client side JavaScript environment and their still experimental Unladen Swallow project has already made headway in improving Python performance. V8 has already enabled some amazing projects and I’m waiting to see what Unladen Swallow will do.

While we’re on the topic of performance, mentioning the Java Virtual Machine is a must. The language itself seems to have fallen from grace lately, but the JVM is home to some of the most powerful compiler technology on the planet. It’s no wonder then that the JVM has become the target for a bunch of interesting languages. There are the ports of popular languages: JRuby, Jython and Rhino. But the more interesting ones are the JVM-centric ones. Scala is really interesting in that it was born of an academic research project but is becoming the strongest contender to Java’s position of premier JVM language. Clojure is another language that I don’t think many people saw coming. It brings the power of Lisp to a modern JVM, unleashing a wide range of possibilities. It has its detractors, but it’s certainly done a fair bit to make Lisp a well known name again.

Academia has always been a hotbed when it comes to language design. It’s produced wonders like Lisp and Prolog and is making waves again with creations like Haskell (whose goal is ostensibly to avoid popularity at all costs) and the ML group of languages. These powerful functional languages with wonderful type inference are a language aficionado’s dream come true in many ways and they still have years of innovation ahead of them.

Almost as a corollary to the theoretically grounded functional languages, systems languages have been getting some love too. D and now Go are both languages that acknowledge that C and C++ have both had their heyday and it’s time to realize that systems programming does not have to be synonymous with bit twiddling. D has gotten some flak recently for not evolving very cleanly over the last few years, but something is better than nothing. Also a real shift towards eliminating manual memory management is a welcome addition.

As someone who intends to seriously study language design and the related concepts in the years to come, I think it’s a really great time to be getting involved in learning about languages. At the moment I’m trying to teach myself Common Lisp and I have a Scala book sitting on the shelf too. One of these days, I plan on sitting down and making a little toy language to get used to the idea of creating a language. Till then, it’s going to be really interesting just watching how things work out in an increasingly multilingual world.

How many programming languages should I learn?

OSNews has started a series called A-Z of programming languages where they’ve been posting interviews with the creators of well-known programming languages. Till now they’ve done AWK, Ada and BASH. Of those three the only one I’ve had any experience with is BASH, and not too much of that. But considering that there are literally hundreds of programming languages (and many more dialects or implementations) which ones should one learn to be a good programmer?

I know that many programmers out there simply learn just one or two languages and then use them throughout their careers (or at least until it becomes impossible to find a job). Certainly that works, to some point at least, and so the question is: do we really need to learn languages that are not the “industry standard” (i.e. whatever has the most jobs on offer)? If all you’re interested in is a job, then no, you don’t. One or two languages will probably be enough. However, if you want to keep learning and keep developing as a programmer, then the answer is most certainly yes. Some programming languages are quite similar in terms of syntax and power, but some are very different and teach you to think in different ways. It’s these different languages that are going to make you better as a programmer.

So we come back to our initial question: How many languages to learn and perhaps more importantly which ones? I think that there are two types of languages that are worth learning: Those that make you think differently and those that have been used to write a large amount of high quality code. Languages like Smalltalk and LISP and to some extent Java are strongly paradigm-oriented: they emphasize a specific style of programming. In the case of Smalltalk it’s object-oriented, and in the case of LISP and its derivatives it’s functional. Such languages will teach you important lessons which you can apply even when you’re using some other languages.

One of the languages that has been widely embraced by the hacker community is C. There is an incredible amount of really good code written in C, the most famous of which is probably the Linux kernel. Unless you’re a systems programmer, you probably won’t have to use C or C++ much, but you can benefit a lot from reading well written code. C and C++ can be used to write powerful code, but sometimes the power doesn’t quite justify taking the trouble of all the low-level work that you have to do. In that case, it’s good to have a general purpose high-level language lying around. I would strongly recommend Python, but you might find another language easier for everyday use. But once you have made a choice, learn it well and use it to its maximum.

It might be worthwhile learning a language that has powerful text processing abilities, like AWK or Perl. It might make your work easier if you know one or both. And there is an awful lot of code written in Perl for the purpose of gluing larger programs together, so it might be worthwhile to learn it. However, Perl has been falling from grace for a good few years and many people are now using Python and Ruby to do the same things they used Perl for. I don’t have a concrete opinion of Perl at the moment, but I think it’s something you can put off learning until you have an actual need for it.

You should also learn Java. I personally consider Java to be a decent language, but not a good one and I probably wouldn’t use it if I had a choice. However, it is a very popular one and if you get a job as a programmer, chances are you’ll encounter a substantial amount of Java code which you have to deal with. And you won’t be much of a programmer if you can’t deal with other people’s code. So learn Java.

Knowing basic HTML or CSS is also a good idea. You might not be a full fledged web artist, but you should be able to throw together a decent web page without much trouble. Considering the growing importance of the web, learning a web programming language is becoming important. It’s not quite a necessity yet, but I think in less than 5 years it will be. I can’t recommend one now, because I have no experience, but I think that Ruby might be a good idea, because it’s a decent general purpose language as well.

I should say that I don’t know all of the above, but I have had some experience with each of them. I think each of them has contributed to making me a better programmer, and that the more I delve into them and use them for harder problems, the more I will improve. In conclusion, I would like to say that if you are committed enough to learn multiple languages well, you should also invest some time in learning a powerful text editor such as Vi or Emacs. Though you can certainly write great code with nothing more than Notepad, using a more powerful tool can make your job quite a bit easier (and considerably faster). Once you start fine-tuning these editors to suit your style and habits, you won’t want to use anything else. If you’re seriously out to become the best programmer you can be, you’re going to want the best possible tools at your disposal.

I’ll be happy to hear your comments on what programming languages you might recommend to anyone looking to improve their programming.