Sunday Selection 2013-10-13

Around the Web

Advice to a Young Programmer

I’ve learned a lot about system design and programming since I started grad school two years ago. I’m still learning a lot as I use new tools and techniques. This post does a good job of summarizing an experienced programmer’s advice to someone younger and newer to the craft.

Why Microsoft Word Must Die

I’m quite happy to say that I haven’t used Word in years. I don’t have a copy installed and I can’t remember the last time I needed to edit a Word document. I use LaTeX for most of my writing (everything from applications and academic papers to my resume). For the rare occasion that I need to open a Word document, Google Docs is more than adequate. Charlie Stross is one of my favorite newer science fiction authors and, like most of his technology-related writing, this piece is on point about why modern Microsoft Word is simply bad.

Less is Exponentially More

This article about why Go hasn’t attracted more C++ programmers is over a year old, but as a student of language design I find it interesting to see how language features interact with programmers’ needs. If you’re interested in programming languages or write a lot of C++ code, this is a worthwhile read.

Video

Jiro Dreams of Sushi

I’ve been meaning to watch this documentary for a long time, but finally got around to seeing it last night. It’s about Jiro Ono, an 85-year-old sushi master and owner of a tiny three-Michelin-star sushi restaurant in Japan. At its heart it’s the story of one man’s quest for perfection and devotion to his craft. Though it’s ostensibly about the art of sushi, I think there’s a lot any professional can learn from it. It reflects a way of life and devotion to purpose that we rarely see in day-to-day life. You can catch it on Netflix streaming and on Amazon Instant Video (it’s not free for Prime members though).

We’ll always have native code

A few days ago my friend Greg Earle pointed me to an article with the provocative title of “Hail the return of native code and the resurgence of C++“. While the article provides a decent survey of the state of the popular programming language landscape, it misses the point when it comes to why VM-based or interpreted languages have become popular (and will continue to be so).

The reason languages running on managed runtimes are popular (and should continue to be) is not quite as simple as relief from manual memory management or “syntactic comforts”, as the author puts it. While both are fair reasons in and of themselves (and personally I think syntax is more important than it’s generally given credit for), the benefits of managed runtimes reach much further.

As Steve Yegge put it, “virtual machines are obvious”. For starters, most virtual machines these days come with just-in-time compilers that generate native machine code instead of running everything through an interpreter. When you have complete control over your language’s environment there are a lot of interesting and powerful things you can do, especially with regard to the JIT. No matter how much you optimize statically generated machine code, at runtime you have far more information about how your program is actually behaving. All that information lets the JIT make much smarter decisions about what kind of code to generate.

For example, trace-based virtual machines like Mozilla’s SpiderMonkey look for “hotspots” in your code: sections that get run over and over again. The VM can then generate custom code tuned to be fast for those cases and run that instead of the original correct-but-naive code. Since JavaScript is dynamically typed and allows for very loose, dynamic object creation, naive code can be very inefficient. That makes runtime profiling all the more important, and it can lead to massive speed improvements.
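To make the idea concrete, here is a minimal Python sketch of the guard-plus-specialized-path pattern a tracing JIT produces. This is purely illustrative: `sum_elements` is an invented name, and a real JIT emits the specialized path as native machine code automatically rather than as hand-written branches.

```python
# Hand-written illustration of what a tracing JIT does automatically:
# guard on the types actually observed at runtime, then run a
# specialized fast path, falling back to fully generic code otherwise.
def sum_elements(xs):
    # Guard: the "hot" observed case is a homogeneous list of ints.
    if xs and all(type(x) is int for x in xs):
        total = 0
        for x in xs:
            total += x  # specialized path: plain integer addition only
        return total
    # Generic fallback: correct for any types supporting +, but slower.
    result = xs[0]
    for x in xs[1:]:
        result = result + x
    return result
```

When a guard fails, a real VM deoptimizes back to the generic version; the point is simply that the specialization is driven by runtime observations that a purely static compiler never gets to see.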

If you’re running a purely native-compiled language you miss all the improvements you could get by reacting to what you see at runtime. If you’re writing an application that is meant to run for long periods of time (say, a web server for a large, high-traffic service) it makes all the more sense to use a VM-based language. There is a lot of information the VM can use for optimization and a lot of time for those optimizations to kick in and show a definite performance benefit. In robust virtual machines with years of development behind them you can get performance quite comparable to native code. That is, unless you’re pulling all sorts of dirty tricks to carefully profile and optimize your handwritten native code, in which case all bets are off. But honestly, would you rather spend your days hand-optimizing native code, or writing code that actually does interesting things?

Personally, I hardly think a new C++ standard is going to usher in a widespread revival of native code. C++ has problems of its own, not the least of which is that it is simply a very big language that very few people fully understand (and adding features and constructs doesn’t help). One consequence of being a complex language without a managed runtime is that writing debugging or analysis tools for it is hard. In Coders at Work both Jamie Zawinski and Brendan Eich denounce C++ to some extent, but Eich adds that his particular grievance is the state of debugging tools, which have been practically stagnant for the last 20-odd years. Being fast, “native”, or having a long feature list is not nearly sufficient to make a good programming language.

All that being said, there is definitely a place for native code. But given the expressive power of modern languages and the performance of their runtimes, I expect that niche to keep dwindling. Even Google’s new Go programming language, which is ostensibly for systems programming and generates native code, has some distinctly high-level features (automated memory management, for one). But if you’re building embedded systems or writing the lowest levels of operating system kernels you probably still want to reach for a fast, dangerous language like C. Even then, I wouldn’t be surprised if we see a move towards low-level unmanaged kernels fused with managed runtimes for most other code in some of these places. Microsoft’s Singularity project comes to mind.

We’ll always have native code (unless we start making hardware Lisp machines again, or Haskell machines) but that doesn’t mean we’re going to (or should) see a widespread revival of low-level programming. I want my languages to keep getting closer to my brain, and if that means getting away from the machine, then so be it. I want powerful machinery seamlessly analyzing and transforming source code and generating fast native code. I want to build towers on top of towers. You could argue that there was a time when our industry needed a language like C++. But I would argue that time has passed. And now I shall get back to my Haskell and JavaScript.

Compilers and nuclear reactors

This summer I’m doing an internship at a small, research-y software shop called GrammaTech. They work on a lot of research-based projects and have a static analysis tool called CodeSonar that is really quite nifty. I’ve been poking around the CodeSonar codebase as part of my work and I’m pretty impressed by the kinds of things they do. Most of it is in C, and I can easily say I’ve never seen C written this way before. They leverage C’s close-to-the-metal nature and the preprocessor to create some pretty powerful abstractions. It makes me painfully aware of how much I still have to learn. While marveling at what a deep knowledge of C can do, I came across “Moron why C is not Assembly” by James Iry, an explanation of why C is not just syntactic sugar for assembly language.

As I learn more about programming in general and programming languages in particular, I’m starting to think that there is something of a Law of Conservation of Power, vaguely akin to the conservation of matter and energy (but not quite as absolute). For example, Iry talks about how C enforces the stack abstraction and hides any parallelism in the hardware (or in the generated code). By moving from the assembly world to the C world you trade one form of power for another: you obey the constraints of the stack, and function calls get much simpler. But you lose the ability to fine-tune using dedicated hardware instructions.

This holds true as you continue exploring languages. Give up the looseness of C’s weak type system for something stricter and stronger (ML with its algebraic datatypes, for example) and you can have the machine enforce invariants and guarantee certain properties of your code. You can perform easier and more elegant matches and actions on the type of your data. But you give up the flexibility that comes with raw data and fundamentally unstructured bits. If you choose strict immutability and pure functions (like Haskell or Clojure) you get to interact with your datatypes in a more mathematically precise form, and you get to reap the benefits of concurrency without worrying (as much) about data corruption. But you lose the ability to quickly stash some metadata into the corner of some mutable data structure and pull it out at will.
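As a tiny illustration of the mutable end of that tradeoff, here is a Python sketch (the `Node` class and the attribute names are invented for the example) of stashing ad-hoc metadata onto an object at runtime, exactly the kind of thing a strict static type system would make you declare up front:

```python
class Node:
    def __init__(self, value):
        self.value = value

n = Node(42)
# Python happily lets us bolt an undeclared field onto the instance at
# runtime; an ML or Haskell record type would have to declare this field
# (and its type) before the program ever ran.
n.debug_note = "visited during traversal"
```

The convenience is real, and so is the cost: nothing stops a typo like `n.debug_not` from silently creating a second attribute, which is precisely the class of error the stricter languages rule out.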

If we start viewing different languages as tradeoffs in the power available to the programmer, then a compiler becomes a very special tool: a power transformer, akin to a nuclear reactor (or a Hawking’s Knot, if you don’t mind some futurism). A compiler, at its core, takes a stream of symbols (according to predefined syntactic rules) and transforms it into another stream of symbols (according to predefined semantic rules). Along the way, the compiler is responsible for enforcing the power transforms: ensuring that your C code doesn’t get at the underlying stack, and that your Haskell code obeys its type constraints. If our languages are tools with which we build our universes, our compilers enforce the laws of physics (whatever those laws may be). The reason we’re computer scientists and not physicists is that we can create and change these laws at whim, instead of only studying and exploring them.

Just as we don’t put nuclear reactors in SUVs, there isn’t a “best” power tradeoff. Do you need to be really close to the metal on a resource-constrained embedded system? Use C. Do you need a guarantee that a successful compilation will rule out a certain class of runtime errors? Use ML. If you need a fuel-efficient vehicle to take you to work every day, get a Prius. If you need a self-sufficient, water-borne weapons platform that only needs to be refueled every few decades and can rain down vengeance on your enemies should the time come, then invest in an aircraft carrier or a nuclear submarine. Don’t bring a knife to a gunfight.

There are two big elephants in the room: Turing-completeness and Lisp. All Turing-complete languages are strictly equivalent in their computational power, but that misses the point of this discussion. Most programmers are not writing code for machines at all: they are writing programs for programmers (including themselves) to read, understand and use, and only incidentally for machines to execute (thank you, SICP). When the rules of the game are not strict computational power but expressiveness and understandability to another human, this Conservation of Power becomes much more important. Choosing the correct set of tradeoffs and balances (and hence the correct language and related toolsets) becomes a decision with far-reaching impact on your team and project. Make the wrong choice and the Turing tarpit will swallow you alive and surrender your fossilized remains to a future advanced insectoid race.

Lisp is another matter entirely. The so-called “programmable programming language” has been used to build everything from operating systems to type systems that are Turing-complete in themselves. As Manuel Simoni puts it, in the Lisp world there is no such thing as too much power. Lisp laughs flippantly at the Law of Conservation of Power by placing in the programmer’s hands the enforcer of the Law: the compiler itself. By virtue of S-expressions and macros, Lisp allows and encourages you to play God. The natural way to program in Lisp is to “write code that writes code”, creating your own minilanguages tailored to the task at hand. With great power comes great responsibility, of course, so the Lisp programmer must be particularly careful.

I haven’t explored Lisp as much as I’d like to, and I’m only just starting to look into ML and Haskell. But as a career programmer (or some approximation thereof) I think it’s a good idea to routinely move between various points on the power spectrum. That’s not to imply that it’s a perfectly linear scale, but that’s a matter for another post. As my experience at GrammaTech is showing, there are delights, wonders and challenges no matter where you decide to plant your feet.

Sunday Selection 2011-02-20

Reading

Is Scheme faster than C? The cheapest way to make your code faster is to throw more hardware at it. But for a cash-strapped college student, reworking the algorithm is probably a better idea. Here’s a suspense-filled story of how a superior algorithm devised in Scheme and ported to C turned out to be faster than a naive C implementation.

On Writing Books for Programmers I think writing is an important skill, especially for programmers. Putting your thoughts in writing helps with the thinking process. But this piece looks at writing from another perspective — namely writing for (as well as by) programmers. It’s worth reading if you’re writing for programmers, even if it’s not a book.

Media

Parallelism and Concurrency in Programming Languages Rob Pike is certainly a person worth listening to when it comes to programming languages. And of course concurrency and parallelism are all the rage nowadays. Put the two together and you have a lot to learn from this talk.

Software

Firefox 4 beta Google Chrome might be giving Firefox some stiff competition, but the folks at Mozilla are definitely holding their own. Firefox 4 is getting an impressive set of improvements and features. I think their user interface model is better than Chrome’s in some ways (especially with Panorama). There are still rough edges and most extensions will probably not work, but it’s stable enough for people to check out and use on a daily basis.

Switch-case statement in Python revisited

This post is part of the Powerful Python series where I talk about features of the Python language that make the programmer’s job easier. The Powerful Python page contains links to more articles as well as a list of future articles.

About nine months ago I wrote a post talking about how you might go about implementing a switch-case statement in the Python programming language. Python lacks a native switch-case-like construct, but there are multiple ways to fake a similar effect. The most naive way would be multiple if-else blocks. A more Pythonic way would be to use Python’s dictionaries and first-class functions. A simple example in C, followed by the two Python versions, is shown below.

switch(n) {
  case 0:
    printf("You typed zero.\n");
    break;
  case 1:
  case 9:
    printf("n is a perfect square\n");
    break;
  case 2:
    printf("n is an even number\n");
  case 3:
  case 5:
  case 7:
    printf("n is a prime number\n");
    break;
  case 4:
    printf("n is a perfect square\n");
  case 6:
  case 8:
    printf("n is an even number\n");
    break;
  default:
    printf("Only single-digit numbers are allowed\n");
  break;
}

if n == 0:
    print "You typed zero.\n"
elif n == 1 or n == 9 or n == 4:
    print "n is a perfect square\n"
elif n == 2:
    print "n is an even number\n"
elif n == 3 or n == 5 or n == 7:
    print "n is a prime number\n"

def zero():
    print "You typed zero.\n"

def sqr():
    print "n is a perfect square\n"

def even():
    print "n is an even number\n"

def prime():
    print "n is a prime number\n"

options = {0 : zero,
           1 : sqr,
           4 : sqr,
           9 : sqr,
           2 : even,
           3 : prime,
           5 : prime,
           7 : prime,
}

options[n]()

The Fine Print

However, as the comments on the original post show, neither of the Python examples is a very good solution. They lack the versatility and power of the original C form, both in terms of syntax and semantics. Syntactically, neither form does a good job of conveying the intent of the code. The if-else version does an acceptable job and may take fewer lines, but it clutters the code with keywords, repeated equality checks and boolean combinations. The dictionary version is even worse at conveying the intent of the code.

Semantically, the two forms don’t stand up to the C form either. Multiple if-elses are probably as close as you can get, but they don’t work quite the same way. In particular, the “fall through” semantics of a switch-case statement, where consecutive blocks are executed as long as there is no break statement, are clumsy to duplicate. Repetitive code is almost always bad, and mimicking the semantics of switch-case for non-trivial examples (including the one above) inevitably requires some repetition. Using dictionaries is simply a semantic mess. Creating a dictionary and coming up with function names is really too much trouble for the simple task at hand. Dictionaries and first-class functions are powerful tools, but they simply have no place in something like a switch-case statement.
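For completeness, here is one hedged sketch of approximating fall-through in Python (my own construction, not a standard idiom, and it covers only part of the C example above): keep the clauses in an ordered list, and once a clause matches, keep executing successive clauses until one is marked as a break point.

```python
# Each clause is (matching keys, action, breaks?). After the first match,
# execution "falls through" later clauses until one with breaks=True.
def run_switch(n):
    out = []
    clauses = [
        ({0}, lambda: out.append("You typed zero."), True),
        ({1, 9}, lambda: out.append("n is a perfect square"), True),
        ({2}, lambda: out.append("n is an even number"), False),  # falls through
        ({3, 5, 7}, lambda: out.append("n is a prime number"), True),
    ]
    matched = False
    for keys, action, breaks in clauses:
        if matched or n in keys:
            matched = True
            action()
            if breaks:
                break
    if not matched:
        out.append("Only single-digit numbers are allowed")
    return out
```

With this, `run_switch(2)` produces both the “even” and “prime” messages, matching the C fall-through for that case; the amount of machinery needed rather proves the point that the semantics don’t map cleanly.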

The Real Problem

This is one of those cases (pun unintended) where, though you can use existing language features to get what you want (or something close), you would really like in-built language support. Python is a pretty well-designed language as far as languages go, but it has its share of quirks. Personally I don’t consider the lack of switch-case a particularly damaging omission, though there have been times when I wished there was one. In fact, a little Googling shows that a Python Enhancement Proposal was submitted a few years ago but was rejected due to lack of popular support.

There is an excellent Stack Overflow question on what the switch-case substitutes are and how to choose between them. The first answer to that question captures the essence of the problem at hand. It’s not a matter of how the alternative should be implemented, but rather what the alternative should mean. The if-else and dictionary lookups are generally useful if you have a simple choice to make in code that is mainly procedural. However, if your code base is very object-oriented, then you are probably better off with something like polymorphism to choose between alternatives. The solution should fit the problem, not the other way around.
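As a rough sketch of the polymorphism alternative (the `Shape` hierarchy here is invented for illustration, not taken from the original post): instead of switching on a type tag, let each object carry its own behavior.

```python
# Dispatch via polymorphism: no switch on a "kind" field anywhere.
class Shape:
    def area(self):
        raise NotImplementedError

class Square(Shape):
    def __init__(self, side):
        self.side = side

    def area(self):
        return self.side ** 2

class Circle(Shape):
    def __init__(self, radius):
        self.radius = radius

    def area(self):
        return 3.14159 * self.radius ** 2

# The "switch" disappears: each object already knows what to do,
# and adding a new shape means adding a class, not editing a branch.
areas = [shape.area() for shape in [Square(2), Circle(1)]]
```

The design tradeoff is the usual one: polymorphism makes adding new variants easy at the cost of scattering the behavior across classes, while a switch (or dictionary) keeps all the cases in one place.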

The final thing I would like to say on this matter doesn’t involve Python at all. Rather, it’s about Common Lisp, which I’ve been teaching myself for the last few weeks. The Lisp family of languages is particularly famous (or infamous, depending on your point of view) for its general lack of concrete syntax. Lisp code is written directly in the form of S-expressions. Essentially, you write out the abstract syntax tree that in other languages is generated by parsing the source code. Because the programmer has access to this representation, it’s possible to write code that will generate these S-expressions at compile time. In essence, this allows Lisp to be a programmable programming language. This is important because you as the programmer can add your own extensions to the language without waiting for a committee or community to approve them. In our case, if Common Lisp didn’t come with a switch-case statement and you really needed one, you could roll your own. In fact, it has been done. That’s not to say that rolling your own language features is easy or something you should do on a daily basis, but in languages that allow it, it can be a very powerful tool if used right.

To be fair, I think you could write code that generates code, and then run it, in any language that has text processing and dynamic loading, but it would probably be very tedious and error-prone. Lisp’s S-expression form lets you do it in a much more elegant and powerful fashion.

In Conclusion

While Python does not have a switch-case statement (and probably never will) there are plenty of other language features you can use to get the job done. It’s important to remember that you shouldn’t just try to recreate the semantics of switch-case (that can get very messy). As the original post shows, trying to clone the C construct is a futile endeavor. You need to pay attention to what the problem really is and then pick the Python feature that solves it correctly and elegantly. And if you get the chance, do explore languages like Lisp where syntax is fluid. It will help you better understand the difference between what your code looks like it’s doing and what it’s actually doing. Happy hacking.