Sunday Selection 2011-02-20

Reading

Is Scheme faster than C? The cheapest way to make your code faster is to throw more hardware at it. But for a cash-stripped college student reworking the algorithm is probably a better idea. Here’s a suspense-filled story of how  superior algorithm devised in Scheme and ported to C turned out to be faster than a naive C implementation.

On Writing Books for Programmers I think writing is an important skill, especially for programmers. Putting your thoughts in writing helps with the thinking process. But this piece looks at writing from another perspective — namely writing for (as well as by) programmers. It’s worth reading if you’re writing for programmers, even if it’s not a book.

Media

Parallelism and Concurrency in Programming Languages Rob Pike is certainly a person worth listening to when it comes to programming languages. And of course concurrency and parallelism is all the rage nowadays. Put the two together and you have a lot to learn from this talk.

Software

Firefox 4 beta Google Chrome might be giving Firefox some stiff competition, but the folks at Mozilla are definitely holding their own. Firefox 4 is getting an impressive set of improvements and features. I think their user interface model is better than Chrome’s in some ways (especially with Panorama). There are still rough edges and most extensions will probably not work, but it’s stable enough for people to check out and use on a daily basis.

Advertisements

Language paralysis

It’s winter break which means that I have a good amount of free time on my hands. Though I’m all in favor of sitting around and doing nothing, I do get bored after a few days of that and tend to look for something to keep my mind occupied. I decided that this time I would sit down and learn a new programming language, something I’ve been wanting to do for a while. But the thing is I can’t make up my mind as to which one.

I’ve considered learning one of three languages, each of which is a powerful yet somewhat quirky and niche language. My choices are Common Lisp, Scheme or Haskell. Common Lisp and Scheme are both Lisp dialects, but with different purposes and hence a different feel. From what I’ve learned Common Lisp is a full-fledged industrial strength, general purpose programming language while Scheme has a thriving research community surrounding it and is a great test-bed for implementing programming-language related ideas. Both share Lisp’s defining characteristics such as powerful dynamism and macro facilities. Both are inherently functional languages but are also capable of playing host to other programming paradigms (Common Lisp in particular with its Common Lisp Object System).

Haskell on the other hand is quickly becoming one of the most powerful programming languages on the planet and may be coming close to threatening Lisp’s throne. It’s a purely functional programming language with an increasingly powerful and capable type-system. It’s an excellent tool for language and type-system related research thanks to great parsing facilities and it seems to me that Haskell is at the forefront of computer science research today. Haskell doesn’t have a macro system, but I’ve never heard that be an issue.

All my choices are powerful languages with strong communities, but I simply can’t make up my mind as to which one to learn. I admire all of them and can see the strengths of each, but neither one is really compelling enough for me to sit down and decide to learn it. It’s time to explore some of the reasons behind my current paralysis and see if I can figure out a solution.

Looking back on my history of learning programming languages all the ones I’ve learned to any depth have been motivated by external cause. I learned Java because it was used in my basic CS courses. I learned C++ because I used it in my software engineering class. I learned C for operating systems and digital circuits courses. I learned Python because we use it for most of our research code at school (and it’s become the language I’m most familiar with). I’ve also picked up some JavaScript because I wanted to use it to give some dynamism to my website.

Unfortunately I don’t have similar motivations to help me make my current decisions. My current research is being done in Ruby because of it’s flexible object system. I have some ideas for a side project to pursue next semester but it’s not likely to be something requiring Lisp or Haskell’s particular talents. I’m not going to be doing any research into languages or type systems until the summer at least (and maybe not until later this year). As of this moment, I have zero external motivation to pick and learn any of these languages.

The thing is I really do want to learn one (and eventually all) of these languages. I think it’s a good idea for programmers to be continually learning new languages and expanding the ways in which we can think of our problems. However, I’m coming to realize that simply sitting down and going through a tutorial isn’t enough, at least not for me. I need an actual problem that I intend to solve in the given language. It doesn’t have to be anything fancy, but it should be something that gives me a well-rounded view of the language and it’s capabilities (especially when the language is Lisp or Haskell).

I consider myself a language buff. But it’s one thing to say that I’m interested and read about them and another to sit down, learn them and write code in them. Right now, I’m very interested in learning about Common Lisp, Scheme and Haskell and read both blogs and papers about them. But I can’t take that interest and use it to bridge the gap to learning and using them. Motivation has always been a bit of a problem for me and I’m rather annoyed that it’s preventing me from learning what I want to.

Since I still have about two and a half weeks of vacation left I’m going to give some serious thought as to what sort of programs I want to write in the near future and how I can choose the language that is most beneficial along those lines. At this point I’m open to suggestions for Lisp/Haskell projects that would be interesting as well as hearing about how other people motivate themselves to learn languages that they aren’t actively using.

Thinking about Documentation

My friend Tycho Garen recently wrote a post  about appreciating technical documentation. As he rightly points out technical documentation is very important and also very hard to get right. For someone who writes code I find myself in the uncomfortable position of having my documentation spread out in at least 2 places.

A large part of my documentation is right in my code in form of comments and docstrings. I call this programmer-facing documentation. It is documentation that will probably only be seen by other programmers (including myself). However, even though it might only be seen by programmers who are using (or changing) the code doesn’t mean that it should just be in the code. More often than not, it’s advisable to be able to have this documentation exported to some easier-to-read format (generally hyperlinked HTML or PDF). Of course I don’t want everyone who wants to use my software to go digging through the source code to figure out how things work. A user manual is generally a good idea for your software no matter how simple or complex it might be. At the very least there should be a webpage describing how to get up and running.

One of the major issues of documentation is that it’s either non-existent or hopelessly out of date. A large part of the solution is simply effort and discipline. Writing good comments and later writing a howto are habits that you can cultivate over time. That being said, I’d like to think that we can use technology to our benefit to make our job easier (and make writing and updating documentation easier).

Personally I would love to see programming languages grow better support for in-source comments. Documentation tools like Javadoc and Epydoc certainly help in generating documentation and give you a consistent, easy-to-understand format, but the language itself has no idea about what the comments say. They are essentially completely separate from the code even though they exist side by side in the same file. I would love it if languages could work together with the documentation, say by autogenerating parts of it, or doing analyses to detect inconsistencies.

As for documentation that lives outside of the code, I’m glad to see that there is a good deal of really good work being done in this area. Github recently updated their wiki system so that each wiki is essentially a git repo of human-readable text files that are automatically rendered to HTML. Github’s support for Git commit notes and their excellent (and recently revised) pull requests systems provide really good systems for maintaining a conversation around your code. The folks over at Github understand that code doesn’t exist by itself and often requires a support structure of both documentation and discussion surrounding it to produce a good product.

So what’s my personal take on the issues? As I’ve said before I’m starting work on my own programming language and I intend to make documentation an equal partner to the code. I plan on making use of Github Pages to host the documentation in readable from right next to my source code. At the same time, I’m going to giving some thought into making documentation a first class construct in the language. That means that the documentation you write is actually part of the code instead of being an inert block of text that needs to be processed externally. The Scribble documentation system built on top of Scheme has some really interesting ideas that I would love to look into and perhaps adapt. Documentation has always been recognized as an important companion to coding. I’m hoping that we’re getting to the stage where we actually pay attention to that nugget of common wisdom.

Hope, scarred and bleeding

“The Heavens burned, the Stars cried out. And under the ashes of infinity, Hope, scarred and bleeding, Breathed it’s last.”

Ulatempa Poetess
Elegy for the Commonwealth
Andromeda

The International Lisp Conference is on right now and one of the interesting things that happened is that MIT’s Gerald Sussman talked about why MIT’s famous 6.001 course uses Python instead of Scheme. I hadn’t known that MIT had switched to Python from Scheme, and I must say that I’m very sad (as the dramatic opening quote makes obvious).

Truth be told, MIT has always been something akin to Holy Ground to me. It is deeply involved with computer science history and lots of great discoveries have come from there and from the people who worked there. 6.001 has to some extent become famous as MIT’s hallmark course. It’s the course that spawned the wonderful Structure and Implementation of Computer Programs, a book that I think all serious programmers should read at some point in their careers, preferably sooner rather than later. Since the 80’s 6.001 has used the Scheme programming language, a lightweight version of Lisp which is itself one of computer science’s shining accomplishments. Now it uses Python.

So what’s the big fuss about? Well there are a number of different reasons. 6.001 has always been meant to be a hard course, designed to fundamentally change the way students think. It’s not a simple programming course. If you read the introduction to SICP (and then the rest of it), you’ll see that 6.001 is more of a course in engineering philosophy and techniques which only incidentally uses computers and software. I’ve been reading SICP in bits and pieces for about a year and half and the things I’ve learned have made me a much better programmer, engineer and thinker. Is it possible to have the same sort of teaching and learning experience with Python instead of Scheme? Perhaps. I’m not sure. But that’s the tip of the iceberg.

Reading the reasons Sussman gave for replacing Scheme with Python is very distressing. Basically it comes down to the admission of the fact that the software industry is for the most part, one giant mess. Software isn’t well documented and the interfaces are inconsistent. As a result  you spend a lot of time tinkering with 3rd party libraries to figure how they work as opposed to how they are supposed to work. All this flies in the face of what 6.001 was supposed to teach: great engineering works are built out of simpler, smaller components that are well understood and encapsulated. I’ll be the first to admit that what Sussman says is completely true, I’ve experienced it myself more than once. This hasn’t been helped by the proliferation of CS courses that teach that students to be dependent on frameworks and third libraries without teaching them about how those frameworks work or what to do when they break. It’s part of the game and something you have to learn to deal with.

But it seems to me that Sussman’s statement is like throwing in the towel. If you can’t beat them, join them, that sort of thing. I’m sure there’s more to the argument than just that, but it would be hard to avoid the fact that 6.001 has been substantially toned down. There is no way students are going to learn what good software is unless they have some experiences of making it, themselves. There’s a reason why math students still work on problems even though Mathematica is a great tool.  Python is a great beginner’s language, but 6.001 wasn’t exactly made for beginners. Referring back to the preface to the first edition of SICP, the author says that many students have had experience of computers. Let’s take some time to remember that when this was written decades ago, ‘experience of computers’ didn’t mean playing games, using Office or browsing the web. It probably implied a general understanding of how computers work and some amount of programming. Times change and one could say that 6.001 is just adapting to the times. That could certainly be true and if it is, it’s a sign that the change isn’t good.

If students are starting programming at a later age, and hence aren’t accustomed to the very demanding logical thought required by 6.001, then it’s a sign that the software industry is in grave danger. Let’s face it: 4 years are not enough to learn how to be a good programmer or software engineer. 4 years working full time at it wouldn’t be enough. Now throw in the typical trappings of college life: other courses, sports, parties, relationships. The time most CS students spend actually practicing their skills go down dramatically. And it’s not like computer science is easy, we all know that. After graduation a job means that you have even little time and inclination to improve yourself. CS education needs to start early. The best programmers I know (both personally and otherwise) are the ones who started young, in their mid-teens mostly. The industry depends on mediocre programmers who are then most likely doomed to stay mediocre because they missed the time of their lives where they could have learned the most.

You see, the problem and feelings of hopelessness go beyond 6.001. Being a student myself, I see a vast disparity in the skills of my fellow CS majors. What’s more terrifying is how low the low point is.  A freshman I know is currently building a web framework for automatically converting Java object models to visual representations which can be manipulated. Think of it as interactive UML on steroids. It wouldn’t be an exaggeration to say he’s probably the only one in my school who can do that, me included (though our CS department is probably under 75 students). In one of my 300-level classes, a student claimed he didn’t know what a binary representation of an integer is. How can you be in a sophomore CS major without knowing what a binary integer is? And my CS department is one of the more rigorous ones I know of. We use a wide variety of languages and tools, have a separate curriculum for non-majors and the requirements for graduating with honors are pretty strict. The professors routinely give students experience out of class by involving them in research projects.

In some respects, you could say I don’t need to care. I get good grades and I’ve seen and know more than most of fellow students. But it’s not me I’m worried about. Not yet at least. I’d like to make a lasting contribution to computer science at some point in my career, but equally importantly I’d like to work with people who love computer technology as much as I do and are also skilled and dedicated enough to work on something that could be groundbreaking. I’m really afraid that there might be fewer people like that in my age group. Perhaps that’s just because I live in a small school, but if there aren’t going to be any more 18-year olds spinning Scheme abstractions with ease, then my fears seem terribly close to justified.

I’m going to stop now because I feel like I’m screaming at the top of my lungs in an empty room. And I have to call home. I’m willing to believe that the picture isn’t as bleak because I see people like my freshman friend. But it’s takes a good amount of will to keep it up. If you’re in college (or know someone in college) who would like to get together and maybe reverse this trend of approaching mediocrity, please get in touch with me. I could really use a hope infusion right now.

PS. On an equally terrifying note, there’s only one girl in my software engineering class of 27, but that’s not something I’m going try to explain any time soon.

The Hardware Software Interface

One of my Computer Science professors recently lent me the book Computer Organization and Design: The Hardware/Software Design Interface written by two pioneers in the field of computer hardware: David Patterson and John Hennessy. This book is an excellent book about how the computers machinery is actually designed and built written by the people who introduced to the world RISC and MIPS. The book is widely used in undergraduate courses on computer and quite rightly so. I’ve only made my way through the first two chapters, but I already feel that this book deserves a place alongside the classics of computer science.

The reason I’m reading the book is because I’m interested in the hardware aspect of computers just as much as the software side. With so many powerful high-level languages that keep the programmer many layers of abstraction away from the hardware, it’s easy to forget that the computer will only reveal its full power if you learn how it works and learn it well. While it is certainly important to have a strong understanding of how the mathematics of computing work, it is equally important to know how to take those mathematics and concepts (some of which are incredibly powerful) and translate them to patterns of ones and zero. MIT’s Structure and Implementation of Computer Programs is probably the best book ever to be written for the purpose of teaching the fundamentals of pure computer science, covering everything from simple abstractions to machine-code generation. It’s not really a book to read unless what you want is a deep understanding of how computers compute. Combine that with a book like Computer Organization and Design (perhaps its graduate level partner) and you have a combination that if well utilized gives you a very complete understanding of computer systems.

Bridging the Hardware Software Interface is a very special piece of software : the compiler. The compiler is what will take your high-level mathematically abstract program and translate it to the bare bytes and the computer with deal with. You can’t implement a half-decent compiler without understanding well both the computer’s hardware architecture as well as the range of abstractions that you want to implement on that architecture. Steve Yegge correctly argues that you haven’t fully understood computers unless you have understood compilers. Compilers tie together all the fundamental concepts of computer science: programming languages, algorithms, data structures and of course, the hardware software interface. Unfortunately compilers are often glossed over during a computer science education. There has been this break in computer technology education with computer science gradually moving towards a focus on pure software with the bare minimum of hardware and electrical engineering dealing with hardware with a minimum of software (mostly assembly and C). Personally, I think this is a grave mistake, one that leads to a gradual lowering in the quality of computer scientists.

What is the solution? The ideal would be a overhaul of the computer science curriculum, perhaps merging many aspects of computer science and electrical engineering, bridging the hardware-software interface by a proper emphasis on both. However this would also mean that students would have to study a lot more, the curriculum would become quite a good deal harder and there would have to be a move away from having so-called “industry standard” languages such as Java as the main teaching medium. Ideally all computer science courses would have both Computer Organization and Design and SICP as textbooks at some point during the course of the degree. As you can well imagine, this isn’t going to happen any time soon, probably never. I’m going to be studying both of them because I love computers and would never pass up a chance to learn more about them, but I am acutely aware that most of my classmates do not share my enthusiasm.

But what can I do, as a lone college freshman to gain a complete understanding of computer systems, the power they have to offer, the challenges they involve and the many interesting facets of the hardware-software interface? Luckily, I don’t have to bound by the college curriculum (and neither does anyone else). Both of the books I have talked about are written with students in mind (albeit dedicated and determined students) and should be easily available. In fact SICP can be downloaded as PDF or read online for free. The software tools that both of these books use are also freely available online. There are also a number of videos available online of courses conducted by the authors of SICP and I feel that they are an excellent companion to the book. And if that isn’t enough, there is always the World Wide Web, with a plethora of information sites and freely available tools as well many projects to which to contribute to put one’s knowledge to work. Learning should be a proactive activity with just as much enthusiasm shown by the student as the teacher. As the Zen saying goes: “When the student is ready, the teacher shall appear”. This could not be more true in the information age.

Happy Learning !!