Books for intermediate computer science students

So you’ve survived the first few programming courses, you know why bubble sort’s a bad idea and you can write tree traversal algorithms in your sleep. The question that’s eating you as the new year looms is: what’s next? There are a proliferation of books out there meant for beginning programmers and quite a few for the experts, but in my experience there are fewer resources for people who are half-way up the ladder and want to know how to climb the next few rungs. Luckily there are a number of books out there that a sufficiently motivated student can use to learn more advanced techniques. 

Before I jump in, it’s worth taking some time to clarify what exactly intermediate is. For starters, I’m assuming that you know at least 3-4 different programming languages including at least one object-oriented, one functional and one utility language (eg Perl/Python/Ruby). I’m also assuming that you’ve worked in teams and have successfully accomplished at least one medium scale programming project which involved both program design and implementation. And since computer science is more than just programming, you should have at least a cursory knowledge of computing theory and algorithms, meaning that you know, at the least, what a Turing Machine is and what Big-O notation means. That being said, let’s get on with it:

Structure and Implementation of Computer Programs

Yes, I know MIT uses this book for its intro-level programming course, but it quite clearly states in one of the prefaces that most students taking that course already have some programming experience. It’s certainly a fine book, but I do think that novice programmers won’t quite understand the full power of some of the concepts presented. However, having spent some time in the trenches, you’re likely to have a better understand of what sort of things can make your life easier. This book will teach you a lot about abstractions and algorithms and once you’ve experienced the austere simplicity and inherent power of Scheme, you’ll never think the same way again.

On Lisp

Even though the book is supposed to about “Advanced Techniques for Common Lisp”, the first few chapters are devoted to a rather basic review of functional programming concepts. I’m currently working my way through this book and I think it’s fair game for anyone who has had exposure to functional programming in general and some dialect of Lisp in particular. Be warned however, that this book is quite topic specific. It makes no secret of the fact that many of the ideas that are explored will be useless, or at least very hard to implement without the powerful framework that Lisp gives you. Unless you plan on actually using Lisp for a substantial amount of your work in the future, this book might not be worth the time investment.

Design Patterns

At the other end of the generality spectrum is this classic. It’s authors have gained somewhat legendary status in the software engineering community as the Gang of Four and the book very well deserves its reputation as a must-read for serious software engineers. If you’re bulding production software and use any form of object orientation, you’ll soon be encountering some of the patterns described here. The books purpose is to describe general patterns that continually occur in software. It reads more like a catalog or cookbook than a standard textbook, but that adds to its appeal. Each pattern can be studied more or else separately and references to supporting patterns are made clear. There is also a considerable amount of sample code in Smalltalk and C++. This isn’t exactly a book that’s made to be read cover to cover and you’re likely to get bored unless you have a problem in mind that you’re looking to solve. The best way to get the most out of this book is to treat is as a reference and another tool in your toolkit.

Code Complete 2

Another general software engineering book that will make you a better programmer for the rest of your days. Code Complete is hard to describe in a few words and the best way to get a feel for what it is to read the first few chapters. Code Complete focusses on the actual writing of code. It deals with issues that arise was you and your team sit down to actually implement the design. If you have any experience at all working with a team on a large project, you’ll understand a lot of the issues that are dealt with here. This book isn’t about fancy optimizations or agile team management techniques, its about actually sitting down and writing your code. A must-read for anyone who wants to build better software (which I hope, is everyone who has ever written software).

Compilers: Principles, Techniques and Tools

You may only have a passing interest in compilers and may not be looking to become an expert in programming languages, but learning about compiler technology will help you in numerous other ways as well. A word of warning, this book is a classic but it also edges into the ‘advanced’ domain. I’ve started reading this book in earnest since I became interested in programming languages and I still have a long way to go. This book will force you to develop along multiple fronts: algorithms, data structure, even interface design (by way of designing usable languages). And as Steve Yegge pointed out, starting to write a compiler is not for the faint of heart, because there is no end to it (though that’s something for another post). Read this book only if you are seriously committed to someday becoming an expert programmer.

The Mythical Man Month

It’s a bit dated and a lot of the examples might seem archaic, but the underlying message is clear even after all these years. This book is must read if you’re ever in a position to lead a team ( and chances are, at some point you will be). If there is any book that I would say is a must for a software engineering curriculum, this is it. It can be tempting to skip over the more technical parts, but please don’t do so. Read this book cover to cover and then read it again a few weeks later.

 

The books I’ve listed above won’t make you an expert programmer, but they will start you on the way and take you a fair distance along. The truth is that it’s much harder to go from intermediate to expert than to go from beginner to intermediate (though I suppose that’s true of all disciplines). I’m looking for a good book on algorithms to add to the above list, but I haven’t come across one that is at the proper level yet. I consider Knuth’s masterpiece to be at a somewhat higher level than what I’m prepared for at the moment. Any suggestions would be welcome.

Advertisements

Semester end wrap up

Finals week starts from tomorrow and that means another semester has come to an end. I’ve done a number of interesting things this semester, and I think it would be a good idea to tie them together at the end.In many ways this semester has been a continuation of things I started earlier and a starting point for new things. I cna’t say that I wrapped up many things, but I don’t think that’s a bad thing. In fact it’s probably a sign that I have things to keep me busy and interested. So here goes with the end of semester wrap up:

1. My research work

Over the summer I started doing work with a professor on using formal grammars to understand pattern formation. Summer was very much a ‘testing-the-waters’ phase. We worked on two separate projects: automated urban design and auto-generated art in the hopes of being able to find some underlying rules for creating patterns. Though we produced some good work (and a number of good looking graphics), there was still a lot to be done. This semester we decided to drop the urban design aspect and instead refocus on a more basic topic: representing general patterns as grammar rules for artificial languages. To do this I’ve been working on building a simple language to describe languages and then generating samples of those languages, which can then be visually displayed. I hope to have a working version ready by the end of January. This project has been interesting because I got to learn a lot about language design and it was my first time writing a lage piece of software as part of a team. It also fit in very well with my programming languages course, speaking of which…

2. Programming Languages Course

I was supposed to take an algorithms course, but due to a scheduling conflict ended up taking a programming languages course as an independent study instead. I’ve always had an interest in programming languages and working with formal grammars over the summer furthered that interest. Taking the course gave me a chance to formally study about my interests and introduced me to a number of new and interesting ideas.  I learned about a number of new languages on the way, including ML, Prolog and Smalltalk. The course also helped to clarify some earlier ideas I had about what constituted good language design. I realized that syntax can be very impotant and that a lot of times appreciating the full power of a language involves a steep learning curve, but it can be well worth it. And this course led me to one of my best discoveries of the semester, namely…

3. Scala

I’ve liked the Java platform, but never had much love for the language. However, I recently discovered Scala, a really nice statically-typed, multiparadigm language. It feels a lot like Java, but is much cleaner. Its features of note include a good merging of object-oriented and functional paradigms, a clean way to reuse code without the problems of multiple inheritance, static typing and good interaction with existing Java code. I’ve started a series evaluating Scala as an alternative to Java. I’m planning on using more Scala next semester, so we’ll see how that goes.

4. Beginning my electrical and computer engineering major

I’m pursuing an electrical and computer engineering major at college and this semester was the first course in the sequence: a basic digital circuits course. I enjoyed this course, learned a lot and made some good friends. I like dealing with low level logic gates and pushing bits around as much as I do slinging functions all over the place. I’m taking the second course in the sequence next semester alongside a computer organization course. I’m looking forward to having another exciting semester coming up.

5. Web design work

I’ve been making websites for a good few years now, but I’ve only started takin it seriously this semester. I’ve started to design websites for a number of student groups on campus as well as preparing templates for ePortfolios for foreign language students at our college. I started using Dreamweaver and despite my normal scepticism for WYSIWYG tools, I quite enjoy using it and it beats manually managing links. Most of my work isn’t online yet, but it should be within the next few months  (once students start using it). This is certainly I’ll be pursuing and considering that I plan to move to paid hosting early next year, I might just end up making my own WordPress theme.

6. A new focus on blogging

I might have put this last on the list, but it’s certainly not the last thing on my mind. I’ve realized that if you’re doing things that are fun and interesting, there are probably a lot of people out there who would like to know about it. I think being able to communicate your work is as important as actually producing it. I’m looking forward to many more semesters of fruitful blogging ahead.

That basically wraps it up for things I’ve done this semester. There’s a bunch of things that I have planned for next year, but I think I’ll save that post until the start of next year.

Variables are not boxes

One of the common metaphors taught in intro computer science courses is that a variable is a similar to a box. Just as you can use a box to store something, you can use a variable to store a value. This metaphor is useful for introducing the concept of variables, but it is a very flawed metaphor. What makes it particularly bad is that holding onto this metaphor can be limiting when a student starts to go even slightly beyond the beginner stage.

In the good old days of down-to-the-hardware procedural programming, a variable really was a box: it was in essence a memory location. The fact is, those days are mostly beyond us, unless you use C/C++ or Assembler as your main programming language.  In fact, it is much more likely that you are using some sort of object-oriented language as your main language. With object orientation, the box metaphor doesn’t work. Why? Because of references.

With an object-oriented language, a variable is actually a reference. The object that you seem to be storing in a variable doesn’t actually get put into your variable box. The variable just ‘refers’ to the object. You can get to it through the variable, but it isn’t stored there. Consider the following (in Python):

foo = SomeClass()
bar = foo

Thinking of variables as boxes leads to confusion: how do you put one thing into two different boxes? Or does it just get magically duplicated? Add to that the fact that you’ll see that whatever you do to foo also gets done to bar.

A far more appropriate metaphor in this case is to think of foo and bar as names for the new SomeClass object. The new object is created and you choose to use it by the name foo. But then you decide also want to call it bar. How do you that? Well the only way to get it is through the name foo. Thus ‘bar = foo’ says ‘find whatever object has the name foo and then give it the name bar as well’. After that, giving either bar or foo another value, means that something else has the name foo or bar.

Though this metaphor does seem like a better one, it’s important to realize that the metaphor is deeply involved with the concepts of object orientation and doesn’t work perfectly. Consider the following:

foo = 1
bar = foo
bar += 1

The box metaphor fails because even though integers in Python are objects, bar and foo won’t refer to the same object. When dealing with simple integers like this, it’s actually the box metaphor that works. bar = foo copies whatever was in box foo and puts it in box bar.

So the question is what are variables? I would argue that the correct metaphor is context-sensitive. If you’re in a object-oriented environment, thinking of them as just names for independent objects will make your life much easier (and help prevent bugs arising from incorrect use of reference properties). But when you’re doing simple number crunching, the box does the job. But before you pick the metaphor you use, invest some time in learning about your language/framework and what metaphor suits is best.

The Hardware Software Interface

One of my Computer Science professors recently lent me the book Computer Organization and Design: The Hardware/Software Design Interface written by two pioneers in the field of computer hardware: David Patterson and John Hennessy. This book is an excellent book about how the computers machinery is actually designed and built written by the people who introduced to the world RISC and MIPS. The book is widely used in undergraduate courses on computer and quite rightly so. I’ve only made my way through the first two chapters, but I already feel that this book deserves a place alongside the classics of computer science.

The reason I’m reading the book is because I’m interested in the hardware aspect of computers just as much as the software side. With so many powerful high-level languages that keep the programmer many layers of abstraction away from the hardware, it’s easy to forget that the computer will only reveal its full power if you learn how it works and learn it well. While it is certainly important to have a strong understanding of how the mathematics of computing work, it is equally important to know how to take those mathematics and concepts (some of which are incredibly powerful) and translate them to patterns of ones and zero. MIT’s Structure and Implementation of Computer Programs is probably the best book ever to be written for the purpose of teaching the fundamentals of pure computer science, covering everything from simple abstractions to machine-code generation. It’s not really a book to read unless what you want is a deep understanding of how computers compute. Combine that with a book like Computer Organization and Design (perhaps its graduate level partner) and you have a combination that if well utilized gives you a very complete understanding of computer systems.

Bridging the Hardware Software Interface is a very special piece of software : the compiler. The compiler is what will take your high-level mathematically abstract program and translate it to the bare bytes and the computer with deal with. You can’t implement a half-decent compiler without understanding well both the computer’s hardware architecture as well as the range of abstractions that you want to implement on that architecture. Steve Yegge correctly argues that you haven’t fully understood computers unless you have understood compilers. Compilers tie together all the fundamental concepts of computer science: programming languages, algorithms, data structures and of course, the hardware software interface. Unfortunately compilers are often glossed over during a computer science education. There has been this break in computer technology education with computer science gradually moving towards a focus on pure software with the bare minimum of hardware and electrical engineering dealing with hardware with a minimum of software (mostly assembly and C). Personally, I think this is a grave mistake, one that leads to a gradual lowering in the quality of computer scientists.

What is the solution? The ideal would be a overhaul of the computer science curriculum, perhaps merging many aspects of computer science and electrical engineering, bridging the hardware-software interface by a proper emphasis on both. However this would also mean that students would have to study a lot more, the curriculum would become quite a good deal harder and there would have to be a move away from having so-called “industry standard” languages such as Java as the main teaching medium. Ideally all computer science courses would have both Computer Organization and Design and SICP as textbooks at some point during the course of the degree. As you can well imagine, this isn’t going to happen any time soon, probably never. I’m going to be studying both of them because I love computers and would never pass up a chance to learn more about them, but I am acutely aware that most of my classmates do not share my enthusiasm.

But what can I do, as a lone college freshman to gain a complete understanding of computer systems, the power they have to offer, the challenges they involve and the many interesting facets of the hardware-software interface? Luckily, I don’t have to bound by the college curriculum (and neither does anyone else). Both of the books I have talked about are written with students in mind (albeit dedicated and determined students) and should be easily available. In fact SICP can be downloaded as PDF or read online for free. The software tools that both of these books use are also freely available online. There are also a number of videos available online of courses conducted by the authors of SICP and I feel that they are an excellent companion to the book. And if that isn’t enough, there is always the World Wide Web, with a plethora of information sites and freely available tools as well many projects to which to contribute to put one’s knowledge to work. Learning should be a proactive activity with just as much enthusiasm shown by the student as the teacher. As the Zen saying goes: “When the student is ready, the teacher shall appear”. This could not be more true in the information age.

Happy Learning !!