We’ll always have native code

A few days ago my friend Greg Earle pointed me to an article with the provocative title of “Hail the return of native code and the resurgence of C++”. While the article provides a decent survey of the popular programming language landscape, it misses the point when it comes to why VM-based or interpreted languages have become popular (and will continue to be so).

The reason why languages running on managed runtimes are and should continue to be popular is not quite as simple as relief from manual memory management or “syntactic comforts”, as the author puts it. While both are fair reasons in and of themselves (and personally I think syntax matters more than it’s generally given credit for), the benefits of managed runtimes are more far-reaching.

As Steve Yegge put it, “virtual machines are obvious”. For starters, most virtual machines these days come with just-in-time compilers that generate native machine code instead of running everything through an interpreter. When you have complete control over your language’s environment there are a lot of interesting and powerful things you can do, especially with the JIT. No matter how much you optimize statically generated machine code ahead of time, at runtime you have far more information about how your program is actually behaving, and all that information allows you to make much smarter decisions about what kind of code to generate.

For example, trace-based virtual machines like Mozilla’s SpiderMonkey look for “hotspots” in your code: sections that get run over and over again. They can then generate custom code tuned to be fast for those cases and run that instead of the original correct-but-naive code. Since JavaScript is a dynamically typed language and allows for very loose, dynamic object creation, naive code can be very inefficient. That makes runtime profiling all the more important, and it can lead to massive speed improvements.

If you’re running a purely natively-compiled language you miss all the improvements you could get by reacting to what you see at runtime. If you’re writing an application that is meant to run for long periods of time (say a webserver for a large, high-traffic service) it makes all the more sense to use a VM-based language: there is a lot of information the VM can use to optimize, and a lot of time for those optimizations to kick in and pay off. In robust virtual machines with years of development behind them you can get performance that is quite comparable to native code. That is, unless you’re pulling all sorts of dirty tricks to carefully profile and hand-optimize your native code, in which case all bets are off. But honestly, would you rather spend your days hand-optimizing native code or writing code that actually does interesting things?

Personally, I hardly think a new C++ standard is going to usher in a widespread revival of native code. C++ has problems of its own, not the least of which is that it is simply a very big language that very few people fully understand (and adding features and constructs doesn’t help that). One consequence of being a complex language without a managed runtime is that writing debugging or analysis tools for it is hard. In Coders at Work both Jamie Zawinski and Brendan Eich denounce C++ to some extent, but Eich adds that his particular grievance is the state of debugging tools, which have been practically stagnant for the last 20-odd years. Being fast, being “native” or having a large feature list is not nearly sufficient to make a good programming language.

All that being said, there is definitely a place for native code. But given the expressive power of modern languages and the performance of their runtimes, I expect that niche to keep dwindling. Even Google’s new Go programming language, which is ostensibly for systems programming and generates native code, has some distinctly high-level features (automated memory management, for one). But if you’re building embedded systems or writing the lowest levels of operating system kernels you probably still want to reach for a fast, dangerous language like C. Even then I wouldn’t be surprised if we see a move towards low-level unmanaged kernels fused with managed runtimes for most code in some of these places. Microsoft’s Singularity project comes to mind.

We’ll always have native code (unless we go back to making hardware Lisp machines, or Haskell machines), but that doesn’t mean that we’re going to (or should) see a widespread revival in low-level programming. I want my languages to keep getting closer to my brain, and if that means getting away from the machine, then so be it. I want powerful machinery seamlessly analyzing and transforming source code and generating fast native code. I want to build towers on top of towers. You could argue that there was a time when our industry needed a language like C++. But I would argue that time has passed. And now I shall get back to my Haskell and JavaScript.

It’s a great time to be a language buff

I make no secret of the fact that I have a very strong interest in programming languages. So I was naturally very interested when news of the Go Programming Language hit the intertubes. Go is an interesting language. It pulls together some very powerful features with a familiar, but clean syntax and has lightning fast compile times. It certainly takes a place on my to-learn list along with Haskell and Scala. But even as Go becomes the latest hot piece of language news, it dawned on me that over the past few years we’ve seen a slew of interesting languages offering compelling alternatives to the industry “mainstream”.

I guess it all started with the rise of scripting languages like Python, PHP, Ruby and the poster boy of scripting: Perl. For me, these languages, with their dynamic typing, “batteries included” design and interesting syntax, provided a breath of fresh air from the likes of C++ and Java. Not that C++ and Java are necessarily bad languages, but they aren’t the most interesting of modern languages. In the early years of this decade computers were just getting fast enough to write large-scale software in scripting languages. Things have changed a lot since then.

Dynamic languages aren’t just reserved for small scripts. Software like Ruby on Rails has proved that you can write really robust back-end infrastructure with them. The languages, for their part, have kept on growing, adding features and making changes that keep them interesting and downright fun to use. Python 3.0 was a brave decision to break from backwards compatibility in order to do interesting things, and it goes to show that these languages are far from ossifying or degrading.

Then there is JavaScript, which was supposed to die a slow death by attrition as web programmers moved to Flash or Silverlight. But we all know that didn’t happen. JavaScript has stayed in the background since the rise of Netscape, but it’s only recently, with advances in browser technology and growing standards support, that it has really come into its own. I’ve only played with it a little, but it’s a fun little language which makes me feel a lot of the same emotions I felt when discovering Python for the first time. Thanks to efforts like Rhino, you can even use JavaScript outside the browser for non-web-related programming.

Of course, if you want to do really interesting things with these languages, then performance is not optional. Within the last year or two there’s been a strong push in both academia and industry to find ways to make these languages faster and safer. Google in particular seems to be in the thick of it. Chrome’s V8 JavaScript engine is probably the fastest client-side JavaScript environment, and Google’s still-experimental Unladen Swallow project has already made headway in improving Python performance. V8 has already enabled some amazing projects and I’m waiting to see what Unladen Swallow will do.

While we’re on the topic of performance, mentioning the Java Virtual Machine is a must. The language itself seems to have fallen from grace lately, but the JVM is home to some of the most powerful compiler technology on the planet. It’s no wonder, then, that the JVM has become the target for a bunch of interesting languages. There are the ports of popular languages — JRuby, Jython and Rhino. But the more interesting ones are the JVM-centric ones. Scala is really interesting in that it was born of an academic research project but is becoming the strongest contender for Java’s position as the premier JVM language. Clojure is another language that I don’t think many people saw coming. It brings the power of Lisp to a modern JVM, unleashing a wide range of possibilities. It has its detractors, but it’s certainly done a fair bit to make Lisp a well-known name again.

Academia has always been a hotbed of language design. It’s produced wonders like Lisp and Prolog and is making waves again with creations like Haskell (whose goal is ostensibly to avoid popularity at all costs) and the ML family of languages. These powerful functional languages with wonderful type inference are a language aficionado’s dream come true in many ways, and they still have years of innovation ahead of them.

Almost as a corollary to the theoretically grounded functional languages, systems languages have been getting some love too. D and now Go are languages that acknowledge that C and C++ have had their heyday, and that systems programming does not have to be synonymous with bit twiddling. D has gotten some flak recently for not evolving very cleanly over the last few years, but something is better than nothing. And the real shift towards eliminating manual memory management is a welcome one.

As someone who intends to seriously study language design and related concepts in the years to come, I think it’s a really great time to be getting involved in learning about languages. At the moment I’m trying to teach myself Common Lisp, and I have a Scala book sitting on the shelf too. One of these days I plan on sitting down and making a little toy language just to get used to the idea of creating one. Till then, it’s going to be really interesting just watching how things work out in an increasingly multilingual world.

Python Properties vs Java Access Modifiers

This post is part of the Powerful Python series where I talk about features of the Python language that make the programmer’s job easier. The Powerful Python page contains links to more articles as well as a list of future articles.

What exactly constitutes object-oriented development is still an open question, but one of the generally accepted parts of the answer is encapsulation. Simply put, encapsulation means that the data structures used by your program should present a simple external interface while having a hidden internal structure. This is generally considered a good thing because it means that users of your structures only need to be concerned with what those structures are supposed to do, not how they do it. Users don’t need to bother with what goes on under the hood, and equally importantly, they can’t just reach in and tinker with your component’s internal structure if you don’t want them to. This in turn gives the developer freedom to change the internal implementation as long as the external interface is maintained.

Java and Python are both considered to be object-oriented languages, at least in the sense that they support creating classes and instantiating objects. They also both support generally accepted features of object orientation like inheritance, polymorphism and encapsulation. There are some differences in how these features are supported, though. In this article I’ll be focusing on the different approaches they take towards encapsulation. Let’s start with Java.

Java allows you to apply encapsulation at the level of individual class members. The programmer gets to decide how much of each field or method external code can see by using one of the following access modifiers (also called access specifiers):

  • no modifier — by default, members are visible within the rest of the class and to other classes in the same package
  • private — members are visible only within the rest of the class; nothing outside can access them
  • protected — members are visible within the same class, within the same package and to all subclasses
  • public — members are visible to everyone

Classes themselves can either be public (visible to everyone) or have no modifier, meaning that they are only visible within the same package. This is a straightforward way to implement encapsulation and it works quite well too. However, there are some drawbacks. Java’s brand of encapsulation (which is shared to some extent by C++) recommends that all fields be made private. Any field that needs to be accessed or modified by external code then needs getters and setters. Here’s an example with a simple class that has two integer fields and a method that returns their sum:

class Simple
{
  private int x, y, sum;

  public void setX(int tmpx)
  {
    x = tmpx;
  }

  public void setY(int tmpy)
  {
    y = tmpy;
  }

  public int getSum()
  {
    sum = x + y;
    return sum;
  }
}

Java’s recommended strategy makes perfect sense. If you decide that you want to limit the integers to positive values only, you just change the set methods and you’re done. Exposing the fields directly as public fields would have meant that any client code would then have to change when you rewrote your class. However, the pre-emptive encapsulation strategy that Java suggests does lead to verbose code, and you can quickly become overwhelmed with getters and setters. And if you don’t do input validation in most of them, you’re left with a lot of code that’s sitting around doing practically nothing.

Python’s approach to encapsulation is slightly different. There are only public and private attributes. Anything that starts with (but doesn’t end with) two underscores is private to the class (or module); everything else is public. However, Python doesn’t really enforce encapsulation strongly the way Java does. Rather, it does something called name mangling: the names of all private attributes are hidden by internally prefixing them with the class name. For example, attribute __attrib in class cls internally becomes _cls__attrib. When someone tries to access __attrib externally, an error is raised saying that __attrib doesn’t exist. Programmers are expected to be polite and not barge into other people’s classes when they’re not supposed to.
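
Here’s a quick sketch of what that looks like in practice (the class and attribute names are just made up for illustration):

class Diary(object):
    def __init__(self):
        self.__secret = "hidden"   # stored internally as _Diary__secret

d = Diary()
# d.__secret               # raises AttributeError: 'Diary' object has no attribute '__secret'
print(d._Diary__secret)    # prints "hidden" -- the mangled name still works if you insist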

With this knowledge, it would be entirely possible to write the above Java code in Python in almost exactly the same way, but that would miss one of Python’s great encapsulation features: properties. Properties allow you to have class attributes that are accessed like simple attributes (fields or instance variables in other object-oriented languages) but are actually implemented by the class’s methods. Let’s rewrite the above code in Python using properties:

class Simple(object):
    def __init__(self):
        self.__x = 0
        self.__y = 0
        self.__sum = 0

    def setX(self, tmpx):
        self.__x = tmpx

    def setY(self, tmpy):
        self.__y = tmpy

    def getSum(self):
        self.__sum = self.__x + self.__y
        return self.__sum

    # x and y are write-only properties, sum is read-only
    x = property(None, setX)
    y = property(None, setY)
    sum = property(getSum)

It may be a bit hard to see why Python properties are so nice from this example, but a few things are already clear. Firstly, the user of your class just needs to know the name of an attribute to set or get; they don’t need to deal with your naming conventions for setters or getters. This in turn lets you steer the development of your class more precisely.

Unlike Java, you don’t need to lock down your fields by declaring them private and then writing getters and setters which really do nothing. You can start by having everything wide open and public and save a lot of boilerplate. Then, when the need arises to close something off, you can implement it as a property backed by methods and leave everything else unchanged. Through it all, users are blissfully unaware of what’s going on. The class becomes a smart storehouse of data which they can get at just by reaching for it. To see a larger example of how properties are useful, see this really informative post.
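
As a rough sketch of that workflow (the class and attribute names here are hypothetical, not from any real codebase), you might start with a plain public attribute and later swap in a property that validates input, without touching any calling code:

# Version 1: nothing but a plain public attribute.
class Temperature(object):
    def __init__(self):
        self.celsius = 0

# Version 2: same external interface, now backed by a validating property.
class Temperature(object):
    def __init__(self):
        self.__celsius = 0

    def _get_celsius(self):
        return self.__celsius

    def _set_celsius(self, value):
        if value < -273.15:
            raise ValueError("temperature below absolute zero")
        self.__celsius = value

    celsius = property(_get_celsius, _set_celsius)

t = Temperature()
t.celsius = 25      # client code looks exactly the same against either version
print(t.celsius)    # prints 25

The second version refuses bad data, but from the caller’s point of view nothing has changed.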

Over the past years I’ve learned to be careful when comparing languages, but in this case I can definitely say that Python wins. To recap, here’s why:

  1. No boilerplate necessary. Compact code is mostly a good thing. Programs can easily become huge and complex; adding tons of getters and setters just in case you might someday need them doesn’t help.
  2. Better encapsulation (IMO). The way I see it, encapsulation is all about communicating with the user on a need-to-know basis. Why should the user need to know implementation details like whether you have a getter and setter, or whether your class attribute is public or private? Give them a single name by which they can access and alter a specific piece of data. Your API documentation should say whether something is available to read or write.
  3. Properties let you put off the private/public decision. The fewer decisions you make, the less chance of having to change them later.

One note to remember: for your classes to use properties, they have to inherit from object. I hope this article has helped you understand one of Python’s really useful but less often used features. It can be hard to get used to a snappy, fluid language like Python when coming from bulky Java or C++, but using language features like properties is a great way to start.
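
As an aside, on Python 2.6 or newer you can also spell the same thing with the @property decorator, which many people find more readable. This is a rough decorator-based equivalent of the made-up Simple class above (x is made readable here as well, since write-only properties are awkward to express with decorators):

class Simple(object):
    def __init__(self):
        self.__x = 0
        self.__y = 0

    @property
    def sum(self):                  # read-only property
        return self.__x + self.__y

    @property
    def x(self):                    # readable...
        return self.__x

    @x.setter
    def x(self, value):             # ...and writable
        self.__x = value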

Scala as an alternative to Java: Part 1

The end of the semester is drawing near and that means that my independent study in programming languages is also coming to its end. The course has taught me a lot. Before it I was only mildly interested in programming languages, but over the course of the semester I’ve been introduced to many new and innovative ideas and I’ve become convinced that studying programming languages is something I want to continue.

At the same time, I’ve also become interested in virtual machine-based languages and the benefits they offer. Over the past few years the Java virtual machine has grown in prominence and there has been a corresponding rise in the number of different languages that can run on the JVM. I’ve never been a big fan of Java, though I will admit that it does have some advantages. With about a month left till I have to go home, I think it’s time to look seriously into an alternative to Java on the JVM. Though there are a number of interesting candidates (Scala, Groovy, Fortress, JRuby, Clojure), the one I’m most interested in is Scala. I’m only just starting to use it, but here are some features of the language that I’m looking forward to using:

1. Uniform object and static type system

One of my biggest gripes about Java is that it isn’t purely object-oriented. Java is a mix of a hierarchical class system, primitive types (ints, floats) and reference types. Though this isn’t always a problem, it can be messy, and there are times when you would really like to subclass one of the primitive types but you simply can’t. Java is statically typed, but again, not quite. Though type-casting can be quite handy sometimes, I can’t help feeling that it’s somehow ‘unclean’ and not very aesthetically pleasing. And it’s not very safe either. In Scala everything is an object: integers, for example, are instances of scala.Int, so the basic types fit into the same class hierarchy as everything else. Scala also has strong static typing, as well as type inference. I’m still not very sure about the effectiveness of static typing, but I’ve come to respect it and I’m interested to see how it works out.

2. First class functions, anonymous functions and singleton objects

Through my contact with Scheme, Lisp and ML, I’ve come to love functional programming. Some languages, like Python, do a good job of blending object-oriented and functional programming styles, and I’m interested to see how I can adapt my functional programming habits to an object-oriented, strongly typed system like Scala. The ability to create singleton objects without going through the hassle of building a container class and instantiating it is something I missed more than once in Java, and it should lead to more productive code. In exchange I’ll have to give up Java’s static fields and methods, but that seems like an acceptable price.

3. Traits and Multiple Inheritance

Java doesn’t support multiple inheritance, and I somewhat agree with the reasons why. There are workarounds, but none of them have looked very appealing to me. I’ve never been in a position where I really needed multiple inheritance, but I can understand the situations where it would be helpful. Scala allows reusing code from multiple classes using traits, which are similar to Java’s interfaces but can contain implementations and not just declarations. There are some rules to be followed, but it looks like it could be a powerful tool.

4. Operator Overloading

Operator overloading is something that is sorely missed by a lot of Java programmers, and the lack of it contributes to Java’s verbosity. The concept is simple: if you have a class for real numbers, you can redefine the ‘+’ operator to do real addition if overloading is supported. In Java, you would instead have to create a class method, say add(), to handle addition. I’ve become a supporter of the idea that things which do the same thing should look the same (and conversely, different things should look different), and overloading is a step in the right direction.
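
Since I haven’t actually written any Scala yet, here’s a rough sketch of the idea in Python, which also supports operator overloading (the Real class is just made up for illustration):

class Real(object):
    def __init__(self, value):
        self.value = value

    def __add__(self, other):
        # '+' on two Real objects now means real addition
        return Real(self.value + other.value)

a = Real(1.5)
b = Real(2.5)
c = a + b            # reads naturally, instead of a.add(b)
print(c.value)       # prints 4.0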

In the near future…

I’m going to be exploring more about Scala. Each of the above is worth an entire post in itself and I’m sure there will be more things I’ll discover on the way. I’m going to be adding to this series at regular intervals so stay tuned.

How many programming languages should I learn?

OSNews has started a series called A-Z of programming languages where they’ve been posting interviews with the creators of well-known programming languages. Till now they’ve done AWK, Ada and BASH. Of those three the only one I’ve had any experience with is BASH, and not too much of that. But considering that there are literally hundreds of programming languages (and many more dialects or implementations) which ones should one learn to be a good programmer?

I know that many programmers out there learn just one or two languages and then use them throughout their careers (or at least until it becomes impossible to find a job). Certainly that works, up to a point, and so the question is: do we really need to learn languages that are not the “industry standard” (i.e. whatever has the most jobs on offer)? If all you’re interested in is a job, then no, you don’t. One or two languages will probably be enough. However, if you want to keep learning and keep developing as a programmer, then the answer is most certainly yes. Some programming languages are quite similar in terms of syntax and power, but some are very different and teach you to think in different ways. It’s these different languages that are going to make you better as a programmer.

So we come back to our initial question: how many languages to learn, and perhaps more importantly, which ones? I think there are two types of languages worth learning: those that make you think differently and those that have been used to write a large amount of high-quality code. Languages like Smalltalk and Lisp (and to some extent Java) are strongly paradigm-oriented; they emphasize a specific style of programming. In the case of Smalltalk it’s object-oriented, and in the case of Lisp and its derivatives it’s functional. Such languages will teach you important lessons which you can apply even when you’re using some other language.

One of the languages that has been widely embraced by the hacker community is C. There is an incredible amount of really good code written in C, the most famous example of which is probably the Linux kernel. Unless you’re a systems programmer you probably won’t have to use C or C++ much, but you can benefit a lot from reading well-written code. C and C++ can be used to write powerful code, but sometimes that power doesn’t quite justify all the low-level work you have to do. In that case, it’s good to have a general-purpose high-level language lying around. I would strongly recommend Python, but you might find another language easier for everyday use. Once you have made a choice, learn it well and use it to its maximum.

It might also be worthwhile to learn a language with powerful text-processing abilities, like AWK or Perl; knowing one or both can make your work easier. There is an awful lot of code written in Perl for the purpose of gluing larger programs together, so it might be worth learning for that reason alone. However, Perl has been falling from grace for a good few years and many people now use Python and Ruby for the same things they used Perl for. I don’t have a concrete opinion on Perl at the moment, but I think it’s something you can put off learning until you have an actual need for it.

You should also learn Java. I personally consider Java to be a decent language, but not a good one, and I probably wouldn’t use it if I had a choice. However, it is a very popular one, and if you get a job as a programmer, chances are you’ll encounter a substantial amount of Java code that you’ll have to deal with. And you won’t be much of a programmer if you can’t deal with other people’s code. So learn Java.

Knowing basic HTML and CSS is also a good idea. You might not be a full-fledged web artist, but you should be able to throw together a decent web page without much trouble. Considering the growing importance of the web, learning a web programming language is becoming important too. It’s not quite a necessity yet, but I think in less than five years it will be. I can’t recommend one now because I don’t have the experience, but I think Ruby might be a good choice, since it’s a decent general-purpose language as well.

I should say that I don’t know all of the above, but I have had some experience with each of them. I think each of them has contributed to making me a better programmer, and the more I delve into them and use them for harder problems, the more I will continue to improve. In conclusion, if you are committed enough to learn multiple languages well, you should also invest some time in learning a powerful text editor such as Vi or Emacs. Though you can certainly write great code with nothing more than Notepad, a more powerful tool can make your job quite a bit easier (and considerably faster). Once you start fine-tuning these editors to suit your style and habits, you won’t want to use anything else. If you’re seriously out to become the best programmer you can be, you’re going to want the best possible tools at your disposal.

I’ll be happy to hear your comments on what programming languages you might recommend to anyone looking to improve their programming.