Inception and abstraction


I watched Inception with friends last Saturday. I really enjoyed it and thought it was really well made, though it’s certainly a complex movie which you need to pay attention to. Considering that most of the readers of The ByteBaker are computer savvy (and probably programmers) you’re going to like it (or hate it) very much because it touches on some of our core concepts: recursion, closures and abstraction. In this way, it’s not all that much different from the Matrix — different premise and plotline but a very similar feel.

Without dropping any spoilers here’s what you need to know about the movie for the rest of the article: it’s about people who go into other people’s dreams in order to steal their secrets. Pretty simple, right? The kicker is that it’s possible to dream inside another dream leading to all sorts of interesting situations and plot twists. Ok, so that’s not quite recursion since that would mean that the subject would be dreaming the same dream inside the dream (confused yet?). Come to think of it the dreams of Inception are more like closures.

So what are closures? Wikipedia tells us that:

In computer science, a closure is a first-class function with free variables that are bound in the lexical environment.

Perfectly understandable, right? No? Ok, let’s translate. First and foremost, a closure is a function. But it’s not just any old run-of-the-mill function. A closure generally contains variables that are neither local variables nor arguments to that function. So what do those variables refer to? Their values come from outside the function, specifically from the code block that surrounds the function. The Wikipedia page on closures gives examples in a number of languages. Though closures aren’t necessarily part of a standard curriculum, they are extremely powerful constructs that can be used to implement a host of other programming language features (including control flow structures and object systems). Coming back to Inception, once you are inside a dream you can recall the world outside (though the real world seems like a dream so everything’s a bit fuzzy).
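To make that translation concrete, here is a minimal Python sketch. The names make_adder and add are made up for illustration; the point is only that n is neither a local variable nor an argument of add, yet add can still use it.

```python
def make_adder(n):
    # n is an argument of make_adder, so it belongs to the outer function.
    def add(x):
        # Inside add, n is a free variable: not a local, not an argument.
        # Its value comes from the enclosing call to make_adder.
        return x + n
    return add


add_five = make_adder(5)
add_ten = make_adder(10)

print(add_five(1))  # 6
print(add_ten(1))   # 11
```

Each call to make_adder creates a fresh surrounding environment, which is why add_five and add_ten remember different values of n.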

Closures in computer science (and dreams in Inception) are important because they are a prime example of abstraction. Functions are a powerful concept because they essentially let you create little worlds in which you can do stuff. You put something in a function and get something out. You don’t need to know or care about what’s going on inside the function (unless something goes wrong, but that’s a different matter entirely). Functions let you abstract away processes. Closures improve upon functions and let you abstract state. If you’re using a function that’s a closure, you don’t need to know what its variables are bound to (except the ones you pass in) and you can’t see what data the closure can manipulate either.

By tucking away state, closures give us less to hold in our minds and make it easier to write code that’s clean and follows the Single Responsibility Principle (essentially, do one thing and do it well). Suppose you have a bunch of closures inside one larger function. Now magically you have sections of executable code that all operate on the same data and yet can do completely different things. They can also do all this without having to pass in a host of arguments every time (which reduces the chance of making mistakes). Whenever the closures need something, they just refer to their outer environment. Sound familiar? It should, because I just described objects and methods. And that is the hallmark of a good abstraction: it lets you build up other abstractions on top.
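Here is a rough sketch of that idea in Python: several closures defined inside one larger function, all sharing the same hidden state. The counter and its function names are hypothetical, chosen only to keep the example small, and the state lives in a dict so the inner functions can update it in place.

```python
def make_counter(start=0):
    # State shared by every closure below; callers cannot reach it directly.
    state = {"count": start}

    def increment(step=1):
        state["count"] += step

    def reset():
        state["count"] = start

    def value():
        return state["count"]

    # Hand back the "methods"; the state itself stays tucked away.
    return increment, reset, value


increment, reset, value = make_counter()
increment()
increment(5)
print(value())  # 6
reset()
print(value())  # 0
```

None of the three functions takes the counter as an argument, yet they all act on the same data, which is pretty much what an object with three methods would give you.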

Abstractions are in general a good thing. But unless you think through your abstractions, they can be bad. A leaky abstraction is one that doesn’t quite get it right. The underlying layers somehow “leak through” what should be the abstraction’s watertight boundaries. Joel Spolsky has a very good article on leaky abstractions that’s a must-read if you want to learn more. And while we’re on the topic of abstraction: too much can be a bad thing. I wrote a Python program two summers ago to experiment with L-systems. Last summer I tried rewriting it such that everything in the system was an instance of some class. Everything was supposed to go through methods and abstraction boundaries. I never finished. This summer I toned it down a little and got a working version in about a week. Yes, this is the classic second-system effect, but it also shows that sometimes abstractions will just get in the way and force you to jump through hoops.

In conclusion: abstractions are good if used wisely. Closures are one such powerful abstraction. Dream safe.

Python Properties vs Java Access Modifiers

This post is part of the Powerful Python series where I talk about features of the Python language that make the programmer’s job easier. The Powerful Python page contains links to more articles as well as a list of future articles.

What exactly comprises object-oriented development is still an open question, but one of the generally accepted parts of the answer is encapsulation. Simply put, encapsulation means that the data structures used by your program should present a simple external interface while having a hidden internal structure. This is generally considered a good thing because it means that users of your structures only need to be concerned with what the structures are supposed to do, not how they do it. They don’t need to bother with what goes on under the hood and, equally importantly, they can’t just reach in and tinker with your component’s internal structure if you don’t want them to. This in turn gives the developer freedom to change the internal implementation as long as the external interface is maintained.

Java and Python are both considered to be object-oriented languages, at least in the sense that they support creating classes and instantiating objects. They also both support some generally accepted features of object orientation like inheritance, polymorphism and encapsulation. There are some differences in how these features are supported though. In this article I’ll be focusing on the different approaches they take towards encapsulation. Let’s start with Java.

Java allows you to apply encapsulation at the level of individual class members. The programmer gets to decide which external software can see each field or method by using one of the following access modifiers (also called access specifiers):

  • no modifier: by default, all members in a class are visible within the rest of the class and to other classes in the same package
  • private: only the rest of the class can access a private member; it can’t be used by anything outside
  • protected: visible within the same class, within the same package and to all subclasses
  • public: visible to everyone

Classes themselves can be either public (visible to everyone) or have no modifier, showing that they are only visible in the same package. This is a straightforward way to implement encapsulation and it works quite well too. However, there are some drawbacks. Java’s brand of encapsulation (which is shared to some extent by C++) recommends that all fields be set to private. Any fields that need to be accessed from or modified by external software need to have getters and setters. Here’s an example with a simple class that has 2 integer fields and a method that returns their sum:

class Simple
{
  private int x,y, sum;
  public void setX(int tmpx)
  {
    x = tmpx;
  }
  public void setY(int tmpy)
  {
    y = tmpy;
  }
  public int getSum()
  {
    sum = x + y;
    return sum;
  }
}

Java’s recommended strategy makes perfect sense. If you decide that you want to limit the integers to positive only, you just change the set methods and you’re done. Exposing the fields directly as public fields would have meant that any client code would have to then change when you rewrote your class. However, the pre-emptive encapsulation strategy that Java suggests does lead to verbose code and you can quickly become overwhelmed with getters and setters. And if you don’t do input validation on most of them, you’re left with a lot of code that’s sitting around doing practically nothing.

Python’s approach to encapsulation is slightly different. There are only public or private attributes. Anything that starts with (but doesn’t end with) two underscores is private to the class. Everything else is public. However, Python doesn’t really enforce encapsulation strongly like Java. Rather it does something called name mangling. Essentially the names of all private attributes are hidden by internally adding an underscore and the class name in front of them. For example, attribute __attrib in class cls would internally become _cls__attrib. When someone tries to access __attrib externally, an error is raised saying that __attrib doesn’t exist. Programmers are expected to be polite and not barge into other people’s classes if they’re not supposed to.
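A quick way to see name mangling in action is the following sketch. The Account class and its balance are invented for illustration; only the mangling behavior itself is standard Python.

```python
class Account(object):
    def __init__(self):
        self.__balance = 100   # actually stored as _Account__balance

    def balance(self):
        return self.__balance  # inside the class the short name works


acct = Account()
print(acct.balance())          # 100

try:
    acct.__balance             # outside the class there is no such name
except AttributeError as err:
    print("hidden:", err)

# The mangled name is still reachable, so this is politeness, not security.
print(acct._Account__balance)  # 100
```

The last line is exactly the kind of barging in that programmers are expected not to do.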

With this knowledge, it would be entirely possible to write the above Java code in Python in almost exactly the same way, but that would miss one of Python’s great encapsulation features: properties. Properties allow you to have class attributes that are accessed like simple attributes (fields or instance variables in other object-oriented languages) but are actually implemented by the class’s methods. Let’s rewrite the above code in Python using properties:

class Simple(object):
    def __init__(self):
        self.__x = 0
        self.__y = 0
        self.__sum = 0
    def setX(self, tmpx):
        self.__x = tmpx
    def setY(self, tmpy):
        self.__y = tmpy
    def getSum(self):
        self.__sum = self.__x + self.__y
        return self.__sum
    # x and y are write-only (no getter); sum is read-only (no setter)
    x = property(None, setX)
    y = property(None, setY)
    sum = property(getSum)
It may be a bit hard to see why Python properties are so nice with this example, but there are a few things that are still pretty clear. Firstly, the user of your class just needs to know the name of a variable to set or get. They don’t need to deal with your naming conventions for setters or getters. This in turn lets you steer development of your class with finer accuracy.

Unlike Java, you don’t need to lock down your fields by declaring them private and then writing getters and setters which really do nothing. You can start by having everything be wide open and public and save a lot of boilerplate. Then when the need arises to make something closed, you can implement that as a method and leave everything else unchanged. Through it all users are blissfully unaware of what’s going on. The class becomes a smart store-house of data which they can get just by reaching for it. To see a larger example of how setters are useful, see this really informative post.
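As a sketch of that workflow, here are two versions of a small class side by side. The class names, the double_x helper and the rule that x must not be negative are all assumptions made up for this example. The first version leaves x wide open; the second turns x into a property with validation, and the client code doesn’t change at all.

```python
class PlainSimple(object):
    # Version 1: x is an ordinary public attribute.
    def __init__(self, x):
        self.x = x


class CheckedSimple(object):
    # Version 2: x keeps its name but is now backed by methods.
    def __init__(self, x):
        self.x = x  # assignment is routed through _set_x below

    def _get_x(self):
        return self._x

    def _set_x(self, value):
        if value < 0:
            raise ValueError("x must not be negative")
        self._x = value

    x = property(_get_x, _set_x)


def double_x(obj):
    # Client code: written once, works unchanged with either version.
    obj.x = obj.x * 2
    return obj.x


print(double_x(PlainSimple(4)))    # 8
print(double_x(CheckedSimple(4)))  # 8

try:
    CheckedSimple(-1)
except ValueError as err:
    print("rejected:", err)
```

In Java the equivalent change would mean going from a public field to setX and getX calls, which is why the advice there is to write the getters and setters up front.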

Over the past years I’ve learned to be careful when comparing languages, but in this case, I can say definitely that Python wins. To recap, here’s why:

  1. No boilerplate necessary. Compact code is mostly a good thing. Programs can easily be huge and complex; adding tons of getters and setters just in case you might someday need them doesn’t help.
  2. Better encapsulation (IMO). The way I see it, encapsulation is all about communicating with the user on a need-to-know basis. Why should the user need to know implementation details like whether you have a getter and setter and if your class attribute is public or private? Give them a single name by which they can access and alter a specific piece of data. Your API documentation should say whether something is available to read/write or not.
  3. Properties let you put off the private/public decision. The fewer decisions you make, the less chance of having to change them later.

One thing to remember is that for your classes to use properties, they have to inherit from object (this applies to Python 2; in Python 3 every class inherits from object automatically). I hope this article has helped you understand one of Python’s really useful but not-as-often used features. It can be hard to get used to a snappy, fluid language like Python coming from bulky Java or C++, but using language features like properties is a great way to start.