Python Properties vs Java Access Modifiers

This post is part of the Powerful Python series where I talk about features of the Python language that make the programmer’s job easier. The Powerful Python page contains links to more articles as well as a list of future articles.

The question of what exactly comprises object-oriented development is still open, but one of the generally accepted parts of the answer is encapsulation. Simply put, encapsulation means that the data structures used by your program should present a simple external interface while having a hidden internal structure. This is generally considered a good thing because it means that users of your structures only need to be concerned with what they’re supposed to do, not how they do it. They don’t need to bother with what goes on under the hood and, equally importantly, they can’t just reach in and tinker with your component’s internal structure if you don’t want them to. This in turn gives the developer freedom to change the internal implementation as long as the external interface is maintained.

Java and Python are both considered to be object-oriented languages, at least in the sense that they support creating classes and instantiating objects. They also both support some generally accepted features of object orientation like inheritance, polymorphism and encapsulation. There are some differences as to how these features are supported though. In this article I’ll be focusing on the different approaches they take towards encapsulation. Let’s start with Java.

Java allows you to apply encapsulation at the level of individual class members. The programmer gets to decide which external code can see each field or method by using one of the following access modifiers (also called access specifiers):

  • no modifier — by default all members in a class are visible within the rest of the class and to other classes in the same package
  • private — only the rest of the class can access a private member. It can’t be used by anything outside.
  • protected — These are visible within the same class, within the same package and to all subclasses
  • public — These are visible to everyone

Classes themselves can be either public (visible to everyone) or have no modifier, showing that they are only visible in the same package. This is a straightforward way to implement encapsulation and it works quite well too. However, there are some drawbacks. Java’s brand of encapsulation (which is shared to some extent by C++) recommends that all fields be set to private. Any fields that need to be accessed from or modified by external software need to have getters and setters. Here’s an example with a simple class that has 2 integer fields and a method that returns their sum:

class Simple
{
  private int x,y, sum;
  public void setX(int tmpx)
  {
    x = tmpx;
  }
  public void setY(int tmpy)
  {
    y = tmpy;
  }
  public int getSum()
  {
    sum = x + y;
    return sum;
  }
}

Java’s recommended strategy makes perfect sense. If you decide that you want to limit the integers to positive only, you just change the set methods and you’re done. Exposing the fields directly as public fields would have meant that any client code would have to then change when you rewrote your class. However, the pre-emptive encapsulation strategy that Java suggests does lead to verbose code and you can quickly become overwhelmed with getters and setters. And if you don’t do input validation on most of them, you’re left with a lot of code that’s sitting around doing practically nothing.

Python’s approach to encapsulation is slightly different. There are only public or private attributes. Anything that starts with (but doesn’t end with) two underscores is private to the class (or module). Everything else is public. However, Python doesn’t really enforce encapsulation strongly like Java. Rather it does something called name mangling. Essentially the names of all private attributes are hidden by internally adding an underscore and the class name in front of them. For example attribute __attrib in class cls would internally become _cls__attrib. When someone tries to access __attrib externally, an AttributeError is raised saying that __attrib doesn’t exist, although the mangled name can still be reached by anyone who spells it out. Programmers are expected to be polite and not barge into other people’s classes if they’re not supposed to.
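To make the name mangling concrete, here is a minimal sketch. The Account class and its __balance attribute are hypothetical names made up for illustration; they are not part of the example that follows.

```python
class Account(object):  # hypothetical class, for illustration only
    def __init__(self):
        # Inside the class body this name is mangled to _Account__balance.
        self.__balance = 100

acct = Account()

# Outside the class no mangling happens, so the plain name is not found.
try:
    acct.__balance
    reachable = True
except AttributeError:
    reachable = False

# The mangled name is still accessible; privacy here is a convention, not a lock.
mangled_value = acct._Account__balance
```

After running this, reachable is False and mangled_value is 100, which is why Python’s privacy is better thought of as a strong hint than as enforcement.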

With this knowledge, it would be entirely possible to write the above Java code in Python in almost exactly the same way, but that would miss one of Python’s great encapsulation features: properties. Properties allow you to have class attributes that are accessed like simple attributes (fields or instance variables in other object-oriented languages) but are actually implemented by the class’s methods. Let’s rewrite the above code in Python using properties:

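class Simple(object):
	def __init__(self):
		self.__x = 0
		self.__y = 0
		self.__sum = 0
	# setters take only the new value
	def setX(self, tmpx):
		self.__x = tmpx
	def setY(self, tmpy):
		self.__y = tmpy
	# the getter has to return the computed value
	def getSum(self):
		self.__sum = self.__x + self.__y
		return self.__sum
	# x and y are write-only, sum is read-only
	x = property(None, setX)
	y = property(None, setY)
	sum = property(getSum)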

It may be a bit hard to see why Python properties are so nice with this example, but there are a few things that are still pretty clear. Firstly, the user of your class just needs to know the name of a variable to set or get. They don’t need to deal with your naming conventions for setters or getters. This in turn lets you steer development of your class with finer accuracy.
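As a rough sketch of what this looks like from the user’s side, here is a self-contained version of the class above (trimmed slightly) followed by some client code. The variable names s, total and read_only are just for illustration.

```python
class Simple(object):
    def __init__(self):
        self.__x = 0
        self.__y = 0

    def setX(self, tmpx):
        self.__x = tmpx

    def setY(self, tmpy):
        self.__y = tmpy

    def getSum(self):
        return self.__x + self.__y

    x = property(None, setX)   # write-only: no getter supplied
    y = property(None, setY)   # write-only: no getter supplied
    sum = property(getSum)     # read-only: no setter supplied

s = Simple()
s.x = 3          # calls setX behind the scenes
s.y = 4          # calls setY behind the scenes
total = s.sum    # calls getSum behind the scenes; total is 7

# Assigning to a property that has no setter raises AttributeError.
try:
    s.sum = 10
    read_only = False
except AttributeError:
    read_only = True
```

The client code never mentions setX, setY or getSum; it only uses the names x, y and sum.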

Unlike Java, you don’t need to lock down your fields by declaring them private and then writing getters and setters which really do nothing. You can start by having everything be wide open and public and save a lot of boilerplate. Then when the need arises to make something closed, you can implement that as a method and leave everything else unchanged. Through it all users are blissfully unaware of what’s going on. The class becomes a smart store-house of data which they can get just by reaching for it. To see a larger example of how setters are useful, see this really informative post.
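Here is a small sketch of that workflow, under the assumption that you later decide a value must not be negative. PointV1, PointV2 and client are hypothetical names standing in for two versions of the same class and for user code that never changes.

```python
# Version 1: a plain public attribute, no getters or setters at all.
class PointV1(object):
    def __init__(self):
        self.x = 0

# Version 2: x is now a property that rejects negative values.
class PointV2(object):
    def __init__(self):
        self._x = 0

    def _get_x(self):
        return self._x

    def _set_x(self, value):
        if value < 0:
            raise ValueError("x must be non-negative")
        self._x = value

    x = property(_get_x, _set_x)

def client(point):
    # User code: identical for both versions of the class.
    point.x = 5
    return point.x

results = [client(PointV1()), client(PointV2())]   # [5, 5]

# Only the new version notices bad input.
try:
    PointV2().x = -1
    rejected = False
except ValueError:
    rejected = True
```

The user’s code is the same before and after the change; only the class itself had to be edited when validation became necessary.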

Over the past years I’ve learned to be careful when comparing languages, but in this case, I can say definitely that Python wins. To recap, here’s why:

  1. No boilerplate necessary. Compact code is mostly a good thing. Programs can easily be huge and complex; adding tons of getters and setters just in case you might someday need them doesn’t help.
  2. Better encapsulation (IMO). The way I see it, encapsulation is all about communicating with the user on a need to know basis. Why should the user need to know implementation details like whether you have a getter and setter and if your class attribute is public or private? Give them a single name by which they can access and alter a specific piece of data. Your API  documentation should say whether something is available to read/write or not.
  3. Properties let you put off the private/public decision. The fewer decisions you make, the less chance of having to change them later.

One thing to remember is that in Python 2 your classes have to inherit from object for properties to work correctly (in Python 3 every class does so implicitly). I hope this article has helped you understand one of Python’s really useful but not-as-often used features. It can be hard to get used to a snappy, fluid language like Python coming from bulky Java or C++, but using language features like properties is a great way to start.

Sunday Selection 2009-03-29

Reading

Intrinsic Motivation in Open Source Development The open source movement has become one of the most powerful forces in technology in the last few years. But why do it? What’s in it for the programmer who releases his mind-work into the open for no tangible compensation? This paper is a good read if you’re interested in the motivations behind open source. The second half is full of some fairly heavy economic terminology and ideas but the first half is readable by anyone. There are also lots of references pointing to related information.

Media

Linus Torvalds on Git Git is quickly becoming one of the favorite version control systems for the open source world. It’s my personal favorite too. This presentation at Google has Linus Torvalds talking about Git’s technical merits and what makes it so attractive. The talk is fairly technical, but I don’t think that should be a problem for my readers.

Software

Ubuntu 9.04 Beta If Git is becoming the most popular version control system, Ubuntu seems to be becoming the most popular consumer Linux distribution. The newest Beta includes new versions of GNOME and X.org. Despite the beta label, it should be quite stable, though you probably don’t want to trust it with your full workflow just yet.

Powerful Python

Despite yesterday’s talk of hopelessness over MIT moving its intro course to Python, I have no qualms about saying that Python is my favorite general purpose language. It has a neat syntax, a clean module system and lots of functionality built into the language. It’s also pretty easy to learn even if you’ve never programmed before, and if you have programmed before you’ll be surprised at how much easier Python lets you get things done than a lot of other languages. There are a lot of modules out there which extend Python’s usefulness (Numpy and PyGame are the ones that most quickly come to mind). There are also bindings to cross-platform toolkits like Qt and GTK+ as well as an interface to interact with the Objective-C based toolchain used by OS X. Even more interesting is Jython: a port of Python that runs on the JVM and lets you access the functionality that the Java platform offers.

Of course Python does have its share of quirks. The object system and its associated scoping rules take some getting used to. It also uses whitespace indentation as a way to determine block nesting. It’s not something that you can’t get used to and I don’t mind it at all, but I know it drives some people mad. But let’s face it, no language is perfect and Python is really pretty good for a lot of things.

I’ve had a good time learning Python. Its documentation is pretty good and the dynamic typing gives you a lot of flexibility which can be very empowering, especially if you come from a static language like C++ or Java. I’ve written a few Python posts in the past, but there’s still a considerable amount that I’d like to share with readers. I’ve decided to start a new series where I write articles covering slightly advanced and what I hope will be helpful Python topics. I’m not sure yet what format they’ll take on, but they will include example code, howtos, and maybe some references to similar features in other languages. You might want to bookmark this page as I’ll keep updating it with links to the new articles whenever I put them up. Here are the articles that I’ve already written and ones that I’m planning. If there is anything that you want to write about, please let me know.

Hope, scarred and bleeding

“The Heavens burned, the Stars cried out. And under the ashes of infinity, Hope, scarred and bleeding, Breathed its last.”

Ulatempa Poetess
Elegy for the Commonwealth
Andromeda

The International Lisp Conference is on right now and one of the interesting things that happened is that MIT’s Gerald Sussman talked about why MIT’s famous 6.001 course uses Python instead of Scheme. I hadn’t known that MIT had switched to Python from Scheme, and I must say that I’m very sad (as the dramatic opening quote makes obvious).

Truth be told, MIT has always been something akin to Holy Ground to me. It is deeply involved with computer science history and lots of great discoveries have come from there and from the people who worked there. 6.001 has to some extent become famous as MIT’s hallmark course. It’s the course that spawned the wonderful Structure and Interpretation of Computer Programs, a book that I think all serious programmers should read at some point in their careers, preferably sooner rather than later. Since the 80’s 6.001 has used the Scheme programming language, a lightweight version of Lisp which is itself one of computer science’s shining accomplishments. Now it uses Python.

So what’s the big fuss about? Well there are a number of different reasons. 6.001 has always been meant to be a hard course, designed to fundamentally change the way students think. It’s not a simple programming course. If you read the introduction to SICP (and then the rest of it), you’ll see that 6.001 is more of a course in engineering philosophy and techniques which only incidentally uses computers and software. I’ve been reading SICP in bits and pieces for about a year and a half and the things I’ve learned have made me a much better programmer, engineer and thinker. Is it possible to have the same sort of teaching and learning experience with Python instead of Scheme? Perhaps. I’m not sure. But that’s the tip of the iceberg.

Reading the reasons Sussman gave for replacing Scheme with Python is very distressing. Basically it comes down to the admission of the fact that the software industry is, for the most part, one giant mess. Software isn’t well documented and the interfaces are inconsistent. As a result you spend a lot of time tinkering with 3rd party libraries to figure out how they work as opposed to how they are supposed to work. All this flies in the face of what 6.001 was supposed to teach: great engineering works are built out of simpler, smaller components that are well understood and encapsulated. I’ll be the first to admit that what Sussman says is completely true; I’ve experienced it myself more than once. This hasn’t been helped by the proliferation of CS courses that teach students to be dependent on frameworks and third-party libraries without teaching them about how those frameworks work or what to do when they break. It’s part of the game and something you have to learn to deal with.

But it seems to me that Sussman’s statement is like throwing in the towel. If you can’t beat them, join them, that sort of thing. I’m sure there’s more to the argument than just that, but it would be hard to avoid the fact that 6.001 has been substantially toned down. There is no way students are going to learn what good software is unless they have some experiences of making it, themselves. There’s a reason why math students still work on problems even though Mathematica is a great tool.  Python is a great beginner’s language, but 6.001 wasn’t exactly made for beginners. Referring back to the preface to the first edition of SICP, the author says that many students have had experience of computers. Let’s take some time to remember that when this was written decades ago, ‘experience of computers’ didn’t mean playing games, using Office or browsing the web. It probably implied a general understanding of how computers work and some amount of programming. Times change and one could say that 6.001 is just adapting to the times. That could certainly be true and if it is, it’s a sign that the change isn’t good.

If students are starting programming at a later age, and hence aren’t accustomed to the very demanding logical thought required by 6.001, then it’s a sign that the software industry is in grave danger. Let’s face it: 4 years are not enough to learn how to be a good programmer or software engineer. 4 years working full time at it wouldn’t be enough. Now throw in the typical trappings of college life: other courses, sports, parties, relationships. The time most CS students spend actually practicing their skills goes down dramatically. And it’s not like computer science is easy, we all know that. After graduation a job means that you have even less time and inclination to improve yourself. CS education needs to start early. The best programmers I know (both personally and otherwise) are the ones who started young, in their mid-teens mostly. The industry depends on mediocre programmers who are then most likely doomed to stay mediocre because they missed the time of their lives where they could have learned the most.

You see, the problem and feelings of hopelessness go beyond 6.001. Being a student myself, I see a vast disparity in the skills of my fellow CS majors. What’s more terrifying is how low the low point is. A freshman I know is currently building a web framework for automatically converting Java object models to visual representations which can be manipulated. Think of it as interactive UML on steroids. It wouldn’t be an exaggeration to say he’s probably the only one in my school who can do that, me included (though our CS department is probably under 75 students). In one of my 300-level classes, a student claimed he didn’t know what a binary representation of an integer is. How can you be a sophomore CS major without knowing what a binary integer is? And my CS department is one of the more rigorous ones I know of. We use a wide variety of languages and tools, have a separate curriculum for non-majors and the requirements for graduating with honors are pretty strict. The professors routinely give students experience out of class by involving them in research projects.

In some respects, you could say I don’t need to care. I get good grades and I’ve seen and know more than most of my fellow students. But it’s not me I’m worried about. Not yet at least. I’d like to make a lasting contribution to computer science at some point in my career, but equally importantly I’d like to work with people who love computer technology as much as I do and are also skilled and dedicated enough to work on something that could be groundbreaking. I’m really afraid that there might be fewer people like that in my age group. Perhaps that’s just because I live in a small school, but if there aren’t going to be any more 18-year-olds spinning Scheme abstractions with ease, then my fears seem terribly close to justified.

I’m going to stop now because I feel like I’m screaming at the top of my lungs in an empty room. And I have to call home. I’m willing to believe that the picture isn’t as bleak because I see people like my freshman friend. But it takes a good amount of will to keep it up. If you’re in college (or know someone in college) and would like to get together and maybe reverse this trend of approaching mediocrity, please get in touch with me. I could really use a hope infusion right now.

PS. On an equally terrifying note, there’s only one girl in my software engineering class of 27, but that’s not something I’m going to try to explain any time soon.

Simplicity and clarity as design tools

For the past week or so I’ve been customizing my WordPress install by adding a number of plugins. I plan on writing an article soon about what plugins I’m using, but first I’d like to talk about something that I’ve noticed about WordPress plugins. Installing plugins is easy. There’s a plugin directory in every WordPress installation, generally under wp-content, and you just drop a plugin folder in this directory and WordPress will detect it. If you’re using a recent version of WordPress, you can actually search the plugin repository and install a plugin from inside WordPress itself. This is a really useful feature and makes using plugins much easier. However, the one gripe I have is that the settings to control the plugins get spread all over the WordPress interface.

Here’s a screenshot of my WordPress Dashboard. Along the left side you can see buttons for various drop down menus:

My WordPress Dashboard


I’m currently using 8 plugins, each of which has various settings and interfaces to change those settings. One of those plugins (StatPress) has its very own sidebar menu. Two of them are under the Plugin menu, one is under the Tools menu and the rest are under the Settings menu. That means that if I have to find the settings for a particular plugin, I potentially have to look in 3 different places to find them. For me this is a clear design and organization mistake. Things that do the same thing should be in the same place. I’ve picked WordPress as the example because it’s what’s been striking me most recently, but it’s not the only one. Office 2008 for the Mac allows users to change the default saving format to the old formats. This option is in the same place for Word and Powerpoint, but is in a different place in Excel.

Simplicity favors regularity. User interfaces need to be both simple and regular. Google’s interface is probably the simplest you can get. WordPress’ interface in general is pretty regular and simple, but the organization of the plugins is not regular at all and so there’s an extra level of complexity added in which users really shouldn’t have to deal with. It’s not just user interfaces. Indenting your code properly is also recommended for the same reason. All code that gets executed as part of the same operation should have the same indent level. It’s a visual clue that’s much easier to follow if done right than keeping track of curly braces. One of the criticisms of Perl is how it uses contexts: variables and operations can mean different things depending on what’s surrounding them. That isn’t always a bad thing, but it does mean that you have to consciously keep track of the context your code is in.

Similarly, the development of the RISC processor architecture was motivated in part by the fact that most processor instruction sets are highly irregular, with lots of instructions that seem very similar but actually operate quite differently (or have some seemingly arbitrary limitations). The book Computer Organization and Design, which was co-authored by one of the creators of the RISC architecture, states 4 design principles, one of which is that Simplicity Favors Regularity. I must say I have to agree.

While designers should try to contain functionality within regular components, it’s also possible to go overboard. Typical electrical installations in buildings are controlled by rows of identical switches. Very regular, but also perfectly impossible to tell which switch does what. Things that do different things should look different. Clarity is key. In The Design of Everyday Things, the author talks about how employees in power plants distinguish switches on their consoles by attaching different handles to them. One of the reasons that Lisp looks scary to a lot of programmers is that it can easily look all the same, as opposed to C-like languages where a for-loop looks very different from a function call or object creation. To be more specific, the external design of an object (or interface) should suggest its possible function. Admittedly this is harder for on-screen software than it is for physical objects.

But these two principles seem contradictory. How do we reconcile regularity with easy differentiation? Part of the answer is that there is an underlying concept which is pretty powerful: ambiguity is bad. On Reddit today I saw an article criticizing Xfce’s settings interface. The author couldn’t figure out how to increase his text size. There is some comparison with the Windows 3.1 interface, but the central problem is that the Xfce settings interface is ambiguous: should text size be under Window Manager Settings, Display or User Interface Settings, all of which seem equally probable to the casual user? It’s the exact same case for my WordPress plugins. I think plugin settings should be under Plugins, but it does make sense to put them under Settings or Tools.


Xfce settings manager

So the question then is: how does one avoid ambiguity and achieve clarity, especially when there really are a number of logical alternatives (like WordPress)? In some cases it’s possible to use clear labels to distinguish what operation will be performed. Taking another example out of The Design of Everyday Things: a flat bar across a door makes it clear that the door is to be pushed, not pulled. When it comes to designing software interfaces like WordPress or Xfce, it’s not quite so simple because there can be a lot of debate over which design is correct. After all, plugin settings can go under Plugins or Settings. I feel that the key here is to just make a choice and then make that decision known. In the case of WordPress, plugin settings could go in either Plugins or Settings, though I would personally prefer Plugins so that there is one place for all plugin-related activities. But equally importantly, plugins should not be given a choice as to where in the UI to go. The case for Xfce is made more complicated by the fact that the text size for the window title bars can be changed independently of the rest of the UI and hence text size control is spread over two places. I would still argue that the two should be pulled into a single interface, but this is a case where the interface is heavily influenced by the functionality. The thing to remember in this case is that even if something might seem clear and logical to the developer or engineer, it might not seem that way to a casual user. Users don’t want to spend any time actively learning where things are; they just want to use the product and they want it to work. Beta testing products with users is a great tool and it’s also essential that design decisions be well documented in easy-to-read manuals.

In conclusion: ambiguity is the enemy. Fighting ambiguity requires a two-pronged approach. Interfaces must be simple and regular so that the user is not overwhelmed with choices. But at the same time, there must be sufficient cues to let the user know that different objects have different functions and how required functionality can be accessed. This can be done by using labels (different text and images) and by offering easily accessible documentation. However, good design requires many decisions and compromises and the best way to figure out which ones are best is by watching actual users and adjusting accordingly.