Testing private methods (or not)

As my software engineering class progresses, I’m gradually bringing back old habits of writing unit tests and carefully designing class methods to be unit-testable. Yesterday, as I was doing a lab dealing with pointers (we’re learning C++), I ran into an issue that I think might become a serious problem in the weeks to come. We’re using the CppUnit testing framework, and CppUnit won’t let me test private methods. I haven’t seriously looked for a workaround because there’s a more important question lying under the obvious “how”: should private methods be directly tested at all?

A quick Google search reveals plenty of opinions both for and against unit testing private methods. The important arguments against are:

  1. Testing private methods breaks encapsulation and hinders the developer’s ability to refactor, even if the external interface is preserved. That slows down your whole development cycle.
  2. If testing the external interface (via public methods) doesn’t adequately exercise private code, then why is the private code there?
  3. If you are convinced that the private methods need to be tested, that can be a sign that the private code should be refactored into a class of its own.

Proponents of private testing will offer the following arguments:

  1. Anything that can break should be tested.
  2. If you’re going through an interface to test something beneath it, that’s not really unit testing, is it? (I actually just thought of that right now, not sure if it has come up before.)
  3. Testing should be fast and give precise feedback. Having to go through intermediate methods can slow down the tests and more importantly can make the source of failure ambiguous.

I’m sure there are other more nuanced arguments on both sides, but I think this gives a fair overview of the situation. As you can see, it’s not a simple case of practicality vs theoretical purity. Both sides have practical and theoretical arguments.

I am mostly a proponent of the practical side of things. I believe that rules and frameworks should be designed to enhance our productivity, and at the same time, we need some rules in place to make sure that we are performing as well as we can. With such a mindset, a decision isn’t easy. On the one hand, I would like all of my code to be tested at as fine-grained a level as is reasonable, and I should be able to find errors at an equal level of granularity. The minimum level of granularity I’m ready to accept is the method level. That means that every method should be tested individually, and when a test fails it should unambiguously point me to the faulty method. If I need finer detail than that, I can pull out a debugger.

But at the same time, I highly value my professional freedom. When I agree to write a piece of code to do something, I should be given maximum freedom within the constraints of working as part of a team. That’s not an excuse to write faulty or obfuscated code, because other people will almost certainly be using and/or maintaining it. However, I should not have to go through a lengthy test-passing-approval-getting process every time I refactor a method into two or more smaller ones. That, of course, is argument #1 against testing private methods.

Since I hold practicality in such high esteem, I think the need to test anything that can break is of paramount importance. If your code can break, you need to know about it as soon as possible, encapsulation be damned. It’s not that I’m outright rejecting ‘against’ arguments #2 and #3 (I’ll get back to them later); I’m just saying that I hold ‘for’ argument #1 as paramount. With that in mind, unit testing private methods becomes necessary, and I need to find a way around the objections. To solve #1, perhaps what is needed is a slightly higher-level look at the problem. Tests that exercise your code fall into one of two categories:

  1. Tests that you’ve written yourself for your own code.
  2. Tests that other people have written for your code.

I know that many large software houses have employees whose job it is to write test code, but even if you know for certain that there will be other people testing your code, you should still write your own tests. By doing so you can take a swing at objection #1. Tests written by the team’s testers will verify the public interface and make sure (hopefully) that your code as a whole does what it’s supposed to do. But your own tests will make sure that the internals of your code are safe. Here you can test for yourself any edge cases that the public interface might let slip through (for various reasons). Since you control both the tests and the code, you can decide when some tests no longer make sense and remove them accordingly.

Of course, this isn’t a bulletproof solution to the problem. There are all the potential problems associated with writing your own tests: it takes a certain amount of discipline and detachment to write stringent tests, and it can be tempting to just write tests that you know your code will pass. It’s up to you as a developer to cultivate the proper mental attitude. Plus there is always the overhead of actually writing and maintaining your own tests in addition to your code. You have to decide for yourself if you’re up to the job.

Coming back to the original list of objections to private testing, #2 and #3 are still unanswered. In fact, I don’t think they really have to be ‘answered’, because they point to more underlying issues. #2 raises the question of simplicity. If your code does more than it needs to, that opens the door to bugs that could easily be avoided. If external callers will never actually exercise some part of your code, you really should consider leaving it out. #3 is a derivative of object-oriented design: if you need to be calling code regularly from an external source, make it its own class rather than nesting it in layers of accessor code. The logic of this argument is undeniable, but every once in a while (and probably more often) you need to sacrifice the purity of OO for something that is more convenient. That being said, I would still very much encourage you to be careful about where your code is and why it’s there. Good design can make your life as a programmer much easier.

So the final verdict (for me at least) stands thus: test your private methods. However, you should not use this as an excuse to write ill-designed code in the hope that testing will catch the bugs. Testing should be applied to catch any errors that have not been eliminated by careful design, not as an end-of-cycle precaution in the hope that everything does what it’s supposed to. Use private method testing proactively to pinpoint and eliminate hard-to-see and unexpected bugs, not as a blanket measure against anything and everything that can go wrong. If your code passes public testing but fails private testing (or vice versa), it’s still bad code, and everything that can break, will break. If you’re going to use tests at all, respect what they tell you.

Of course, getting your framework to allow private testing without badly abusing object orientation is another matter altogether. And I haven’t even started looking for an answer to that one.

IDEs, education and the UNIX philosophy

I’ve never really been a big fan of IDEs, though I can appreciate how they can help speed up the Edit-Compile-Test cycle. For almost two years now I’ve been trying to use a text editor + command line strategy. However, at the moment I’m in a position where I need to start using IDEs again. In particular my software engineering class uses KDevelop and over the summer I’ll be working on a research project developing an Eclipse plugin, so I need to get familiar with Eclipse before then.

I must say that I was initially put off by the idea of having to use KDevelop for C++ development in my software engineering class. I was really hoping to be able to use Emacs, GCC and make full time, adding to my current level of Emacs and Bash knowledge. But having used KDevelop for about two weeks now (somewhat grudgingly, I must admit), I’ve found that there are a number of things it makes rather smooth. Creating an SVN repository for your project and committing to it is well integrated with the interface. At the same time, you can’t directly check out a working copy and start working on it as a project: you need to check it out manually to a directory, import the directory as a project and then edit the project options to have it be version controlled. Not exactly a smooth workflow. KDevelop’s integration with the qmake program to automatically create Makefiles is also a time-saver. Makefiles contain the instructions (used by a program called make) to build large projects containing lots of files. Writing these Makefiles by hand is very tedious, and KDevelop mostly takes that trouble away. Though I’m only just starting to explore KDevelop, I think I’ll enjoy using it.
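
For the curious, the manual workflow that the qmake integration replaces looks roughly like this (a sketch, assuming a Qt-style project; your project layout may call for different steps):

qmake -project   # generate a .pro project file from the sources in the current directory
qmake            # generate a Makefile from the .pro file
make             # build the project using the generated Makefile

Seeing the individual steps at least makes it clear what the IDE is doing on your behalf.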

However, on a larger scale, I’m still in two minds about the use of IDEs. I still firmly believe that IDEs should not be used as primary teaching tools, certainly not in a course like software engineering, which is for people who are fairly sure that they will be computer science students. I learned about Subversion before using KDevelop, so I can understand and appreciate how KDevelop speeds things up, and more importantly, when things go wrong or the interface doesn’t quite work the way I want it to, I can easily drop into a shell and fix things. However, I know much less about Makefiles, since I’ve never really used them. I know enough to understand that there is a significant amount going on under the hood which is being kept hidden from me. If something went wrong, I’d be helpless to fix it. It’s not a feeling I’m comfortable with. I really wish that we had been taught about Makefiles and how they work, and then been shown how qmake and its integration with KDevelop speed things up. At this point I really hope we learn about what goes on under the hood, and soon.
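
In the meantime, there is at least a way to peek under the hood: make itself can show you what it is about to do. A small example (assuming the project builds with make):

make -n   # print the commands make would run, without actually running them

It’s no substitute for understanding Makefiles, but it does make the hidden machinery visible.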

One of the reasons that I am uncomfortable with IDEs is that I’m a strong believer in the UNIX philosophy. This school of thought as applied to programs can be stated thus:

Write programs that do one thing and do it well. Write programs to work together. Write programs to handle text streams, because that is a universal interface.

I have some doubts about the text streams part, but I firmly believe in the first two segments. Once again coming back to education, it is important that students learn the UNIX philosophy and learn it well. They may not necessarily be writing completely separate programs for every little thing, but the need to preserve modularity in software is generally acknowledged. By learning how programming actually depends on lots of different tools, each with its own function but with good interfaces, students can better understand how to write modular software. The IDE encourages the idea that powerful software comes in monolithic chunks. Even if the IDE itself is modular and depends on other programs under the hood, that fact is not always obvious to the student. It is much easier to understand the power of modularity when your tools are clearly modular in nature. Consider Emacs: it is extremely modular and extremely extensible by virtue of its embedded Lisp interpreter. When you want to give Emacs the powers of an IDE, you string together a number of Elisp packages. More importantly, you are actively encouraged to write your own Elisp code to bend Emacs to your will. This flexibility is on a completely different level, even compared to modular IDEs like Eclipse.

I suppose that many of my gripes with IDEs would disappear if they really were integrated. But they’re not. I’m using a bunch of tools for my software engineering course: Umbrello for UML, Doxygen for code documentation extraction and Mantis for bug tracking. The only integration here is between Doxygen and KDevelop. This means that designing, implementing and then documenting a feature means using several different tools. Umbrello can generate code templates from my UML diagrams, but it can’t update the diagrams as I change the code. That means that when I need to write utility classes or methods, I have to put them in code and then put them in the diagrams again manually. By using an external bug tracker, I need to remember to check off the bug list when I fix something. In a truly integrated environment, the bug tracker would be part of the IDE. As a developer I could then add information to the bug report, such as a link to the specific part of the code where the bug exists. Anyone looking at my code later would be able to see that there was a bug associated with that part of the code which had been fixed (or was still active). Of course, this would mean that version control would have to be really tightly integrated, so that all the changes ever made could be pulled up and compared at a moment’s notice. If I have to play juggle-the-programs to do my work anyway, I would much rather use distinct programs and tie them together with Elisp or a Bash script. In such a case a common text interface would be very useful. The UNIX philosophy wins again (to some extent).

Of course, I am picking on the tools I’m using, and that may not be a representative sample. I’m also aware that KDevelop isn’t exactly the gold standard in IDEs. At this point some open source enthusiasts will be pulling out the do-it-yourself card: if you don’t like the state of KDevelop, write some code to fix it. An understandable argument, and normally I’m all for it. Do-it-yourself is a lesson well worth learning if you’re in the technology field. But in this case, I feel it is somewhat beside the point. Like I said, I’d rather just use Emacs. That’s not to say, however, that I won’t ever be swayed. Right now, I would willingly use an IDE if it unified the tools I’m using and actually sped things up.

Ultimately it comes down to the simple fact that as someone who intends to write software as a livelihood (or as part of a livelihood, at any rate), I insist on using the best tools for the job. Right now that tool is Emacs + GCC (and all the supporting tools like make). If an IDE does come along that offers me the same customizability and raw power that Emacs does, I would not hesitate to give it a fair chance to prove itself. This is not to say that Emacs doesn’t have problems of its own, but it’s better than the competition for my use. On the same note, I also wish very much that my fellow students would learn to understand what really makes a tool powerful. However, considering that one of my classmates commented that he doesn’t like the terminal because of how much he has to type, I don’t think that will happen anytime soon. Education in computer science deserves a good few posts all to itself, and it has its own very large set of problems, but if you’re actually reading this post, I think you already have enough knowledge to know when your tools don’t quite cut it. Just make sure you act on that knowledge.

Moving from Subversion to Git

I just finished moving my files from Subversion to Git. Git is a distributed version control system first built by Linus Torvalds, and it has matured a lot since its creation. It is currently used by a number of important open source projects, including the Linux kernel, Perl, Wine and Ruby on Rails. I chose to move to Git for a number of reasons:

  1. Distributed: So I can make commits even if I’m not online and have a complete history of changes.
  2. Easy branching and merging: I found myself keeping a ‘scratch’ folder in Subversion and only transferring changes to a working copy once I had finished all my changes, which I feel defeats the purpose of having version control. Git supports easy branching, so I can make experimental branches (which have their own histories) and then merge them back into the main branch when I’m ready (see the sketch after this list).
  3. Normal commands like mv, cp, rm can be used as Git doesn’t track files individually.
  4. Interacts with SVN: I’ll need to use Subversion for my school projects, but I can have a personal Git repository where I make regular commits and only push to the team’s Subversion repo when nothing is broken (also sketched below).
  5. All the cool kids are using it.
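
To make points 2 and 4 concrete, here’s roughly what those workflows look like (a minimal sketch; the branch name is made up, and the git-svn lines assume the repo has been set up with the git svn bridge):

git checkout -b scratch   # create and switch to an experimental branch
# ... hack and commit as usual ...
git checkout master       # switch back to the main branch
git merge scratch         # fold the experiments back into the main branch

git svn rebase            # pull the team’s Subversion commits into Git
git svn dcommit           # push local Git commits up to the Subversion repo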

I have two Subversion repos: one for my code and one for other documents. The move from Subversion to Git was actually quite smooth. My repositories currently live on my old G4 PowerMac, so I decided to do the transition on that machine itself (though thanks to Git’s distributed nature, I didn’t need to). I had found a quick tutorial which I followed to do the actual move from a Subversion repo to a Git repo.
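
For the curious, the standard tool for this kind of import is git-svn; the move boils down to something like the following (a sketch, not the tutorial’s exact commands; the path and target directory are placeholders, and --stdlayout assumes the usual trunk/branches/tags layout):

git svn clone --stdlayout file:///path/to/svn/repo code.git

With the new Git repository in place, I then did a quick git clone as follows on my laptop and desktop: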

git clone ssh://domain.com/path/to/repo/

I could have used the simple Git server instead of SSH, but since I would be doing regular pulls and pushes (updates and upstream commits), I decided to just use SSH uniformly. Since then I’ve made changes to my Source repo and it has synced properly to my desktop. I did run into a problem where one of the Git commands could not be executed on the remote machine. This turned out to be a problem with SSH on OS X, where the non-interactive shell started by Git didn’t have the proper PATH set. I didn’t research it very much because it turned out that adding the following line to my ~/.bashrc solved the problem:

PATH=$PATH:/usr/local/git/bin

This adds the Git binaries to the user’s PATH and lets the clone run properly.

The only disadvantage is that I have to commit to my local repo and then push to the repo on my server. However, it’s still a simple process. Whenever I make a change I want to save, I do a simple

git commit -a

and enter an appropriate message. Then, before I leave my computer for a long time, I do a

git push origin

which synchronizes all my local branches with those on the server. A simple git pull suffices to update the local repo.

I haven’t had a chance to use this system fully yet, as I haven’t done much moving about. But classes start tomorrow, so I hope to have a chance to use my new system properly. In particular, I plan to make use of the easy branching and the SVN integration. My college’s lab machines don’t currently have Git installed, but I’m going to request that it be installed. From what I’ve seen of the faculty, this shouldn’t be hard to accomplish. Here’s looking forward to the rest of the year using Git to boost my programming productivity.

Reining in your passion

One of the central features of the hacker mindset is that we thirst for novelty. Whether it’s the latest new operating system, programming language, framework or platform, we hackers are instinctively drawn to explore the great unknowns, to boldly go where no one has gone before. Without this quest for the new, Linus Torvalds wouldn’t have gotten eager volunteers to hack on his hobby operating system, the dot-com boom of the 90s would never have happened, and we might still be using computers with 2MHz processors and 64KB of RAM.

However, while this inquisitiveness drives industry and innovation, it can be somewhat damaging for the lone hacker, especially for someone who is still very much in training (like myself). There’s so much to do out there, so many shiny toys to play with, that it’s hard to make up your mind about which ones to try out. It’s very tempting to just go about trying one thing after another. Unfortunately, if you’re looking to be an ace coder at any point in the not-so-distant future, you’re going to have to rein in your passion and curiosity, sit down and make some choices. Case in point: me.

Ever since I’ve been coding seriously (about three years now) I’ve been writing code for memory-managed environments: first Java, now Python. Memory management does have its advantages: you can focus on the higher-level algorithms and data structures and let the environment take care of the details. I have a limited idea of pointers and manual memory management, and I’m sure that a lot of the code I’ve written would have been rather more tedious to write in an unmanaged setting. However, at this point I do feel somewhat spoilt by having all this work done for me. I also flirted with assembly language a while ago, and I’ve always had a love for working close to the metal: actually understanding and controlling what the machine does as it processes my instructions. All these things combined, I’ve been developing a thirst to do something low-level and to actually take the time to learn to write C and C++. Luckily for me, I’m taking a software engineering course in C++ and a computer organization course next semester, which will give me ample opportunity for low-level programming. But with three weeks still to go before my courses start, I was tempted to start learning some C on my own. And I would have, if it weren’t for my other great current interest: Lisp.

Lisp is more than a wonderful language: it’s a whole new way of thinking. Its functional style combined with macros makes it a powerful tool for a wide variety of problems. Moreover, the Scheme dialect is a great tool for learning basic algorithms and computer science concepts. I’ve been in contact with Scheme for more than a year, but I haven’t had the time to actually sit down and learn it properly. I’ve also been hoping to go through and actually finish Structure and Interpretation of Computer Programs. The next few weeks, with nothing much to do, seemed like a great time to buckle down, learn some Lisp and rehash some fundamentals. Unfortunately, it won’t be easy doing that and learning C/C++ at the same time.

Therein lies the hacker’s dilemma: you have two amazing problems (learning C or learning Lisp), both of which promise endless hours of intellectual enjoyment. It would be great if I could do both, but I’m only human, and if I do both, I won’t be very good at either. For me it’s just a question of learning a new language for my own benefit, but in many cases the stakes are higher: choosing a framework for your webapp, a platform to develop for, different ways to organize your project and your team. Staying stuck in this dilemma for too long isn’t productive at all. My course load for my first semester at college was pretty light, and I wanted to do some serious programming in all my free time. Unfortunately, I faced the same choice then: between Lisp and assembly. I never came to a clean decision, and as a result most of my first semester was wasted. In my last semester I had a choice between studying programming languages and parallel computing. Since I would actually be taking a course, I had to make a choice, and I had quite a fruitful semester learning about programming languages.

Now it’s time for a choice as well. I really don’t want to waste the next three weeks doing nothing substantial. I’ll be learning C/C++ next semester anyway, and I already know enough that the first few weeks shouldn’t be too hard. Keeping that in mind, I’ve decided to put C/C++ on the shelf for the time being and focus on Lisp and finishing SICP. Once the semester starts, I’ll have to reevaluate and probably devote more time to C++, keeping Lisp for the weekends. There are some more choices I’ll have to make soon: focusing full time on my current research or exploring Scala and Hadoop. But for the time being, I’m not thinking about it. One thing is certain: I’ll only become an expert programmer if I can balance my curiosity and passion with a healthy dose of realism and focus. I have passion and enthusiasm (I think), but the focus could use some work. One more thing on the list for this year.

Coming up in 2009

As another year comes to an end, it’s time to get ready for the next one. I had a great year with lots of great new experiences, but I’m looking forward to having an even better year ahead. I have some great tech-related projects planned that I can’t wait to get started on. They’re in no particular order, since I consider them all equally important. Here goes:

1. Moving from Subversion to Git

Last year I started keeping my files under version control in a Subversion repository on an old Mac I use as a server. Though it was a good way to keep things in sync between my desktop, laptop and school machines, it hasn’t made my workflow as easy as I hoped it would. For a number of reasons, I’ll be moving my files to Git very soon. From what I’ve read on the internet, moving from Subversion to Git is a simple affair. This is the first major thing I’ll do after getting back to campus and a broadband internet connection.

2. Learning Lisp and C

I’ve been doing a lot of Python programming over the last year. I really like Python and I think it’s a great language that all developers should have a chance to use at some point. But I’m ready for a taste of something different. I’ll be using C and C++ a lot for my courses next semester, and Lisp is a language I’ve been hoping to explore in more detail for a while. As Paul Graham says in ‘On Lisp’, C and Lisp are two sides of the same coin: C models computers while Lisp models computation. I hope to have a worthwhile learning experience with these two different, but complementary, languages. Along the way, I’d like to get more proficient at hacking Emacs.

3. More work on The Bytebaker

I made a decision to put more effort into blogging and writing in general when I moved this blog to its own domain. I’m going to keep up that effort in the new year. I’ll be publishing more posts, hopefully of increasing quality, as well as putting up longer articles from time to time. In the latter half of the year I’m planning to move to independent hosting so that I can have more control and offer readers more features.

4. Research projects

My current research involving formal grammars is starting to gather steam, and I can sense some really exciting work in the months to come. There are already a number of things that I’m interested in looking into, and I would like to have publishable material by the time summer break rolls around. There is also a possibility that I’ll have a chance to do work related to software engineering tools. Ever since I became interested in compilers, I’ve been thinking about the importance of using proper tools, so this new work should prove to be enlightening.

5. Exploring Parallel Programming

Parallel programming is becoming both one of the biggest challenges and one of the biggest opportunities of our time. My college has a small cluster running the Hadoop framework for parallel programming. I’m going to do some preliminary work to get used to the framework and the basic concepts, and then I’ll be looking around for interesting projects to work on.

There are some other things that I’m interested in (Scala, low-level programming), but I’m going to try to focus on the ones I’ve mentioned above. At the same time, I’m open to change and if it seems that something else is worth pursuing, then I might have to reevaluate my priorities.