Sunday Selection 2013-10-20

Around the Web

Inside GitHub’s super-lean management strategy and how it drives innovation

It’s always interesting to see how groups of people organize to do useful work, especially in the age of startups and distributed workforces. This article takes a detailed look at GitHub’s structure and how their “open allocation” strategy affects their work-style and productivity. Interestingly, it also looks at how non-product activities (like internal videos and social meetups) can be planned and executed without a strict hierarchy.

Should we stop believing Malcolm Gladwell?

As a graduate student I’ve become increasingly comfortable with reading scientific papers over the last two years. As a side effect of that, I’ve become increasingly skeptical of popular science books. They’re often lacking in proper references and I’m somewhat distrusting of the layer of indirection between me and the (hopefully) rigorous scientific work. This article focuses on Malcolm Gladwell and his particular brand of scientific storytelling. It’s been a few years since I read any of his books, so I can’t comment from personal experience, but if you’re interested in knowing how much science is actually in popular science, this article is worth your time.

Scott Adams on How to be successful

I recommend this piece with a bit of caution. It’s not your typical “how to be successful” piece. There isn’t much along the lines of “find your passion” or “all your dreams will come true”. In fact, this piece is very pragmatic, very down-to-earth and just a little bit mercenary. It’s for just those reasons that I think it’s worth reading: it’s a good antidote to the cult of “follow your dreams” that seems to have become popular. There are other gems in this piece such as “goals are for losers”. If you’re looking for unconventional and refreshingly honest career advice, read this now.

Books

I’ve been cutting down on video watching in favor of more reading. This week’s recommendation is:

Getting Things Done

GTD is a bit of an obsession in the tech community, spawning an endless number of variants, apps and how-to guides. I’ve been using one of those apps for a while (OmniFocus) and I’ve been familiar with the general GTD approach, but I just started reading the book last week. Surprisingly, the book has a pretty different feel from the GTD articles and guides you’ll find around the web. David Allen doesn’t just give you organizational strategies but also takes the time to explain why particular strategies are a good idea and how they may or may not work for you. I’ve often thought that the full-blown GTD system is a bit overkill, but reading this book makes me think that at a certain level of busy-ness, it’s actually worth it. After reading this book you’ll have no doubts that GTD is a carefully thought out, well-founded system and might be worth a try even if you’re not always super-busy.

 

Sunday Selection 2012-12-04

Around the Internet

How I went from writing 2000 words a day to 10,000 words a day

Writing is no easy business, and writing a lot on a regular basis is harder still. It’s good to know that you don’t need some special gift to become super-productive; you just need to carve out the time and work to the patterns that let you get the most out of the day.

Eleven equations Computer Science geeks should know

There’s not much consensus when it comes to how much mathematics computer scientists and programmers need to know. Personally I would say that if you are a computer scientist you need a fairly strong mathematics background (something I’m still working on, I’ll admit). Even if you’re just a programmer I think having some mathematical familiarity will make you a better thinker and give you a better bag of tricks to call upon.

Clay Johnson’s Information Diet

Though I love social networks, both the technology powering them and the interesting interactions they produce, too much of anything is a bad thing. I’ve been considering going on an information diet (or perhaps more correctly an information consumption diet) so that I could put more of that time into creating instead of consuming.

Videos

How GitHub uses GitHub to build GitHub

I firmly believe that good tools and workflows can make your job easier and your production better. I also think Zach Holman is really cool. While this focuses on GitHub, it’s easily applicable to any group of developers (or creators in general) working together to produce awesome stuff.

Thinking about Documentation

My friend Tycho Garen recently wrote a post about appreciating technical documentation. As he rightly points out, technical documentation is very important and also very hard to get right. As someone who writes code, I find myself in the uncomfortable position of having my documentation spread out in at least two places.

A large part of my documentation is right in my code in the form of comments and docstrings. I call this programmer-facing documentation. It is documentation that will probably only be seen by other programmers (including myself). However, the fact that it might only be seen by programmers who are using (or changing) the code doesn’t mean that it should live only in the code. More often than not, it’s advisable to be able to export this documentation to some easier-to-read format (generally hyperlinked HTML or PDF). Of course, I don’t want everyone who wants to use my software to go digging through the source code to figure out how things work. A user manual is generally a good idea for your software no matter how simple or complex it might be. At the very least there should be a webpage describing how to get up and running.

One of the major issues of documentation is that it’s either non-existent or hopelessly out of date. A large part of the solution is simply effort and discipline. Writing good comments and later writing a howto are habits that you can cultivate over time. That being said, I’d like to think that we can use technology to our benefit to make our job easier (and make writing and updating documentation easier).

Personally I would love to see programming languages grow better support for in-source comments. Documentation tools like Javadoc and Epydoc certainly help in generating documentation and give you a consistent, easy-to-understand format, but the language itself has no idea about what the comments say. They are essentially completely separate from the code even though they exist side by side in the same file. I would love it if languages could work together with the documentation, say by autogenerating parts of it, or doing analyses to detect inconsistencies.
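To make the "detect inconsistencies" idea a little more concrete, here is a minimal sketch in Python. It assumes docstrings that use Epydoc-style `@param name:` fields and compares them against the function's real signature. The helper `undocumented_and_stale` and the example function `resize` are hypothetical names made up for illustration; this is an external check of the kind a language could in principle do itself, not something any existing tool is claimed to do this way.

```python
import inspect
import re

# Matches Epydoc-style "@param name:" fields in a docstring.
PARAM_FIELD = re.compile(r"@param\s+(\w+)\s*:")


def undocumented_and_stale(func):
    """Compare a function's real parameters with its documented ones.

    Returns a pair (undocumented, stale): parameters missing from the
    docstring, and documented names that are no longer parameters.
    """
    actual = list(inspect.signature(func).parameters)
    documented = PARAM_FIELD.findall(inspect.getdoc(func) or "")
    undocumented = [name for name in actual if name not in documented]
    stale = [name for name in documented if name not in actual]
    return undocumented, stale


def resize(image, width, height):
    """Resize an image.

    @param image: the image to resize
    @param size: the target size (out of date: replaced by width/height)
    """
    return (image, width, height)


# The docstring still mentions "size" and says nothing about width/height.
print(undocumented_and_stale(resize))  # (['width', 'height'], ['size'])
```

A check like this only catches mismatched names. It has no way of telling whether the description of a parameter is still true, which is the harder half of the problem.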

As for documentation that lives outside of the code, I’m glad to see that there is a good deal of really good work being done in this area. GitHub recently updated their wiki system so that each wiki is essentially a Git repo of human-readable text files that are automatically rendered to HTML. GitHub’s support for Git commit notes and their excellent (and recently revised) pull request system provide really good ways of maintaining a conversation around your code. The folks over at GitHub understand that code doesn’t exist by itself and often requires a support structure of both documentation and discussion surrounding it to produce a good product.

So what’s my personal take on the issues? As I’ve said before, I’m starting work on my own programming language and I intend to make documentation an equal partner to the code. I plan on making use of GitHub Pages to host the documentation in readable form right next to my source code. At the same time, I’m going to give some thought to making documentation a first-class construct in the language. That means that the documentation you write is actually part of the code instead of being an inert block of text that needs to be processed externally. The Scribble documentation system built on top of Scheme has some really interesting ideas that I would love to look into and perhaps adapt. Documentation has always been recognized as an important companion to coding. I’m hoping that we’re getting to the stage where we actually pay attention to that nugget of common wisdom.

Give me back my name

Yesterday I decided to get a GitHub account. I don’t have anything to really put up on GitHub as of now, but I decided to get an account just the same. I wanted the username ‘basu’ because that’s usually the name people use to call me (and it’s the username I use on my own machines). However, that name was taken and so I settled for something just a little different: ‘basus’. Just out of curiosity, I decided to look up the profile of user ‘basu’. It turned out to be someone who had signed up in April, but didn’t seem to have actually done anything on GitHub. I felt a little surge of anger that someone had taken my name and then done absolutely nothing with it. I guess I really do like my name after all.

The thing is, it’s getting increasingly difficult to find a good name on the internet. And I’m not just talking about usernames. It took quite a bit of searching before I found a domain name that was free and that I wanted to use. When it comes to usernames, I try to get names that have ‘basu’ in them, generally with a letter or number at the end. That’s true for all my email addresses (I use about three full time for various things) and for my Twitter account. If it’s a new enough service, I can generally get just what I want. (If I had joined GitHub a few months earlier, I would have gotten my name there too.)

It might seem a bit vain that I’m making a fuss about getting the name I want, but I have a reason. For this purpose, let’s divide the web services that I use into two groups. First are the public ones that I want to give out to people. These are things like my email, my website and various social network/professional things. I want a username that’s easy for people to remember and is as close to my real name as I can get it. I don’t like having pseudonyms or handles because I think they’re just a drag (though I have nothing against people who do have them). It’s easy for people to get in touch with me if all they have to remember is my name and initials. Things get harder if they have to remember some non-obvious combination of my name and other letters or numbers (even if they make perfect sense to me). My personal email has my birthday in it, which is ok for family and close friends, but I don’t give it out to anyone else.

The second type of service is private. These are services I use where the name is used exclusively for login and verification purposes. I probably don’t want other people to know my username, but my passwords are generally pretty strong. What’s more important for me is that I would rather not have a bunch of different usernames and passwords for all the services that I use. Nothing irritates me more than when I just need to check something quickly and I end up needing three attempts just to get the correct combination of username and password. It’s a bit easier for services which require an email address as the login: I have a separate email account set aside for just that purpose.

While the bad news is that it can be too late to get the name you want, the good news is that there are other people who feel the same way and are doing something about it. The OpenID project is aimed at addressing the issue of having a long list of various login combinations and the security and usability problems associated with that. The idea of OpenID is simple: you create a single account with a traditional username and password with a service that is an OpenID provider. There are some dedicated OpenID providers like ClaimID or myOpenID, but a bunch of services you might already use like AOL, MySpace and WordPress.com are also providers. Once you’ve signed up, you get a URL which is your actual OpenID. Now when logging into a service that supports OpenID you just use your URL and tell your provider that you want that particular service to use your OpenID. This allows you to log in to a bunch of different services with a single authentication system. It’s easier on your brain cells and safer too. There are a bunch of services that use OpenID with more joining all the time.
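The division of labour described above can be sketched with a toy model in Python. To be clear, this is not the OpenID protocol: the real thing involves browser redirects and signed messages between the service and the provider, all of which are left out here. The classes `Provider` and `Service`, their methods, and the example URL are hypothetical names invented for this illustration. The only point is that the password lives in one place and services see nothing but the URL.

```python
class Provider:
    """Toy identity provider: the only party that ever sees the password."""

    def __init__(self):
        self._passwords = {}     # identity URL -> password (a real provider would store a hash)
        self._approvals = set()  # (identity URL, service name) pairs

    def register(self, identity_url, password):
        self._passwords[identity_url] = password

    def approve(self, identity_url, password, service_name):
        """The user authenticates here and lets one service use the identity."""
        if identity_url not in self._passwords:
            return False
        if self._passwords[identity_url] != password:
            return False
        self._approvals.add((identity_url, service_name))
        return True

    def vouches_for(self, identity_url, service_name):
        return (identity_url, service_name) in self._approvals


class Service:
    """Toy service that accepts an identity URL instead of a password."""

    def __init__(self, name, provider):
        self.name = name
        self.provider = provider

    def log_in(self, identity_url):
        # The service asks the provider; it never handles the password itself.
        return self.provider.vouches_for(identity_url, self.name)


provider = Provider()
my_id = "https://provider.example/basu"  # made-up identity URL
provider.register(my_id, "one strong password")

wiki = Service("wiki", provider)
forum = Service("forum", provider)

# Approve the wiki only; the forum has not been given permission.
provider.approve(my_id, "one strong password", "wiki")
print(wiki.log_in(my_id), forum.log_in(my_id))  # True False
```

In this model, adding a tenth service means one more approval at the provider rather than one more username and password to remember.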

So I’m still a bit disgruntled about not being able to get my name on GitHub, but I’ll get over it eventually. In the long run, I suppose it won’t matter all that much. In some ways, I wish that people got some sort of unique identifier that worked uniformly across all services everywhere. I think OpenID is a step in the right direction, though it’s not going to be a solution for everything. This is one of the cases where the problem requires a solution that is social as well as technical: how does everyone get the name/identity they want without stepping on other people’s toes? Like most other problems of that sort, there is no cut and dried answer, just a number of possible options, and we have to choose the one we’re most comfortable with. For now the solution I’ve chosen is a combination of OpenID and separate usernames, but like so many other things, I’ll keep my eyes open for something better.

The Documentation Problem

Over the past year and a half I’ve come to realize that writing documentation for your programs is important. Not only is documentation helpful for your users, it forces you to think about and explain the workings of your code. Unfortunately, I’ve found the tools used to create documentation (especially user-oriented documentation) to be somewhat lacking. While we have powerful programmable IDEs and equally powerful version control and distribution systems, the corresponding tools for writing documentation aren’t quite at the same level.

Let me start by acknowledging that there are different types of documentation for different purposes. In-code documentation in the form of comments is geared toward people who will be using your code directly, either editing it or using the API that it exposes. In this area there are actually good tools available. Systems like Doxygen, Epydoc or Javadoc can take formatted comments in code and turn them into API references in the form of HTML or other formats. With the API info right in the code, it’s easier to make sure that changes in one are reflected in the other.
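As a rough illustration of what "formatted comments turned into an API reference" means, here is a small Python sketch. The docstring uses Epydoc-style `@param` and `@return` fields, and the hypothetical helper `api_reference` simply pairs each function's signature with its docstring as plain text. Real tools parse the fields and produce cross-linked HTML; this sketch makes no claim about how they are implemented.

```python
import inspect


def area(width, height):
    """Return the area of a rectangle.

    @param width: horizontal extent
    @param height: vertical extent
    @return: width multiplied by height
    """
    return width * height


def api_reference(*functions):
    """Build a plain-text API reference from the functions' docstrings."""
    entries = []
    for func in functions:
        # The signature comes from the code itself, so it cannot go stale.
        signature = f"{func.__name__}{inspect.signature(func)}"
        doc = inspect.getdoc(func) or "(undocumented)"
        entries.append(f"{signature}\n{doc}")
    return "\n\n".join(entries)


print(api_reference(area))
```

Because the reference is regenerated from the source, a renamed function or a changed parameter list shows up in the output the next time it is run, without anyone having to remember to edit a separate document.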

User-oriented documentation has slightly different needs. As a programmer you want a system that is easy to learn and is fast to use. You also want to be able to publish it in different formats. At the very least you want to be able to create HTML pages from a template. But you also want the actual source to be human-readable (that’s actually a side-effect of being easy to create) because that’s probably what you, as the creator, will be reading and editing the most.

Then there are documents that are created as part of the design and coding process. This is generally wiki territory. A lot of this is stuff that will be rewritten over and over as time progresses. At the same time, it’s possible that much of this will eventually find its way into the user docs. In this case, ease of use is paramount. You want to get your thoughts and ideas down as quickly as possible so that you can move on to the next thought. Version control is also good to have so that you can see the evolution of the project over time. You might also want some sort of export feature so that you can get a dump of the wiki when necessary.

Personally, I would like to see the user doc and development wikis running as two parts of the same documentation system. Unfortunately, I haven’t found tools that are quite suitable. I would like all the documentation to be part of the same repository where all my code is stored. However, this same documentation needs to be easily exported to decent looking web pages and PDFs and placed online with minimal effort on my part. The editing tools also need to be simple and quick with a minimal learning curve.

There are several free online wiki providers out there such as PBworks and WikiDot which allow the easy creation of good looking wikis. But I’m hesitant to use any of them since there isn’t an easy way to tie them into Git. Another solution is to make use of GitHub’s Pages feature. GitHub lets you host your Git repositories online so that others can get them easily and start hacking on them. The Pages feature allows you to create simple text files with either the Textile or Markdown formatting systems and have them automatically turned into good looking HTML pages. This is a good idea on the whole and the system seems fairly straightforward to use, with some initial setup. The engine behind Pages, called Jekyll, is also free to download and use on your own website (and doesn’t require a Git repository).

In addition to these ‘enterprise-quality’ solutions, there are also a number of smaller, more home-grown solutions (though it could be argued that Jekyll is very homegrown). There’s git-wiki, a simple wiki system written in Ruby using Git as the backend. Ikiwiki is a Git or Mercurial based wiki compiler, in that it takes in pages written in a wiki syntax and creates HTML pages. These are viable solutions if you like to have complete control of how your documentation is generated and stored.
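For a sense of what a "wiki compiler" does at its simplest, here is a toy sketch in Python. It handles a tiny made-up markup subset (lines starting with `# ` are headings, blank lines separate paragraphs) and is not Markdown, Textile, or the syntax of any of the tools mentioned above. The function name `compile_page` is hypothetical.

```python
import html


def compile_page(source):
    """Convert a tiny, made-up markup subset to HTML.

    Blocks are separated by blank lines. A block starting with "# "
    becomes a heading; any other block becomes a paragraph.
    """
    blocks = []
    for chunk in source.strip().split("\n\n"):
        # Join wrapped lines so the source can be hard-wrapped freely.
        text = " ".join(line.strip() for line in chunk.splitlines())
        if text.startswith("# "):
            blocks.append(f"<h1>{html.escape(text[2:])}</h1>")
        elif text:
            blocks.append(f"<p>{html.escape(text)}</p>")
    return "\n".join(blocks)


page = """# Getting started

Install the tool, then run it
from your project directory.
"""

print(compile_page(page))
```

The appeal of this style is that the input is an ordinary text file: it can sit in the same repository as the code, be diffed and merged like code, and still come out as a web page.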

Though each of these is great in and of itself, I still can’t help feeling that there is something missing. In particular, there is a lack of consensus on how documentation should be created and presented. Some projects have static websites, others have wikis, a few have downloadable PDFs. Equally importantly, there isn’t even a moderately common system for creating this documentation. There are all the ways I’ve noted above, which seem to be the most popular. There are also more formal standards like DocBook. Finally, let’s not forget man and info pages. You can also create your own documentation purely by hand using HTML or LaTeX. Contrast this with the way software distribution works (at least in open source): there are binary packages and source tarballs and in many cases some sort of bleeding-edge repository access. There are some exceptions and variations in detail, but by and large things are similar across the board.

Personally, I still can’t make up my mind as to how to manage my own documentation. I like the professional output that LaTeX provides and DocBook seems like a well-thought-out standard, but I’d rather not deal with the formatting requirements, especially in documents that can change easily. I really like wikis for their ease of use and the ability to edit from anywhere, but I must be able to save things to my personal repository and I don’t want to host my own wiki server. I’ve previously just written documentation in plain text files and though this is good for just getting the information down, it’s not really something that can be shown to the outside world. For the time being, I’ll be sticking to my plain text files, but I’m seriously considering using GitHub Pages. For me this offers the dual benefit of easy creation in the form of text files as well as having decent online output for whatever I decide to share. I lose the ability to edit from anywhere via the internet, but that’s a price I’m willing to pay. I can still use Google Docs as an emergency temporary staging area. I’m interested in learning how other developers organize their documentation and would gladly hear any advice. There’s a strong chance that my system will change in some way in the future, but that’s true of any system I might adopt.