Show Git information in your prompt

I’ve been a sworn fan of version control for a good few years now. After a brief flirtation with Subversion I am currently in a long-term and very committed relationship with the Git version control system. I use Git to store all my code and writing and to keep everything in sync between my machines. Almost everything I do goes into a repository.

When I’m working I spend most of my time in three applications: a text editor (generally Emacs), a terminal (either iTerm2 or Gnome Terminal) and a browser (Firefox or Safari). When in Emacs I use the excellent Magit mode to keep track of the status of my current project repository. However, my interaction with Git is generally split between Emacs and the terminal. There’s no real pattern, just whatever is easiest and open at the moment. Unfortunately, when I’m in the terminal there’s no visible cue as to the status of the repo. I have to be careful to run git status regularly to see what’s going on, and manually make sure that I’ve committed everything and pushed to the remote server. Though this isn’t usually a problem, every now and then I’ll forget to commit and push something on one of my machines, go to another and then realize I’ve left all my work behind. It’s annoying and kills productivity.

Over the last few days I decided to sit down and give my terminal a regular indicator of the state of the current repository. So without further ado, here’s how I altered my Bash prompt to show relevant Git information.

Extracting Git information

There are generally three things I’m concerned about when it comes to the Git repo I’m currently working on:

  1. What is the current branch I’m on?
  2. Are there any changes that haven’t been committed?
  3. Are there local commits that haven’t been pushed upstream?

Git provides a number of tools that give you a lot of very detailed information about the state of the repo. Those tools are only a few commands away, but I don’t want to see everything there is to be seen at every step. I just want the minimum information needed to answer the above questions.

Since the bash prompt is always visible (and updated after each command) I can put a small amount of text in the prompt to give me the information I want. In particular my prompt should show:

  1. The name of the current branch
  2. A “dirty” indicator if there are files that have been changed but not committed
  3. The number of local commits that haven’t been pushed

What is the current branch?

The symbolic-ref command shows the branch that the given reference points to. Since HEAD is the symbolic reference for the current state of the working tree, we can use git symbolic-ref HEAD to get the full branch reference. If we were on the master branch we would get back something like refs/heads/master. We use a little Awk magic to get rid of everything but the part after the last /. Wrapping this into a little function we get:

function git-branch-name {
    echo $(git symbolic-ref HEAD 2>/dev/null | awk -F/ '{print $NF}')
}

Has everything been committed?

Next we want to know if the branch is dirty, i.e. if there are uncommitted changes. The git status command gives us a detailed listing of the state of the repo. For our purposes the important part is the very last line of the output. If there are no outstanding changes it says “nothing to commit (working directory clean)”. We can isolate the last line using the Unix tail utility, and if it doesn’t match the above message we print a small asterisk (*). This is just enough to tell us that there is something we need to know about the repo and that we should run the full git status command.

Again, wrapping this all up into a little function we have:

function git-dirty {
    st=$(git status 2>/dev/null | tail -n 1)
    # Note: newer Git versions phrase this line as "nothing to commit, working tree clean"
    if [[ $st != "nothing to commit (working directory clean)" ]]; then
        echo "*"
    fi
}

Have all commits been pushed?

Finally we want to know if all commits have been pushed to the respective remote branch. We can use the git branch -v command to get a verbose listing of all the local branches. Since we already know the name of the branch we’re on, we use grep to isolate the line that tells us about our branch of interest. If we have local commits that haven’t been pushed, the status line will say something like “[ahead X]”, where X is the number of commits not pushed. We want to get that number.

Since what we’re looking for is a very well-defined pattern I decided to use BASH’s built-in regular expressions. I provide a pattern that matches “[ahead X]”, where X is a number. The matching number is stored in the BASH_REMATCH array. I can then print the number, or nothing if no such match is present in the status line. The function we get is this:

function git-unpushed {
    brinfo=$(git branch -v | grep "$(git-branch-name)")
    if [[ $brinfo =~ ("[ahead "([[:digit:]]*)]) ]]; then
        echo "(${BASH_REMATCH[2]})"
    fi
}

The =~ is the BASH regex match operator and the pattern used follows it.
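
To see the matching in action, here’s a small standalone illustration you can paste into a BASH shell (the branch line is a made-up sample of git branch -v output):

line="* master 1f2a3b4 [ahead 2] Fix the frobnicator"   # hypothetical sample line
if [[ $line =~ ("[ahead "([[:digit:]]*)]) ]]; then
    echo "${BASH_REMATCH[2]}"    # prints 2
fi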

Assembling the prompt

All that’s left is to tie the functions together and have them show up in the BASH prompt. I use a little function to check whether the current directory is actually part of a repo. If the git status command returns only an error and nothing else then I’m not in a Git repo and the functions above would give nonsense results. This function checks git status and then either calls the other functions or does nothing.

function gitify {
    status=$(git status 2>/dev/null | tail -n 1)
    if [[ $status == "" ]]; then
        echo ""
    else
        echo $(git-branch-name)$(git-dirty)$(git-unpushed)
    fi
}

Finally we can put the prompt together. BASH allows some common system information to be displayed in the prompt. I like to see the current hostname (to know which machine I’m on if I’m working over SSH) and the path to the directory I’m in; that’s what the \h and the \w are for. The Git information comes after that (if there is any), followed by a >. I also like to make use of BASH’s color support.

function make-prompt {
    local RED="\[\033[0;31m\]"
    local GREEN="\[\033[0;32m\]"
    local LIGHT_GRAY="\[\033[0;37m\]"
    local CYAN="\[\033[0;36m\]"

    # The trailing backslashes continue a single PS1 assignment
    PS1="${CYAN}\h\
${GREEN} \w\
${RED} \$(gitify)\
${GREEN} >\
${LIGHT_GRAY} "
}

make-prompt

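To use this, save the functions to a file and source it from your .bashrc (the file name here is just an example):

source ~/.bash_git_prompt

With everything in place, the prompt ends up looking something like this (hostname, path and counts will vary):

myhost ~/src/myproject master*(2) >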

I like this prompt because it gives me just enough information at a glance. I know where I am, whether any changes have been made and how much I’ve diverged from the remote copy of my work. When I’m not in a Git repo the Git information is gone. It’s clean, simple and informative.

I’ve borrowed heavily from both Jon Maddox and Zach Holman for some of the functionality. I didn’t come across anyone showing the commit count, but I wouldn’t be surprised if lots of other people have it too. There are probably other ways to get the same effect; this is just what I’ve found and settled on. The whole setup is available as a gist so feel free to use or fork it.

Thinking about Documentation

My friend Tycho Garen recently wrote a post about appreciating technical documentation. As he rightly points out, technical documentation is very important and also very hard to get right. As someone who writes code I find myself in the uncomfortable position of having my documentation spread out in at least two places.

A large part of my documentation is right in my code in the form of comments and docstrings. I call this programmer-facing documentation. It is documentation that will probably only be seen by other programmers (including myself). However, just because it might only be seen by programmers who are using (or changing) the code doesn’t mean that it should live only in the code. More often than not, it’s advisable to have this documentation exported to some easier-to-read format (generally hyperlinked HTML or PDF). Of course I don’t want everyone who wants to use my software to go digging through the source code to figure out how things work. A user manual is generally a good idea for your software no matter how simple or complex it might be. At the very least there should be a webpage describing how to get up and running.

One of the major issues of documentation is that it’s either non-existent or hopelessly out of date. A large part of the solution is simply effort and discipline. Writing good comments and later writing a howto are habits that you can cultivate over time. That being said, I’d like to think that we can use technology to our benefit to make our job easier (and make writing and updating documentation easier).

Personally I would love to see programming languages grow better support for in-source comments. Documentation tools like Javadoc and Epydoc certainly help in generating documentation and give you a consistent, easy-to-understand format, but the language itself has no idea about what the comments say. They are essentially completely separate from the code even though they exist side by side in the same file. I would love it if languages could work together with the documentation, say by autogenerating parts of it, or doing analyses to detect inconsistencies.

As for documentation that lives outside of the code, I’m glad to see that there is a good deal of really good work being done in this area. Github recently updated their wiki system so that each wiki is essentially a Git repo of human-readable text files that are automatically rendered to HTML. Github’s support for Git commit notes and their excellent (and recently revised) pull request system provide really good ways to maintain a conversation around your code. The folks over at Github understand that code doesn’t exist by itself and often requires a support structure of both documentation and discussion around it to produce a good product.
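
Since each wiki is just a Git repository, you can clone it and edit the pages locally like any other project (the repository name below is hypothetical; Github appends .wiki to the project name):

git clone git@github.com:user/project.wiki.git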

So what’s my personal take on the issue? As I’ve said before, I’m starting work on my own programming language and I intend to make documentation an equal partner to the code. I plan on making use of Github Pages to host the documentation in readable form right next to my source code. At the same time, I’m going to give some thought to making documentation a first-class construct in the language. That means that the documentation you write is actually part of the code instead of being an inert block of text that needs to be processed externally. The Scribble documentation system built on top of Scheme has some really interesting ideas that I would love to look into and perhaps adapt. Documentation has always been recognized as an important companion to code. I’m hoping that we’re getting to the stage where we actually pay attention to that nugget of common wisdom.

Release schedules and version numbers

I just finished a major rewrite and overhaul of my long-term research project and pushed it out to the other students who are working with me on it. In the process I rewrote large parts of it to be simpler code. I also cleaned up the code organization (everything is neatly divided into directories instead of being spread throughout the toplevel), added comments and rewrote the documentation to actually describe what the program does and how to use it. But it wasn’t just a pure rewrite and refactoring. I added at least one important new feature, added a completely new user interaction mode and changed the code architecture to explicitly support multiple interfaces. But the thing is that even though I’ve “shipped” it, it’s still not quite done.

There are significant parts missing. The unit testing is very, very scant. There is almost no error handling. The previous version had a GUI which I need to port to the new API/architecture. I also want to write one more interaction mode as a proof of concept that it can support multiple, different modes. The documentation needs to be converted to HTML and there are some utility functions that would be helpful to have. In short, there’s a lot that needs to be done. So my question is, what version of my code is this?

I started a rewrite last summer as well but never finished; it was a casualty of the classic second-system effect. For a while I considered calling this version 3.0, counting the unfinished copy as 2.0, but I decided that was rather silly and so I’ve called it 2.0. Though it’s certainly a major change from the last version, in some ways it’s still broken and unfinished. Is it a beta? Or a release candidate? I suppose that’s a better description. Except the additions that I want to make are more than what would move it from a beta to a full release. The GUI alone would definitely be a point release.

In many ways the debate is purely academic and kinda pointless. As I’ve written before, software is always beta. However, releasing major and minor “versions” of software is a popular activity. In some ways it’s helpful to the user: you can tell when something’s changed significantly and when you need to upgrade. In an age where you had to physically sell software, that was a good thing to know. However, the rise of web-based software has changed that to a large extent. If you’ve been using Gmail for a while, you’ll know that it has a history of small, regular, atomic improvements over time. And it’s not just Gmail, it’s most of Google’s online services. Sometimes there are major facelifts (like Google Reader a few years ago) but by and large this gradual improvement works well. Google Chrome also uses this model. Chrome is officially past version 5 now, but thanks to its built-in auto-update mechanism you don’t need to care (and I suspect most people don’t). Rolling releases are clearly acceptable and may just be the way software updates go in the future. Of course, if you’re charging for your code you’re going to have some sort of paywall, so no, manual software updates probably won’t go away forever.

Coming back to my original question, what version did I just release? 2.0? 2.0 beta 1? 1.9.5? Honestly I don’t really care. Part of my disinterest stems from the fact that Git makes branching and merging so easy. It’s hard to care about version numbers and releases when your code is in the hands of a system that makes it so easy to spin off feature branches and then merge them back in when they’re ready. If I worked in a fully Git-based team I’d just have everyone running daily merges so that everyone automatically got the new features. In that case I wouldn’t have waited to release: the big new feature would have been pushed a week ago, the reorganization and cleanup after that and the documentation yesterday. I’d also send out the later updates and additions one at a time as they were done. Everyone else uses SVN, but there might still be a way to do something similar.
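
For what it’s worth, the daily-merge routine I have in mind is nothing fancy; roughly this on each developer’s machine (branch and remote names are just examples):

git checkout master
git pull origin master    # pick up everyone else's merged work
git merge my-feature      # fold in your own finished changes
git push origin master    # publish them for the next person's pull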

In conclusion: rolling releases are awesome. Users don’t have to worry about manually updating and automagically get new features when they’re available. Developers using a good version control system can stay up-to-date with everyone else’s code. This is especially important if you’re writing developer tools (which I am): the faster and easier you can get your updates to the people making the end product, the faster the end product gets developed.

PS. If you’re wondering what exactly it is I’m making, more on that later. There’s a possibility of a source release after I talk to my professor about it.

My confession: I’m a data hoarder

There’s this new show on A&E TV called Hoarders. From the show’s website, each episode “is a fascinating look inside the lives of two different people whose inability to part with their belongings is so out of control that they are on the verge of a personal crisis”. It’s an interesting show about people who, quite simply, have too much stuff. I’ve watched a few episodes; it’s somewhat repetitive, and strangely addictive in the way that only these things can be. Though I never gave the show much thought after I finished watching an episode, a few days ago I had a strange epiphany: I might be a data hoarder.

Here’s the gist of it: I’m afraid of losing data. It’s not that I have a ton of important stuff which I use regularly; in fact much of what I have on my hard drive (besides my music and pictures) is stuff I will probably never actively use again. What I’m actually afraid of is that someday I’m going to want some file (or some specific version of some file) and I won’t be able to find it. Now even if I do have the file, I might not find it due to poor organization and data retrieval systems, but that’s a matter for another blog post. What I’m afraid of is pure, simple data loss: I start working on a project, which I only have one copy of, and something happens to that one copy, whether it be a hard drive crash or just human error and accidental deletion. And then I have to start all over again, with no real idea of what I did the first time.

Now, thanks to technology, I’ve been able to deal with my hoarding instincts without having dozens of different versions littering my hard drive or doing manual backups every week. At the heart of my system is Git, which lets me keep everything that’s important to me under strict version control. It also lets me easily keep files in sync between different machines, a problem I still haven’t completely solved (especially for public machines). By keeping things in sync between three different machines, I have backups in three completely different (as in physically separate) places.
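
The syncing itself is plain Git with a repository on my server acting as the hub; the routine on each machine is roughly this (the server path is hypothetical):

git clone user@myserver:repos/notes.git    # once per machine, then work inside the clone
git pull origin master                     # before starting work
git push origin master                     # after committing, so the other machines can pull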

The second thing that keeps my data under control is Amazon’s S3, via JungleDisk. Once a week this setup ships all my Git repositories, music, pictures and various software installers to Amazon’s massively distributed storage servers for less than $5 a month. The choice was between this pay-as-you-go arrangement and buying a terabyte hard disk. Personally I think I made the right choice: my backups are safe and secure in a faraway place, and I didn’t have to shell out a lump sum in one go.

Now all this was fine, but lately I’ve been having this urge to record everything. And I mean everything. There are all my tweets and dents which go out into the ether of cyberspace, which I might someday want to have on record. There are all the websites I visit and, most recently, all the music I listen to and the movies I see. In a perfect world, I would have all my tweets saved to a Git repository, and all the DVDs I watch and music I listen to would be instantly ripped and placed in cold storage in an Amazon bucket (or on a terabyte disk). And this may not be a good thing, because I wouldn’t watch most of the DVDs a second time and I have no idea why I would want to save my tweets (or ever look them up).

In the past week I’ve been sorely tempted to actually buy a terabyte hard drive and start manually ripping all the DVDs that I watch. I even went so far as to install Handbrake on my Mac Mini. I’ve been trying very hard to override the temptation with logic (and laziness). It’s been hard, but I’ve been successful so far. Underneath this is perhaps a more important issue: how much data is enough and how safe is safe enough? Keeping my own created data completely backed up in multiple places is, I think, perfectly acceptable, but ripping all the DVDs is borderline obsessive. It would be an interesting thing to do, and might be worth something in terms of geek street cred, but it’s not something that I can seriously see myself doing (and it’s possibly illegal too).

So there you have it: I’m a data hoarder, or at least I have data hoarding tendencies. No, I don’t need an intervention yet, and I don’t need treatment. In fact, I think I’m at the point where I’m reliably saving and backing up everything that I create (that’s more than 140 characters) but not randomly saving everything that I come into contact with. Maybe in another place and time I will actually be saving all my movies as well, but that will probably mean actually buying DVDs and having a properly organized collection instead of borrowing them from the library. For the time being, I trust my digital jewels to Git, three computers and Amazon S3.

How to blog like a hacker

NB This was supposed to have been published last week, but WordPress ate the original draft I wrote and it took me a while to get the motivation to write it again. But it’s done now. Enjoy.

Self-published content on the web has exploded and many of the millions of blogs out there are run on content management systems like WordPress, Blogger, Movable Type and LiveJournal. These are all robust software products that have survived years of rigorous field testing and offer a wide variety of features and customization options. However, to those of us with a more do-it-yourself bent, it can sometimes seem that these systems do a bit too much.

Most of these systems serve dynamically generated content (Movable Type has a static option). The actual content (like posts and comments) is stored in a database and the HTML pages are generated on the fly. This makes things like templates and themes simple to use (as you don’t need to regenerate thousands of pages for each change), but it also means considerable CPU use on every page load. This can be problematic if you have a busy site or traffic spikes. CMSes also offer a wide variety of features beyond just running a blog or website: RSS feeds, comments and a variety of media features. While some people certainly use all these features, they can be overkill for others. Sometimes all you really want is to write some text with an image or two and put it online in a quick, simple way.

There are other, perhaps more esoteric reasons for wanting to run your blog without a modern CMS. You might want complete local access to your data, and web interfaces for writing posts might not be to your liking. Some people swear by flat files and their favorite text editors, and both have their advantages. Flat files are simpler to browse than an interface to a database and certainly easier to move around. Advanced editors like Vim or Emacs can make the job of writing more efficient, especially if you’re already used to them. Popular version control systems like Git can make sure that your data is kept backed up and safe, and can be used to keep things synced between a local machine and a server.

A combination of all the above factors has led to the creation of a number of ‘static site generators’: tools that take simple local files (generally plain text with some simple markup) and turn them into full HTML pages after applying some sort of template. You can keep the original source anywhere you want and just transfer the generated, static HTML to a server for the world to see. The original “Blogging like a hacker” post was written as the announcement of a popular static site generator called Jekyll. Jekyll is written in Ruby and is used to power the GitHub Pages online publishing system (it was written by one of the GitHub team). There is a Python clone of it called Hyde (fittingly enough), and a Perl program called Ikiwiki does something similar but integrates a version control system and is meant to be deployed on a server.
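
For the curious, the basic Jekyll workflow is only a handful of commands; this sketch uses the command names from recent Jekyll releases, which may differ slightly from older versions:

gem install jekyll
jekyll new myblog    # scaffold a site with a sample config and layouts
cd myblog
jekyll build         # render the posts into static HTML under _site/
jekyll serve         # preview the result locally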

I’m not going to pretend that static site generators are the best fit for all or even most bloggers and web writers. In fact, I would suggest that static site generators are most useful for those people who value simplicity and control more than anything else. Using a static site generator well requires at least a working knowledge of HTML and CSS and a good text editor. Using a version control system isn’t a necessity, but it seems that being able to use a VCS is one of the big attractions of static site generators: you don’t have to rely on whatever draft system your CMS provides; you can use industry-standard version control tools to keep your work safe.

I’m a fan of practicing what I preach, so I’ve been using a static site generator as well. I’ve personally been using Jekyll to manage a personal site for the last week or so. I think it’s the simplest such tool to set up and it has sufficient features to create some pretty good looking websites. That being said, I don’t think I’ll be moving this blog off WordPress any time soon. WordPress suits my needs well enough and converting almost 250 posts from WordPress to plain text is not something I want to spend my time on at the moment. I export the blog weekly and keep the resulting XML file under version control to be safe. I think of it as a sort of middle ground between having everything hosted in a remote server database and having a locally secure version of everything.

As I continue writing and using software, I’m coming to realize that the best tools are the ones that can be molded to your needs. Static site generators attract users who aren’t afraid (and might even be eager) to get under the hood. It’s the same sort of mentality that attracts people to open source software and systems like Linux: the ability to tinker to your heart’s content until you have something that is just right for you. If you are someone who can spend hours setting up your computing environment to be just the way you like it, then you’ll probably like working with static site generators. On the other hand, if you couldn’t care less about what goes on when you hit the ‘Publish’ button, that’s fine too. Use WordPress or something similar and just focus on writing great content.

The Documentation Problem

Over the past year and a half I’ve come to realize that writing documentation for your programs is important. Not only is documentation helpful for your users, it forces you to think about and explain the workings of your code. Unfortunately, I’ve found the tools used to create documentation (especially user-oriented documentation) to be somewhat lacking. While we have powerful programmable IDEs and equally powerful version control and distribution systems, the corresponding tools for writing documentation aren’t quite at the same level.

Let me start by acknowledging that there are different types of documentation for different purposes. In-code documentation in the form of comments is geared toward people who will be using your code directly, either editing it or using the API that it exposes. In this area there are actually good tools available. Systems like Doxygen, Epydoc and Javadoc can take formatted comments in code and turn them into API references in HTML or other formats. With the API info right in the code, it’s easier to make sure that changes in one are reflected in the other.
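
Running these generators is usually a one-liner from the shell; for instance (the file and package names here are hypothetical):

javadoc -d docs/ src/com/example/myapp/*.java    # HTML API reference from Javadoc comments
epydoc --html -o docs/ mymodule.py               # the same idea for Python docstrings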

User-oriented documentation has slightly different needs. As a programmer you want a system that is easy to learn and fast to use. You also want to be able to publish it in different formats. At the very least you want to be able to create HTML pages from a template. But you also want the actual source to be human-readable (that’s actually a side effect of being easy to create) because that’s probably what you, as the creator, will be reading and editing the most.

Then there are documents that are created as part of the design and coding process. This is generally wiki territory. A lot of this is stuff that will be rewritten over and over as time progresses. At the same time, it’s possible that much of this will eventually find its way into the user docs. In this case, ease of use is paramount. You want to get your thoughts and ideas down as quickly as possible so that you can move on to the next thought. Version control is also good to have so that you can see the evolution of the project over time. You might also want some sort of export feature so that you can get a dump of the wiki when necessary.

Personally, I would like to see the user doc and development wikis running as two parts of the same documentation system. Unfortunately, I haven’t found tools that are quite suitable. I would like all the documentation to be part of the same repository where all my code is stored. However, this same documentation needs to be easily exported to decent looking web pages and PDFs and placed online with minimal effort on my part. The editing tools also need to be simple and quick with a minimal learning curve.

There are several free online wiki providers out there, such as PBworks and WikiDot, which allow the easy creation of good-looking wikis. But I’m hesitant to use any of them since there isn’t an easy way to tie them into Git. Another solution is to make use of Github’s Pages feature. Github lets you host your Git repositories online so that others can get them easily and start hacking on them. The Pages feature allows you to create simple text files with either the Textile or Markdown formatting systems and have them automatically turned into good-looking HTML pages. This is a good idea on the whole and the system seems fairly straightforward to use, with some initial setup. The engine behind Pages, called Jekyll, is also free to download and use on your own website (and doesn’t require a Git repository).
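
The initial setup boils down to pushing a specially named branch; here’s a rough sketch (the --orphan flag needs a reasonably recent Git, and the page content is a placeholder):

git checkout --orphan gh-pages     # a new branch with no prior history
git rm -rf .                       # start from an empty tree
echo "Hello, Pages" > index.html
git add index.html
git commit -m "First Pages commit"
git push origin gh-pages           # Github builds and serves the site from this branch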

In addition to these ‘enterprise-quality’ solutions, there are also a number of smaller, more home-grown solutions (though it could be argued that Jekyll is very homegrown). There’s git-wiki, a simple wiki system written in Ruby using Git as the backend. Ikiwiki is a Git or Mercurial based wiki compiler, in that it takes in pages written in a wiki syntax and creates HTML pages. These are viable solutions if you like to have complete control of how your documentation is generated and stored.

Though each of these is great in and of itself, I still can’t help feeling that there is something missing. In particular, there is no common consensus on how documentation should be created and presented. Some projects have static websites, others have wikis, a few have downloadable PDFs. Equally importantly, there isn’t even a moderately common system for creating this documentation. There are all the ways I’ve noted above, which seem to be the most popular. There are also more formal standards like DocBook. Finally, let’s not forget man and info pages. You can also create your own documentation purely by hand using HTML or LaTeX. Contrast this with the way software distribution works (at least in open source): there are binary packages and source tarballs and in many cases some sort of bleeding-edge repository access. There are some exceptions and variations in detail, but by and large things are similar across the board.

Personally, I still can’t make up my mind as to how to manage my own documentation. I like the professional output that LaTeX provides and DocBook seems like a well-thought-out standard, but I’d rather not deal with the formatting requirements, especially in documents that change often. I really like wikis for their ease of use and edit-from-anywhere convenience, but I must be able to save things to my personal repository and I don’t want to host my own wiki server. I’ve previously just written documentation in plain text files, and though this is good for just getting the information down, it’s not really something that can be shown to the outside world. For the time being, I’ll be sticking to my plain text files, but I’m seriously considering using Github Pages. For me this offers the dual benefit of easy creation in the form of text files as well as decent online output for whatever I decide to share. I lose the ability to edit from anywhere via the internet, but that’s a price I’m willing to pay. I can still use Google Docs as an emergency temporary staging area. I’m interested in learning how other developers organize their documentation and would gladly hear any advice. There’s a strong chance that my system will change in some way in the future, but that’s true of any system I might adopt.

Refactoring my personal Git repository

People usually talk about refactoring when they’re talking about code. Refactoring generally involves reorganizing and restructuring code so that it maintains the same external functionality but is better in some non-functional way. In most cases refactoring results in code that is better structured, easier to read and understand, and on the whole easier to work with. Now that my exams are over, I decided to undertake a refactoring of my own. But rather than refactoring code, I refactored my entire personal source code repository.

About a year ago I started keeping all my files under version control. I had two Subversion repositories, one for my code and another for non-code related files (mostly school papers). A few months ago I moved from Subversion to Git, but my workflow and repository setup was essentially the same. When I moved to Git, I had a chance to change my repo structure, but I decided to keep it. The single repo would serve as a single storage unit for all my code. Development for each project would take place on separate branches which would be routinely merged back into the main branch. The files in the repo were divided into directories based on the programming language they were written in. Probably not the most scientific classification scheme, but it worked well enough.

Fast forward a few months and things aren’t quite as rosy. It turns out that having everything in one repo isn’t really a good idea after all. The single most significant reason is that the history is a complete mess. Looking back at the log I have changes to my main Python project mixed in with minor Emacs configuration changes as well as any random little experiment that I did and happened to commit. Not very tidy. Secondly, using a separate branch for each project didn’t quite work. I’d often forget which branch I had checked out and start hacking on some random thing. If I was lucky I could switch branches before committing and put the changes where they belonged. If I was unlucky, I was faced with the prospect of moving changes between branches and cleaning up the history. Not something I enjoyed doing. Finally, organization by language wasn’t a good scheme, especially once I took a course in programming languages and wanted to save the textbook exercises for each language. The result is that I now have a number of folders with just 2-3 files in them for languages I won’t be using for a while. More importantly, getting to my important project folders meant digging 3-4 levels down from my home directory.

I decided last week that things had to change. I needed a new organization system that satisfies the following requirements:

  1. Histories of my main projects are untangled.
  2. My main projects stand out clearly from smaller projects and random experiments.
  3. If I start a small project and it gets bigger it should be easy to give it main project status.
  4. An archive for older projects that I won’t be touching again (or at least not for the foreseeable future).
  5. Some way to keep my schoolwork separate from the other stuff.
  6. Everything is version controlled and I should be able to keep old history.

I’ve used a combination of Git’s repository-splitting functionality and good old directory organization to make things cleaner. Everything is still tucked into a top-level src directory, but that’s where the similarities with my old system end. Each major project is moved to its own repo. Since I already had each major project in its own subdirectory, I could use Git’s filter-branch command to cleanly pull them out while retaining history. Every active project gets its own subdirectory under ~/src which holds a working copy of the repo. There is a separate archive subdirectory which contains the most recent copy of the projects that I’ve decided to file away. I technically don’t need this since the repositories are stored on my personal server, but I like having all my code handy.
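
The extraction itself is short once you work on a fresh clone; roughly this per project (the repo and subdirectory names are just examples):

git clone ~/src/old-repo myproject    # never run filter-branch on your only copy
cd myproject
git filter-branch --subdirectory-filter python/myproject -- --all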

I pooled together all my experimentation into a single repo called scratch. This also gets its own subdirectory under src. It currently holds a few simple classes I wrote while trying out Scala, some assembly code and a few Prolog files. My schoolwork also gets a separate repo and subdirectory. This contains code for labs in each class as well as textbook exercises (with subdirectories for each class and book). Large projects get their own repos and aren’t part of this schoolwork repo. Since I’m on break they’re all stashed under archive.

The process of actually splitting the repo into the new structure was smooth for the most part. I followed the steps outlined in this Stack Overflow answer to extract each of my main projects into its own repo. I cloned my local repo to create the individual repos, but I still had to set up remotes for each of them on my server. I followed a really good guide to set up the remotes, but first I had to remove the existing remotes (which pointed to the local repo I had cloned from). A simple git remote rm origin took care of that.
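
Concretely, the remote surgery for each new repo looked something like this (the server path is hypothetical):

git remote rm origin                                       # drop the clone-time remote
ssh user@myserver "git init --bare repos/myproject.git"    # create a bare repo to push to
git remote add origin user@myserver:repos/myproject.git
git push origin master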

Things started to get a little more complicated when it came to extracting things that were spread out (and going into scratch). I wasn’t sure if filter-branch could let me do the kind of fine-tuned extraction and pooling I wanted to do. So I decided instead to create a scratch directory in my existing repo and then make that into a separate repo. I used the same process for extracting code that would go into my schoolwork repo.

The whole process took a little over two hours, with the third Pirates of the Caribbean movie playing at the same time. I’m considering doing the same thing with my non-code repo, though I’ll need to think out a different organization structure for that. Things were made a lot easier and faster by the two guides I found, and now that I have a good idea of what needs to be done, I’ll probably have an easier time next time around. I’ve come to learn a little more about the power and flexibility of Git. I still think I’m a Git newbie, but at least I know one more trick now. If any of you Git power users have any suggestions or alternate ways to get the same effect, do let me know.