Refactoring my personal Git repository

People usually talk about refactoring when they’re talking about code. Refactoring generally involves reorganizing and restructuring code so that it maintains the same external functionality, but is better in some non-functional way. In most cases refactoring results in code that is better structured, easier to read and understand and on the whole easier to work with. Now that my exams are over, I decided to undertake a refactoring of my own. But I didn’t refactor code, but rather my entire personal source code repository.

About a year ago I started keeping all my files under version control. I had two Subversion repositories, one for my code and another for non-code related files (mostly school papers). A few months ago I moved from Subversion to Git, but my workflow and repository setup was essentially the same. When I moved to Git, I had a chance to change my repo structure, but I decided to keep it. The single repo would serve as a single storage unit for all my code. Development for each project would take place on separate branches which would be routinely merged back into the main branch. The files in the repo were divided into directories based on the programming language they were written in. Probably not the most scientific classification scheme, but it worked well enough.

Fast forward a few months and things aren’t quite as rosy. It turns out that having everything in one repo isn’t really a good idea after all. The single most significant reason is that the history is a complete mess. Looking back at the log I have changes to my main Python project mixed in with minor Emacs configuration changes that I made as well as any random little experiment that I did and happened to commit. Not very tidy. Secondly, using a separate branch for each project didn’t quite work. I’d often forget which branch I had checked out and start hacking on some random thing. If I was lucky I could switch branches before committing and put the changes where they belonged. If I was unlucky, I was faced with the prospect of moving changes between branches and cleaning up the history. Not something I enjoyed doing. Finally, organization by language wasn’t a good scheme especially since I took a course in programming languages and wanted to save the textbook exercises for each language. The result is that now I have a number of folders with just 2-3 files in them and I won’t be using those languages for a while. More importantly, getting to my important project folders meant digging down 3-4 levels down from my home directory.

I decided last week that things had to change. I needed a new organization system that satisfies the following requirements:

  1. Histories of my main projects are untangled.
  2. My main projects stand out clearly from smaller projects and random experiments.
  3. If I start a small project and it gets bigger it should be easy to give it main project status.
  4. An archive for older projects that I won’t be touching again (or at least not for the foreseeable future).
  5. Some way to keep my schoolwork separate from the other stuff.
  6. Everything is version controlled and I should be able to keep old history.

I’ve used a combination of Git’s repository splitting functionality and good old directory organization to make things cleaner. Everything is still tucked into a top-level src directory, but that’s where the similarities with my old system end. Each major project is moved to its own repo. Since I already had each major project in its own subdirectory, I could use Git’s filter-branch command to cleanly pull them out, while retaining history. Every active project gets its own subdirectory under ~/src which has a working copy of the repo. There is a separate archive subdirectory which contains the most recent copy of the projects that I’ve designed to file away. I technically don’t need this since the repositories are stored on my personal server, but I like having all my code handy.

I pooled together all my experimentation into a single repo called scratch. This also gets its own subdirectory under src. It currently holds a few simple classes I wrote while trying out Scala, some assembly code and a few Prolog files . My schoolwork also gets a separate repo and subdirectory. This contains code for labs in each class as well as textbook exercises (with subdirectories for each class and book). Large projects get their own repo and aren’t part of this schoolwork repo. Since I’m on break they’re all stashed under archive.

The process of actually splitting the repo into the new structure was smooth for the most part. I followed the steps outlined by this Stack Overflow answer to extract each of my main projects into its own repo. I cloned my local repo to create the individual repos but I still had setup remotes for each of them on my server. I followed a really good guide to setup the remotes, but first I had to remove the exiting remotes (which pointed to the local repo which I had cloned from). A simple git remote rm origin took care of that.

Things started to get a little more complicated when it came to extracting things that were spread out (and going into scratch). I wasn’t sure if filter-branch could let me do the kind of fine-tuned extraction and pooling I wanted to do. So I decided instead to create a scratch directory in my existing repo and then make that into a separate repo. I used the same process for extracting code that would go into my schoolwork repo.

The whole process took a little over 2 hours with the third Pirates of the Caribbean movie playing at the same time. I’m considering doing the same thing wiht my non-code repo, though I’ll need to think out a different organization structure for that. Things were made a lot easier and faster by the two guides I found and now that I have a good idea of what needs to be done, I’ll probably have an easier time next time around. I’ve come to learn a little more about the power and flexibility of Git. I’m still think I’m a Git newbie, but at least I know one more trick now. If any of you Git power users have any suggestions or alternate ways to get the same effect, do let me know.

One thought on “Refactoring my personal Git repository

  1. Hello Bytebaker!

    I too am a bit of a git noob – but know enough to regularly use it in some projects and realize how powerful and efficient it is. Lately I’ve even setup all my own personal projects in git (on my Dreamhost account).

    I have two Macs I regularly sync everything up on (through Mobile.Me, foxmarks [private server], dropbox, zumodrive, and git). I’ve been thinking of a way of archiving all my photos [about 50gb so far, sometimes I download them on one machine, sometimes the other) and am considering either setting up an rsync script or using subversion – I see no reason to use git instead of this subversion in this case, using git here would seem to just cause me extra work, unless you can think of a reason why not.



Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s