Converting the SVN Repository to Git

A number of tools were used to aid in the conversion to git. This document explains the decision making involved.

Branches

Distributed version control systems take a global view of a repository and are not designed to allow a subtree to be modified out of the context of the rest of the tree. A centralized revision control system such as subversion can handle this because all information is stored on the server and it knows that, for example, thirdparty/tclsys is part of a bigger tree.

Most of the branches in the subversion repository were subtree branches, where something like trunk/thirdparty/xxx/yyy was branched to /branches/yyy/version. A single git repository does not handle this very well, since from the DVCS standpoint such a branch involves the deletion of 99% of the other files. For this reason it was not feasible to convert the entire repository, with all branches intact, in a useful manner.

Most of the tags in the repository were localized tags for individual packages (with unhelpful names) and are not useful in a repository-wide context. So-called 'feature branches' are interesting in that they show how a feature developed, but since they were usually merged manually into the trunk there is no useful merge history that needs to be retained in the git version.

Two types of branch were useful and had to be dealt with, along with one feature branch that is in active development:

  1. Release branches (lehuakona, humu, etc.)
  2. Vendor branches
  3. gaia-dev (the GAIA-VO development branch)

These will be discussed below.

CVS History

The 2007 conversion from CVS to SVN was done with cvs2svn, which generated many special branch-creation commits. For historical interest the original CVS repository was converted directly to git using git-cvsimport and is available as starlink-fromoldcvs.git for read-only browsing.

Conversion to Git

Until recently subversion could not handle true merging, and none of the conversion tools are able to detect branch merging. It was therefore decided to convert subversion trunk/ and the release branches into a single repository and to handle the vendor branches as submodules. After examining git-svnimport (oddly deprecated), svn2git and svn-all-fast-export, I settled on svn-all-fast-export, for two reasons:

  1. The ability to pick out certain parts of the tree and send them to new locations in separate output repositories
  2. It is significantly faster than the others (15 minutes for the whole repository rather than 15 hours)

svn-all-fast-export was used to convert both the primary trunk+release branches and the vendor branches. The only downside of this program (apart from the complete absence of documentation, the dependency on Qt4, and a requirement that the author be contacted before it can be made to work) is that .gitignore files are not created from svn:ignore properties.

The command is run as

 % svn-all-fast-export --identity-map=authors.txt configfile /jac_sw/svnroot/starlink

where authors.txt is a file mapping each svn user id to an identity in Name <email> form (with no equals sign between the two, unlike the author files used by svn2git and git-svnimport).
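An author file already written for git-svnimport or svn2git can be reused by dropping the equals sign. The following is a minimal sketch, assuming the usual "login = Name <email>" layout of those files; the function name strip_equals and the sample identity are made up for illustration.

```shell
# Hypothetical helper: turn "login = Name <email>" lines into
# "login Name <email>" lines. Lines without an equals sign pass through.
strip_equals() {
    sed -E 's/^([^[:space:]=]+)[[:space:]]*=[[:space:]]*/\1 /'
}

# Example with an invented identity.
printf '%s\n' 'jdoe = Jane Doe <jdoe@example.org>' | strip_equals
# prints: jdoe Jane Doe <jdoe@example.org>
```

In practice it would be run as a filter, for example strip_equals < authors-svnimport.txt > authors.txt, where both file names are placeholders.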

The final repository comes out at about 500 MB, compared to 1.4 GB for the subversion repository.

SVN Full History

Since the main conversion dropped all branch and tag information, I felt it was important to make that information available somehow. A full conversion was therefore done using git-svnimport, with .gitignore files generated from the svn:ignore properties.

 % git svnimport -r -A /Volumes/jachome/authors.txt -I .gitignore -v svn+ssh://malama/jac_sw/svnroot/starlink

Automatic merge detection was disabled. This read-only repository is available as starlink-allsvn.git.

gaia-dev branch

The gaia-dev feature branch was problematic since it contained just the gaia and skycat subtrees. In the end we created a patch set from the git-svnimport import (by using git-format-patch from the branch point), created a proper feature branch from master on the svn-all-fast-export checkout, and then applied the patch set using git-am. This seems to work.
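The three steps above can be sketched as a small shell function. This is an illustration under stated assumptions, not the exact commands that were run: the function name port_branch, its arguments, and every path or branch name passed to it are hypothetical. Because the subtree branch and master lay their files out differently, the real patches may also have needed their paths adjusted (git am accepts --directory and -p for that); the sketch assumes the paths already match.

```shell
# Hypothetical helper, not the commands actually used.
# usage: port_branch SRC_REPO BRANCH_POINT SRC_BRANCH DEST_REPO NEW_BRANCH
port_branch() {
    src_repo=$1; branch_point=$2; src_branch=$3; dest_repo=$4; new_branch=$5
    patches=$(mktemp -d) || return 1

    # 1. Export every commit made on the branch since the branch point.
    git -C "$src_repo" format-patch -o "$patches" \
        "$branch_point..$src_branch" >/dev/null || return 1

    # 2. Create a proper feature branch from master in the destination.
    git -C "$dest_repo" checkout -q -b "$new_branch" master || return 1

    # 3. Replay the patches, keeping each author and commit message.
    git -C "$dest_repo" am -q "$patches"/*.patch
}
```

A call might look like port_branch starlink-allsvn BRANCHPOINT gaia-dev starlink gaia-dev, where BRANCHPOINT stands for the commit at which the branch was created and the repository paths are placeholders.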