Git Primer

To obtain a read-only copy of the Starlink git repository:

 % git clone git://starlink.jach.hawaii.edu/starlink.git
 % git clone http://starlink.jach.hawaii.edu/starlink.git

The first option is preferred since the native git protocol is much faster than http. In some cases the git protocol causes problems, so the http alternative is provided. If you have small one-off patches and do not need write access to the repository, you can use git-send-email to mail the patch to <stardev AT SPAMFREE jiscmail DOT ac DOT uk>.

To clone a read/write version of the repository you will need to request an account. The repository can be cloned with:

 % git clone ssh://starlink.jach.hawaii.edu/web/starlink/git/starlink.git

To find out what has been changed:

 % git status
 % git diff

After editing, if you want to commit all changes (in the entire repository, not just your current working directory):

 % git commit -a

You will be placed into an editor to enter your commit message. Note that this will not send your work back to the Joint Astronomy Centre. To do that you should first synchronize with the JAC server and then push your changes out:

 % git pull
 % git push

To obtain the history of a particular file:

 % git log --follow -- filename

or browse the repository using gitk.

Each commit is given a unique identifier (an SHA1), which can be used in many commands to indicate a single revision. Only the first few characters are required (usually about 6).
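
As a rough illustration of abbreviated identifiers, the sketch below builds a throwaway repository and shows that a short prefix resolves to the same commit as the full SHA1. The directory name, file name and identity are made up for the example and have nothing to do with the Starlink repository.

```shell
# Sketch only: a scratch repository in a temporary directory (hypothetical names)
work=$(mktemp -d)
cd "$work"
git init -q demo
cd demo
git config user.name "Example User"
git config user.email "user@example.com"

echo "hello" > notes.txt
git add notes.txt
git commit -q -m "Add notes"

full=$(git rev-parse HEAD)            # full identifier of the newest commit
short=$(git rev-parse --short HEAD)   # abbreviated form chosen by git
resolved=$(git rev-parse "$short")    # a prefix resolves back to the full identifier

echo "full:     $full"
echo "short:    $short"
echo "resolved: $resolved"
```

Any command that accepts a revision (git show, git diff, git log and so on) should accept the short form in the same way, as long as the prefix is unambiguous within the repository.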

That is enough information to get started. Policies and conventions to use for the Starlink repository itself are discussed elsewhere.

Who are you?

Make sure that git knows who you are before you push any changes:

 % git config --global user.name "Your Name Comes Here"
 % git config --global user.email you@yourdomain.example.com
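
If you want to check what git will record, or use a different identity for one repository only, something like the following sketch may help. It uses a scratch repository and a made-up identity; leaving out --global stores the setting in that repository alone.

```shell
# Sketch only: scratch repository and hypothetical identity
work=$(mktemp -d)
cd "$work"
git init -q demo
cd demo

# Without --global the values are written to this repository's .git/config
git config user.name "Example User"
git config user.email "user@example.com"

# --get reports the value git will actually use here
name=$(git config --get user.name)
email=$(git config --get user.email)
echo "Commits here will be attributed to: $name <$email>"
```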

Seeing what changed yesterday

There are a number of ways to see what changed recently.

There is currently no nightly email job indicating recent commits.
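
In the absence of such a job, one possible approach is to ask git log for everything committed within the last day. The sketch below demonstrates this in a scratch repository containing one deliberately old commit and one fresh commit; the file name, dates and identity are invented for the example.

```shell
# Sketch only: scratch repository with one old and one recent commit
work=$(mktemp -d)
cd "$work"
git init -q demo
cd demo
git config user.name "Example User"
git config user.email "user@example.com"

# An "old" commit: both dates are forced far into the past
echo "one" > file.txt
git add file.txt
GIT_AUTHOR_DATE="2001-01-01T12:00:00" GIT_COMMITTER_DATE="2001-01-01T12:00:00" \
    git commit -q -m "Old change"

# A commit made just now
echo "two" >> file.txt
git commit -q -a -m "Recent change"

# Subjects of the commits made within the last day
recent=$(git log --since="1 day ago" --pretty=format:%s)
echo "$recent"

# The same window, with a summary of the files each commit touched
git log --since="1 day ago" --stat
```

In a real working copy you would probably run git fetch first and then point the log at the remote branch, for example git log --since="1 day ago" origin/master.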

Using a remote branch

If you want to use a particular release branch (e.g. Lehuakona) you do not check it out explicitly when you clone. Instead you clone the main repository and then ask git to track the remote branch.

 % git clone git://starlink.jach.hawaii.edu/starlink.git
 % git branch --track lehuakona origin/lehuakona
 % git checkout lehuakona

Now you have a lehuakona working copy. git pull will update this branch if there are fixes in the remote branch.

You can list all the remote branches:

 % git branch -r
  origin/HEAD
  origin/hokulei
  origin/humu
  origin/keoe
  origin/lehuakona
  origin/master
  origin/puana
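
To see the tracking relationship without touching the real server, the following sketch sets up a local stand-in for the central repository (here called central, a made-up name) containing a lehuakona branch, clones it, and confirms which remote branch the local branch follows.

```shell
# Throwaway identity so the sketch does not depend on your global configuration
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"

work=$(mktemp -d)
cd "$work"

# Stand-in for the central repository, with master and a release branch
git init -q central
cd central
git symbolic-ref HEAD refs/heads/master
echo "base" > README
git add README
git commit -q -m "Initial commit"
git branch lehuakona
cd ..

# Clone it and track the release branch, as described above
git clone -q central mycopy
cd mycopy
git branch -r                        # lists origin/HEAD, origin/lehuakona, origin/master
git branch --track lehuakona origin/lehuakona
git checkout -q lehuakona

current=$(git rev-parse --abbrev-ref HEAD)
upstream=$(git rev-parse --abbrev-ref 'lehuakona@{upstream}')
echo "on $current, tracking $upstream"
```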

Merging vs Rebasing

Merging and rebasing are two important issues in git usage, partly because branching is so easy to use and partly because git-pull is actually a combination of git-fetch and git-merge. You may notice that, if you commit a couple of patches on master and then do a git pull to synchronize with the central repository, the resulting history in gitk contains a mini branch. This may seem extremely confusing given that you have patched a routine in KAPPA and all you have done is pull an unrelated update to NDF. The reason is that git takes a holistic approach to patches: the entire repository is tracked in one go. This is very different from subversion, where you can tweak a subdirectory and commit your patch without ever caring that someone else has patched a different part of the tree.

Git pull is a merge

Let's take a closer look at git-pull. You have a version of the central repository as follows:

  A - B - C

and you decide to do some work.

  A - B - C - D - E

You do a git-push but are informed that the remote has changed and now looks like this:

  A - B - C - X - Y

The critical thing to realise is that the SHA1 commit identifier encodes everything that is required to locate the parents of the commit but not the children. The SHA1 is immutable so can not be modified after it is created. You use the SHA1 to find the parent, and then use the SHA1 of that parent to find its parent all the way up the history. Conversely you can not use a SHA1 to walk down the tree because knowing the SHA1 of the child in the parent would require the parent to be modified and this is not allowed. Understanding this is important. Back to the example, Y knows it is a child of X and X knows it is a child of C. The C in the repository is the same as your C and the X in the repository is public in that many people could be using the repository at state Y and committing to it. They require Y to be immutable so that they can find X which can lead them to C. You have a problem though because D is a child of C:

  A - B - C - X - Y   (remote)
          \ - D - E   (master)
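
The one-way link from child to parent described above can be inspected directly. The sketch below, run in a scratch repository with made-up file names and identity, prints the raw commit object for the newest commit and shows that it names its parent, while the parent's object contains no reference to the child.

```shell
# Sketch only: scratch repository with two commits, C then D
work=$(mktemp -d)
cd "$work"
git init -q demo
cd demo
git config user.name "Example User"
git config user.email "user@example.com"

echo "one" > file.txt
git add file.txt
git commit -q -m "C"
echo "two" >> file.txt
git commit -q -a -m "D"

parent_id=$(git rev-parse HEAD^)     # identifier of the commit before HEAD

# The raw commit object for D contains a "parent" line
git cat-file -p HEAD
recorded=$(git cat-file -p HEAD | sed -n 's/^parent //p')

# The object for C has no parent line here (it is the first commit)
# and nothing at all that points forward to D
in_parent=$(git cat-file -p "$parent_id" | sed -n 's/^parent //p')

echo "D records parent: $recorded"
```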

A git-fetch on its own would retrieve X and Y but not merge them into your master, which allows you to handle the merge manually. A git-pull does a merge since that is the only way to integrate X and Y whilst not changing D and E:

  A - B - C - X - Y - M
          \ - D - E /

where M is a special merge commit. You push this back and everyone can see that you did two commits whilst someone else was committing X and Y. This information is not overly useful to the project history. Note also that because X and Y are immutable and relied upon by other people using the repository you can't simply decide to do the pull as

  A - B - C - D - E - X' - Y'

because X would need to be rewritten as X' to indicate that its parent is E.
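
The situation can be reproduced safely with local repositories. In the sketch below, central.git stands in for the server and the two clones (mine and theirs) stand in for you and another developer; all of these names, and the one-file commits, are invented for illustration. The merge is requested explicitly with --no-rebase so that the result does not depend on your pull configuration.

```shell
# Throwaway identity so the sketch does not depend on your global configuration
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"

work=$(mktemp -d)
cd "$work"

# Stand-in for the central repository, with history ending at C
git init -q seed
cd seed
git symbolic-ref HEAD refs/heads/master
echo "C" > c.txt && git add c.txt && git commit -q -m "C"
cd ..
git clone -q --bare seed central.git
git clone -q central.git mine
git clone -q central.git theirs

# Someone else publishes X and Y
cd theirs
echo "X" > x.txt && git add x.txt && git commit -q -m "X"
echo "Y" > y.txt && git add y.txt && git commit -q -m "Y"
git push -q origin master
cd ..

# Meanwhile you commit D and E locally
cd mine
echo "D" > d.txt && git add d.txt && git commit -q -m "D"
echo "E" > e.txt && git add e.txt && git commit -q -m "E"

# The push is refused because the remote has moved on
if git push -q origin master 2>/dev/null; then pushed=yes; else pushed=no; fi
echo "push accepted: $pushed"

# A merge-style pull integrates X and Y by creating the merge commit M
git pull -q --no-rebase --no-edit origin master
merges=$(git rev-list --count --merges HEAD)
git log --graph --oneline
```

The graph printed at the end should show the small diamond described above, with the merge commit at the top.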

Git-fetch retrieves the patches from the remote but does not merge them. You are then free to integrate them yourself using git-rebase. Alternatively git-pull has a --rebase option which will reapply your commits on top of the fetched ones (they will get new commit ids because they have new parents). You should end up with something like:

  A - B - C - X - Y - D' - E'

which is more or less what you really wanted to have. Note that D' and E' will have the same author datestamp as D and E but will have new SHA1s because they will have new parents.
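
The same experiment with --rebase gives a linear history. This is again only a sketch using local stand-in repositories (central.git, mine and theirs are made-up names); D is given an artificial author date so that it is easy to see the date survive the rebase while the commit id changes.

```shell
# Throwaway identity so the sketch does not depend on your global configuration
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"

work=$(mktemp -d)
cd "$work"

# Stand-in for the central repository, with history ending at C
git init -q seed
cd seed
git symbolic-ref HEAD refs/heads/master
echo "C" > c.txt && git add c.txt && git commit -q -m "C"
cd ..
git clone -q --bare seed central.git
git clone -q central.git mine
git clone -q central.git theirs

# Someone else publishes X and Y
cd theirs
echo "X" > x.txt && git add x.txt && git commit -q -m "X"
echo "Y" > y.txt && git add y.txt && git commit -q -m "Y"
git push -q origin master
cd ..

# You commit D (with an artificial author date) and E locally
cd mine
echo "D" > d.txt && git add d.txt
GIT_AUTHOR_DATE="2001-01-01T12:00:00" git commit -q -m "D"
echo "E" > e.txt && git add e.txt && git commit -q -m "E"

old_id=$(git rev-parse HEAD~1)                       # D before the rebase
old_date=$(git log -1 --pretty=format:%ad HEAD~1)

# Fetch X and Y, then reapply D and E on top of them
git pull -q --rebase origin master

new_id=$(git rev-parse HEAD~1)                       # D' after the rebase
new_date=$(git log -1 --pretty=format:%ad HEAD~1)
order=$(git log --reverse --pretty=format:%s | xargs)
merges=$(git rev-list --count --merges HEAD)

echo "history: $order"
echo "D  was $old_id ($old_date)"
echo "D' is  $new_id ($new_date)"
```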

Branch Often

For very simple patches that are known to be distinct from patches worked on by other people, git-pull --rebase may be sufficient to overcome the problem of diamond commits cluttering the history. For any type of meaningful development the recommendation is to always use a development branch and handle merging/rebasing in the branch before fixing up master. Git is designed to make branching trivial so we should make use of the facility. It's almost no additional work to work on a development branch.

 % git branch dev
 % git checkout dev
 < edits and commits>

Meanwhile origin/master has had some changes so bring them into your repository

 % git checkout master
 % git pull
 % git checkout dev

If you have some uncommitted changes you may want to stash them first, especially if you are worried that the pull will conflict with a local change (changing branches preserves local changes but can cause conflict if that file is changed in both branches), so in long hand that becomes

 % git stash
 % git checkout master
 % git pull
 % git checkout dev
 % git stash apply
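
The effect of the stash can be checked with git status. The sketch below uses a scratch repository and a made-up file; the branch switching and pull are left out so that only the stash behaviour is shown.

```shell
# Sketch only: scratch repository with one committed file
work=$(mktemp -d)
cd "$work"
git init -q demo
cd demo
git config user.name "Example User"
git config user.email "user@example.com"

echo "v1" > file.txt
git add file.txt
git commit -q -m "Initial version"

echo "local edit" >> file.txt            # an uncommitted change

git stash -q                             # put the change aside
clean=$(git status --porcelain)          # empty: the working copy is clean

# ... this is where you would change branch, pull and change back ...

git stash apply -q                       # bring the change back
dirty=$(git status --porcelain)          # file.txt is reported as modified again
entries=$(git stash list | wc -l)        # apply leaves the stash entry in place

echo "after stash: '$clean'"
echo "after apply: '$dirty'"
```

Note that git stash apply keeps the entry on the stash list; git stash pop applies it and removes it in one step, and git stash drop discards it.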

Now you have an up to date master and some changes on your dev branch

  A - B - C - D - E            master
          \ - F - G            dev

Since no-one has seen your patches you are free to tweak the SHA1s at this point. Rather than merge dev into master (resulting in a merge commit) you can linearize the history by rebasing

 % git rebase master

This will reapply F and G on your branch as if they happened after D and E. This may result in a conflict (but merging would have resulted in the same conflict), in which case you resolve the conflict and run git rebase --continue to carry on with the rebase. You now have

 A - B - C - D - E             master
                 \ - F' - G'   dev

and if you want to push those back to the repository

 % git checkout master
 % git merge dev
 % git push

The merge will not result in a merge commit because git notices that this is a fast forward; master is simply moved to G', so dev and master now point at the same commit.

  A - B - C - D - E - F' - G'  master+dev

The push will work so long as no-one has committed since you pulled. You can now delete dev since it has no additional information.

 % git branch -d dev
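
The whole cycle, from creating dev to deleting it, can be rehearsed against local stand-in repositories before trying it for real. In this sketch central.git, mine and theirs are made-up names, and --ff-only is added to the pull and the merge as a safeguard (it is not part of the commands above): git then refuses to continue rather than create a merge commit.

```shell
# Throwaway identity so the sketch does not depend on your global configuration
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"

work=$(mktemp -d)
cd "$work"

# Stand-in for the central repository, with history ending at C
git init -q seed
cd seed
git symbolic-ref HEAD refs/heads/master
echo "C" > c.txt && git add c.txt && git commit -q -m "C"
cd ..
git clone -q --bare seed central.git
git clone -q central.git mine
git clone -q central.git theirs

# You work on a development branch
cd mine
git branch dev
git checkout -q dev
echo "F" > f.txt && git add f.txt && git commit -q -m "F"
echo "G" > g.txt && git add g.txt && git commit -q -m "G"
cd ..

# Meanwhile someone else publishes D and E
cd theirs
echo "D" > d.txt && git add d.txt && git commit -q -m "D"
echo "E" > e.txt && git add e.txt && git commit -q -m "E"
git push -q origin master
cd ..

# Bring master up to date, then replay dev on top of it
cd mine
git checkout -q master
git pull -q --ff-only origin master
git checkout -q dev
git rebase -q master

# Fast-forward master to dev, publish, and tidy up
git checkout -q master
git merge -q --ff-only dev
git push -q origin master
git branch -d dev

order=$(git log --reverse --pretty=format:%s | xargs)
merges=$(git rev-list --count --merges HEAD)
local_tip=$(git rev-parse master)
remote_tip=$(git --git-dir="$work/central.git" rev-parse master)
echo "history: $order"
```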

If you keep dev alive, new commits on it will branch from G':

  A - B - C - D - E - F' - G'
                            \ - H
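
Returning to the rebase step, the following sketch shows roughly what resolving a conflict looks like. It deliberately changes the same line on master and on dev in a scratch repository (file name and contents are invented), lets the rebase stop, resolves the file by hand and continues. GIT_EDITOR=true is used only so that the example does not open an editor for the commit message.

```shell
# Sketch only: scratch repository where master and dev edit the same line
work=$(mktemp -d)
cd "$work"
git init -q demo
cd demo
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"

echo "original" > file.txt
git add file.txt
git commit -q -m "C"
git branch dev

echo "master version" > file.txt
git commit -q -a -m "E"

git checkout -q dev
echo "dev version" > file.txt
git commit -q -a -m "F"

# The rebase stops because F cannot be applied cleanly on top of E
if git rebase master >/dev/null 2>&1; then
    conflicted=no
else
    conflicted=yes
    # Resolve by hand; here both lines are simply kept
    printf 'master version\ndev version\n' > file.txt
    git add file.txt
    GIT_EDITOR=true git rebase --continue >/dev/null
fi

order=$(git log --reverse --pretty=format:%s | xargs)
echo "conflict seen: $conflicted"
echo "history: $order"
```

If the resolution goes wrong, git rebase --abort returns the branch to the state it was in before the rebase started.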

Further Reading

There are many documents available to help learn git: