Git Primer

To obtain a read-only copy of the Starlink git repository:

 % git clone git://starlink.jach.hawaii.edu/starlink.git
 % git clone http://starlink.jach.hawaii.edu/starlink.git

The first form is preferred because git's native protocol is much faster than http; the http form is provided for cases where the git protocol causes problems. If you have small one-off patches and do not need write access to the repository you can use git-send-email and mail the patch to <stardev AT SPAMFREE jiscmail DOT ac DOT uk>.

To clone a read/write version of the repository you will need to request an account. The repository can be cloned with:

 % git clone ssh://starlink.jach.hawaii.edu/web/starlink/git/starlink.git

To find out what has been changed:

 % git status
 % git diff

After editing, if you want to commit all changes to files that git already tracks (in the entire repository, not just your current working directory):

 % git commit -a

You will be placed into an editor to enter your commit message. Note that this will not send your work back to the Joint Astronomy Centre. To do that you should first synchronize with the JAC server and then push your changes out:

 % git pull
 % git push

To obtain the history of a particular file:

 % git log --follow -- filename

or browse the repository using gitk.
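
The --follow option asks git log to continue the history of a file across renames. The following is a minimal sketch of that behaviour in a throwaway repository; the temporary directory, the file names and the placeholder identity are all made up for illustration and have nothing to do with the Starlink repository.

```shell
set -eu
# Placeholder identity so the demo commits work on any machine.
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

# Add a file, then rename it in a second commit.
printf 'line 1\nline 2\nline 3\n' > old_name.txt
git add old_name.txt
git commit -q -m "Add file"
git mv old_name.txt new_name.txt
git commit -q -m "Rename file"

# Without --follow the history stops at the rename.
plain=$(git log --format=%s -- new_name.txt | paste -s -d ',' -)
# With --follow the history continues under the old name.
followed=$(git log --follow --format=%s -- new_name.txt | paste -s -d ',' -)

echo "without --follow: $plain"
echo "with --follow:    $followed"
```

In this sketch the plain log should list only the rename commit, while the --follow log should list both commits.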

Each commit is given a unique identifier (a SHA1), which can be used in many commands to indicate a single revision. Only enough leading characters to identify the commit unambiguously are required (usually about 6).
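
As a small illustration of abbreviated identifiers, the sketch below creates a single commit in a throwaway repository and checks that the short form resolves back to the full SHA1. The directory, file name, commit message and identity are placeholders chosen for the example.

```shell
set -eu
# Placeholder identity so the demo commit works on any machine.
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
echo "hello" > demo.txt
git add demo.txt
git commit -q -m "Demo commit"

full=$(git rev-parse HEAD)            # full identifier
short=$(git rev-parse --short HEAD)   # abbreviation chosen by git
# Expand the abbreviation again; git refuses if it is ambiguous.
resolved=$(git rev-parse --verify "$short^{commit}")

echo "full:  $full"
echo "short: $short"
git log -1 --oneline "$short"         # the short form works in commands
```

The length of the abbreviation printed by git rev-parse --short depends on the repository, so larger repositories may need more than 6 characters.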

That is enough information to get started. Policies and conventions to use for the Starlink repository itself are discussed elsewhere.

Who are you?

Make sure that git knows who you are before you push any changes:

 % git config --global user.name "Your Name Comes Here"
 % git config --global user.email you@yourdomain.example.com

Seeing what changed yesterday

There are a number of ways to see what changed recently.

  • Use the web interface to list the most recent commits

  • Use gitk on your local machine after doing a git-pull to sync with the remote repository

  • Use the RSS feature of the web interface to view recent commits via an RSS news reader

There is currently no nightly email job indicating recent commits.

Using a remote branch

If you want to use a particular release branch (e.g. Lehuakona) you do not check it out explicitly when you clone. Instead you clone the main repository and then ask git to track the remote branch.

 % git clone git://starlink.jach.hawaii.edu/starlink.git
 % git branch --track lehuakona origin/lehuakona
 % git checkout lehuakona

Now you have a lehuakona working copy. Running git pull on this branch will update it if there are fixes in the remote branch.
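
To confirm that a local branch really is tracking its remote counterpart, you can ask git for the branch's upstream. The sketch below rehearses the steps above against a local stand-in for the central repository; the temporary directories and the placeholder identity are assumptions made for the example, and only the branch name lehuakona comes from the text.

```shell
set -eu
# Placeholder identity so the demo commit works on any machine.
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

work=$(mktemp -d)

# Local stand-in for the central repository, with a release branch.
git init -q "$work/central"
cd "$work/central"
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is master
echo "v1" > demo.txt
git add demo.txt
git commit -q -m "Initial commit"
git branch lehuakona

# Clone it and track the release branch, as in the commands above.
git clone -q "$work/central" "$work/clone"
cd "$work/clone"
git branch --track lehuakona origin/lehuakona
git checkout -q lehuakona

current=$(git rev-parse --abbrev-ref HEAD)
upstream=$(git rev-parse --abbrev-ref "lehuakona@{upstream}")
echo "on branch $current, tracking $upstream"
git branch -vv                            # also lists each branch's upstream
```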

You can list all the remote branches:

 % git branch -r
  origin/HEAD
  origin/hokulei
  origin/humu
  origin/keoe
  origin/lehuakona
  origin/master
  origin/puana

Merging vs Rebasing

Merging and rebasing are two important issues in git usage, partly because branching is so easy to use and partly because git-pull is actually a combination of git-fetch and git-merge. You may notice that, if you commit a couple of patches on master and then do a git pull to synchronize with the central repository, the resulting history in gitk contains a mini branch. This may seem extremely confusing given that you have patched a routine in KAPPA and all you have done is pull an unrelated update to NDF. The reason is that git takes a holistic approach to patches: the entire repository is tracked in one go. This is very different from Subversion, where you can tweak a subdirectory and commit your patch without ever caring that someone else has patched a different part of the tree.

Git pull is a merge

Let's take a closer look at git-pull. You have a version of the central repository as follows:

  A - B - C

and you decide to do some work, committing D and E:

  A - B - C - D - E

You do a git-push but are informed that there have been changes to the remote, which now looks like this:

  A - B - C - X - Y

The critical thing to realise is that each commit records the SHA1 of its parent (or parents), and its own SHA1 is computed from its contents, including those parent identifiers. A commit therefore holds everything required to locate its parents but nothing about its children. Because the SHA1 is derived from the contents, a commit cannot be modified after it is created without changing its identifier. You use the SHA1 to find the parent, and then use the SHA1 of that parent to find its parent, all the way up the history. Conversely you cannot use a SHA1 to walk down the tree, because recording the SHA1 of the child in the parent would require the parent to be modified, and this is not allowed. Understanding this is important.

Back to the example: Y knows it is a child of X, and X knows it is a child of C. The C in the repository is the same as your C. X and Y are public, in that many people could be using the repository at state Y and committing to it. They require Y to be immutable so that they can find X, which can lead them to C. You have a problem, though, because D is also a child of C:

  A - B - C - X - Y   (remote)
          \ - D - E   (master)
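
The parent-only linkage described above can be inspected directly. The following is a minimal sketch in a throwaway repository holding the linear history A - B - C; the temporary directory, file name and placeholder identity are invented for the example.

```shell
set -eu
# Placeholder identity so the demo commits work on any machine.
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

# Build the linear history A - B - C.
for name in A B C; do
    echo "$name" > demo.txt
    git add demo.txt
    git commit -q -m "$name"
done

c=$(git rev-parse HEAD)     # commit C
b=$(git rev-parse HEAD^)    # its parent, commit B

# The raw commit object for C: a tree, a "parent" line naming B, author and
# committer details, then the message. Nothing in it refers to a child.
git cat-file -p "$c"
recorded_parent=$(git cat-file -p "$c" | sed -n 's/^parent //p')

# Following the parent links from C walks back through B to A.
history=$(git log --format=%s "$c" | paste -s -d ' ' -)
echo "walk from C: $history"
```

Adding a later commit on top of C would leave the output of git cat-file -p for C unchanged, which is why children cannot be found from a parent.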

A git-fetch on its own would retrieve X and Y but not merge them into your master, which allows you to handle the merge manually. A git-pull does a merge, since that is the only way to integrate X and Y whilst not changing D and E:

  A - B - C - X - Y - M
          \ - D - E /

where M is a special merge commit. You push this back and everyone can see that you did two commits whilst someone else was committing X and Y. This information is not overly useful to the project history. Note also that, because X and Y are immutable and relied upon by other people using the repository, you can't simply decide to do the pull as:

  A - B - C - D - E - X' - Y'

because X would need to be rewritten as X' to indicate that its parent is E, and Y would then need to be rewritten as Y' to indicate that its parent is X'.
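
The whole scenario can be rehearsed locally. The sketch below is an illustration under stated assumptions rather than a description of the Starlink server: a bare repository in a temporary directory stands in for the central repository, the clones named mine and theirs stand in for you and another developer, and commit is a hypothetical helper function defined only for this example. It splits git-pull into its git-fetch and git-merge halves so that each step is visible.

```shell
set -eu
# Placeholder identity so the demo commits work on any machine.
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

work=$(mktemp -d)

# Hypothetical helper: create a file named after the commit and commit it.
commit() {
    echo "$1" > "$1.txt"
    git add "$1.txt"
    git commit -q -m "$1"
}

# Seed commits A, B, C and publish them as the stand-in central repository.
git init -q "$work/seed"
cd "$work/seed"
git symbolic-ref HEAD refs/heads/master
commit A; commit B; commit C
git clone -q --bare "$work/seed" "$work/central.git"

# Two working clones that both start at C.
git clone -q "$work/central.git" "$work/mine"
git clone -q "$work/central.git" "$work/theirs"

# Someone else commits X and Y and pushes first.
cd "$work/theirs"
commit X; commit Y
git push -q origin master
y_before=$(git rev-parse HEAD)

# Meanwhile you commit D and E; your push is refused.
cd "$work/mine"
commit D; commit E
if git push -q origin master 2>/dev/null; then
    push_result=accepted
else
    push_result=rejected
fi

# git fetch retrieves X and Y without touching your master.
git fetch -q origin
base_subject=$(git log -1 --format=%s "$(git merge-base master origin/master)")

# git merge (the second half of git pull) creates the merge commit M.
git merge -q -m "M" origin/master
parents=$(git rev-list --parents -n 1 HEAD | awk '{print NF - 1}')
y_after=$(git rev-parse HEAD^2)     # second parent of M is the fetched Y

git log --graph --oneline           # shows the mini branch
git push -q origin master           # now accepted: M has Y as an ancestor
echo "push: $push_result, common ancestor: $base_subject, parents of M: $parents"
```

In this sketch the common ancestor reported by git merge-base is C, M has two parents (E and Y), and the identifier of Y is the same before and after the merge, so nobody else's history has been rewritten.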

Further Reading

There are many documents available to help learn git:

Starlink: GitPrimer (last edited 2013-03-15 02:29:06 by GrahamBell)