Because the Git version control system takes a rather novel approach to how it stores history, I thought I'd write a very quick-and-dirty explanation of that. Once new users understand how this works, a great deal about Git suddenly becomes a lot clearer. So let's see how well this works out. (I'm still touching up this post, so if you see a typo, leave a comment. Or just leave a comment, anyway. And, yes, I can see Drupal is doing silly things with overly long lines; I'm working on that.)
Let's start with a regular checkout (clone) of the mainstream Linux kernel source tree, somewhere in or under your home directory (none of this requires root access):
$ git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
which will, after a while, give you a new directory named linux-2.6.
cd into that new directory, and check that you have only a branch named master:
$ git branch
* master
$
So far, so good. And here's where it gets interesting.
Git does not store history as the differences or "deltas" between changesets. Rather, every new "state" of the repository is identified by a 40-digit SHA1 checksum, which is a reference to a collection of all of the components that make up that state of that repository.
For instance, right this minute, I have a fully-updated clone of the mainstream kernel source tree, and that state is represented by a checksum I can see with, among other commands,
$ git log
commit 06867fbb8abc936192195e5dcc4b63e12cc78f72
... snip ...
What this tells me is that a full description of the entire repository at this minute is uniquely identified by that 40-digit value: 06867fbb8abc936192195e5dcc4b63e12cc78f72. But what does that value mean? Simple.
Git has an ls-tree subcommand that allows you to see what that value represents for Git tree-like things. By "tree-like things", I don't mean simply a subdirectory like arch, but rather a specific state of a directory like arch. We can ask for the ls-tree information about our current clone with either of:
$ git ls-tree master
$ git ls-tree 06867fbb8abc936192195e5dcc4b63e12cc78f72
which, in my case, gives me:
$ git ls-tree master
100644 blob 57af07cf7e682e77de69d96587b0ca315ea611a1 .gitignore
100644 blob 9b0d0267a3c3f1ea75a674fe858fac2165a8b683 .mailmap
100644 blob ca442d313d86dc67e0a2e5d584b465bd382cbf5c COPYING
100644 blob 44fce988eaac8cd22bfe5a5e753ae1bb58b3476d CREDITS
040000 tree 2912a23c4db9ef335ec243fe45e3d1647e9b3469 Documentation
100644 blob b8b708ad6dc3815eb0d23bfea2c972d03b9477c0 Kbuild
100644 blob c13f48d65898487105f0193667648382c90d0eda Kconfig
100644 blob 0e7a80aefa0c27d52e9dec4e2a29d181ff7575ac MAINTAINERS
100644 blob ea51081812f38d5ee8dfaeaab060a9fb4a86ba67 Makefile
100644 blob 0d5a7ddbe3ee8d108bf7079909ddcec30dfb4560 README
100644 blob 55a6074ccbb715d99b642fa510d3c993121f453d REPORTING-BUGS
040000 tree 441a71cc2d8c5ce4e108c26c67aaf52916400fb4 arch
040000 tree c76a4799e7bbc59a28dfbcb3900aab91935b34e7 block
040000 tree c983e8fa79bbaf07c9c4ed500e06ed6f371fa2b9 crypto
040000 tree 71ff18ac162474014e60665b49310c97fa933d93 drivers
040000 tree e30493c735e1dfdd08b48eb6a4f2fcd9c8a0a6fa firmware
040000 tree 5b34106e527f8ab64d38cac8df419528aeda5379 fs
040000 tree 5fe09e8bfc976af195008cde96f215f929456698 include
040000 tree 7aeb965e47f6ce4e62af60302cb8ce84d40164e1 init
040000 tree 8940b1012ece9d3ce666556e746bc1b0deb30a0b ipc
040000 tree e46e01200d1ced07ae062479e4e1e709f1bfebae kernel
040000 tree d25648460ae9a0990f13d791cf0650f8a339a15e lib
040000 tree 66cdde83db9c02546a730348368d3b70ad7ec1aa mm
040000 tree dc81f8b5ce5a5ead0f7c87b123cc90f4b54d9ad0 net
040000 tree a51d84f8c8a1ee2749fd603ba8a7e169936b2b7b samples
040000 tree 9776c9f7ce1b5782efa0159b247d63ba4e219bd7 scripts
040000 tree c38a8b1047565754630b701131b55521ca5ab90d security
040000 tree a16d007119da248ed8752ca38c63631f50ea23ce sound
040000 tree 7c75c99cb655d561c327890a7dcb4b5aca0fbab0 tools
040000 tree 1e3684ddfd7c6b964e6ca2ad6498e3f8b24a7762 usr
040000 tree b9a39d427b5ddb10e8c7c137ef3c81465502ccd4 virt
Here, master is the name of my current (and only) branch. In addition, as you'll see shortly, if you're working with more than one branch, you can refer to your current branch as HEAD. Since there is currently only the one master branch, you could have done the following just as easily:
$ git ls-tree HEAD
But what does all of that mean? And this is the point at which one gets enlightenment.
In the above, anything of type blob represents the current state of a file, while anything of type tree represents the current state of a directory. In short, the current state of my branch is identified by a single SHA1 checksum, and that unique checksum leads to a table of everything in my top-level directory, with each entry consisting of a Git type and, again, a unique SHA1 checksum which completely defines the content of that object.
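To make "a checksum that completely defines the content" a little more concrete, here is a minimal Python sketch of how a blob's SHA1 can be computed from nothing but the file's bytes. The function name git_blob_sha1 is made up for this illustration; the header layout ("blob", a space, the size in bytes, then a NUL byte) is the standard Git object format.

```python
import hashlib


def git_blob_sha1(content: bytes) -> str:
    """Return the SHA1 Git would assign to a blob holding `content`.

    git_blob_sha1 is a hypothetical helper name for this sketch; the
    header format "blob <size>\\0" is what Git prepends before hashing.
    """
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


if __name__ == "__main__":
    # The same bytes always produce the same 40-hex-digit value,
    # regardless of file name, location or timestamp.
    print(git_blob_sha1(b"NAME = rday\n"))
    print(git_blob_sha1(b"NAME = rday\n") == git_blob_sha1(b"NAME = rday\n"))
```

Note that neither the file name nor its path goes into the blob's checksum; those live in the tree entry that points at the blob. If you want to check the sketch against the real thing, git hash-object prints the same value for the same bytes.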
Let me emphasize that with an actual example. You can run git ls-tree on any subdirectory given its current SHA1 value, such as the tools/ directory, whose SHA1 value we can read from the listing above:
$ git ls-tree 7c75c99cb655d561c327890a7dcb4b5aca0fbab0
040000 tree 2e26f5db3439122424dfc869fd6ee301a0427033 firewire
040000 tree e0e441c15192f380eaec8e74ff8fa3abbf857722 hv
040000 tree 671b9e241e6ce933662bc571ce6be59ca90f8569 perf
040000 tree 79cdf858c28055c58b276272b8a3ae1f6d9a3ba1 power
040000 tree 9a18299d98cd48e99faf74f30724bcf893cf99b4 slub
040000 tree bdeb0c783855e54278ee8e3e7b9abd89ebdc52c6 testing
040000 tree 0e22270907013b54e71919f51bcb1349292925bd usb
040000 tree 439c3b825c5bc3496ec77430a711cf6078cd23fb virtio
$
And what the above tells us is that not only does the current state of the tools directory consist of those subdirectories, it also tells us it consists of those precise states of those subdirectories. And there's one more thing that's worth knowing before showing how Git handles changes.
The current SHA1 value of 7c75c99cb655d561c327890a7dcb4b5aca0fbab0 for the tools/ directory is an absolutely unambiguous representation of its entire state, including all subdirectories. That means that if I had that value on my system and you had the same value on your system, we could be absolutely sure that our respective tools/ directories were identical to the byte. In other words, those SHA1 values not only identify content and its current state, they are computed from it so that, apart from astronomically unlikely flukes, two identical SHA1 values in Git will always represent exactly the same content (and, for commit values like the one git log prints, exactly the same history). But we're not done. Let's see what happens when you make a trivial change and save it.
First, create a new branch you plan on throwing away later:
$ git branch junk
$ git branch
  junk
* master
$ git checkout junk
Switched to branch 'junk'
$ git branch
* junk
  master
$
So far, so good. And since the new branch is absolutely identical to the original, we can run all of our git ls-tree commands and we'll get the same results:
$ git ls-tree master
... snip ...
$ git ls-tree junk
... snip ...
$
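Why are the two listings identical? A rough way to picture it is that a branch is only a name attached to a commit value, and HEAD is a name for whichever branch is current. The Python below is a toy model under that assumption; the dict layout and the helper names create_branch and resolve_head are invented for illustration and are not Git's API.

```python
def create_branch(refs: dict, name: str, start: str) -> dict:
    """Return a copy of `refs` with a new branch at the same commit as `start`."""
    updated = dict(refs)
    updated["refs/heads/" + name] = refs["refs/heads/" + start]
    return updated


def resolve_head(head: str, refs: dict) -> str:
    """Follow HEAD (the name of the current branch) to a commit value."""
    return refs[head]


# Toy model of the repository right after the clone.
refs = {"refs/heads/master": "06867fbb8abc936192195e5dcc4b63e12cc78f72"}
head = "refs/heads/master"

refs = create_branch(refs, "junk", "master")  # models: git branch junk
head = "refs/heads/junk"                      # models: git checkout junk

# Both names lead to the same commit, so both give the same ls-tree output.
print(refs["refs/heads/junk"] == refs["refs/heads/master"])
print(resolve_head(head, refs))
```

In this picture nothing about the stored content changes when a branch is created, which is why it costs next to nothing.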
Now let's make and commit a small change in our junk branch and see what happens.
Edit, say, the top-level Makefile and change it thusly:
NAME = rday
You can see the change with:
$ git diff
diff --git a/Makefile b/Makefile
index ea51081..2c9fa0a 100644
--- a/Makefile
+++ b/Makefile
@@ -2,7 +2,7 @@ VERSION = 3
 PATCHLEVEL = 2
 SUBLEVEL = 0
 EXTRAVERSION = -rc7
-NAME = Saber-toothed Squirrel
+NAME = rday
 
 # *DOCUMENTATION*
 # To see a list of typical targets execute "make help"
So let's commit the change on our current junk branch and see what happens.
$ git commit -a
$ git log
commit 71fbc474bf971b973a6006dbb9091edc4c0f17be
Author: Robert P. J. Day
Date:   Sat Dec 31 10:33:09 2011 -0500

    trivial change

commit 06867fbb8abc936192195e5dcc4b63e12cc78f72
Merge: 604a16b abb959f
Author: Linus Torvalds
... snip ...
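The new commit value in that git log output is itself a checksum, computed over a small text record that names the top-level tree and the parent commit. Here is a hedged Python sketch of that record. The function name commit_sha1, the author string and the tree value are placeholders chosen for the example (the real tree value isn't shown in this post), so the result will not match 71fbc474...; only the parent value is taken from the log.

```python
import hashlib


def commit_sha1(tree: str, parents: list, author: str, message: str) -> str:
    """Hash a commit record laid out in Git's commit object format."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]
    lines.append(f"author {author}")
    lines.append(f"committer {author}")
    body = ("\n".join(lines) + "\n\n" + message + "\n").encode("utf-8")
    header = f"commit {len(body)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


# Placeholder values: a stand-in tree value and a made-up author line.
PLACEHOLDER_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"
AUTHOR = "A. Hacker <hacker@example.com> 1325345589 -0500"
PARENT = "06867fbb8abc936192195e5dcc4b63e12cc78f72"  # master's commit, from the log

with_parent = commit_sha1(PLACEHOLDER_TREE, [PARENT], AUTHOR, "trivial change")
without_parent = commit_sha1(PLACEHOLDER_TREE, [], AUTHOR, "trivial change")

# Same tree, same message, different ancestry: the commit values differ.
print(with_parent != without_parent)  # True
```

This is where history enters the picture: a tree value pins down content only, while a commit value also pins down its parent, and through the parent everything that came before.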
From the git log output, you can see that my junk branch now has a more recent commit than the master from which it branched, but how does Git represent that change? You can see by now running git ls-tree on both branches, and seeing the difference:
$ git ls-tree master
... master ls-tree output ...
$ git ls-tree junk
... junk ls-tree output ...
$
and if you somehow ran those outputs through the diff command (or looked really carefully), you'd see that the entire difference between the master and junk branches was:
< 100644 blob ea51081812f38d5ee8dfaeaab060a9fb4a86ba67 Makefile
---
> 100644 blob 2c9fa0a25469cfb134dbaed89bc4bef5508b63ef Makefile
All of the other entries in the ls-tree output would be identical between the two branches since there was no difference anywhere else. So if one was handed both the master and junk branches with no clue as to where they came from, it would be a simple matter to tell from the git ls-tree output that both trees were absolutely identical, except for some kind of change in that top-level Makefile.
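That comparison is easy to mechanize. Below is a small Python sketch that parses two ls-tree listings and reports which entries differ. The helper names parse_ls_tree and changed_entries are made up for this example, and the listings are cut down to three entries each, using values copied from the output earlier in this post.

```python
def parse_ls_tree(text: str) -> dict:
    """Map each entry name to its (mode, type, sha1) from ls-tree style lines."""
    entries = {}
    for line in text.strip().splitlines():
        mode, kind, sha, name = line.split(None, 3)
        entries[name] = (mode, kind, sha)
    return entries


def changed_entries(old: dict, new: dict) -> list:
    """Names that were added, removed, or point at a different object."""
    names = set(old) | set(new)
    return sorted(n for n in names if old.get(n) != new.get(n))


MASTER = """
100644 blob c13f48d65898487105f0193667648382c90d0eda Kconfig
100644 blob ea51081812f38d5ee8dfaeaab060a9fb4a86ba67 Makefile
040000 tree 7c75c99cb655d561c327890a7dcb4b5aca0fbab0 tools
"""

JUNK = """
100644 blob c13f48d65898487105f0193667648382c90d0eda Kconfig
100644 blob 2c9fa0a25469cfb134dbaed89bc4bef5508b63ef Makefile
040000 tree 7c75c99cb655d561c327890a7dcb4b5aca0fbab0 tools
"""

master, junk = parse_ls_tree(MASTER), parse_ls_tree(JUNK)
print(changed_entries(master, junk))  # ['Makefile']
```

Notice that the two Makefile values begin with ea51081 and 2c9fa0a, the same abbreviations that appeared on the "index" line of the git diff output. And because the tools entry carries the same value on both sides, there is no need to descend into it at all: identical value, identical contents.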
Questions? Comments? If you followed all that, should I keep going?