////////////////////////////////////////////////////////////////

	GIT - the stupid content tracker

////////////////////////////////////////////////////////////////
"git" can mean anything, depending on your mood.

 - random three-letter combination that is pronounceable, and not
   actually used by any common UNIX command. The fact that it is a
   mispronunciation of "get" may or may not be relevant.
 - stupid. contemptible and despicable. simple. Take your pick from the
   dictionary of slang.
 - "global information tracker": you're in a good mood, and it actually
   works for you. Angels sing, and a light suddenly fills the room.
 - "goddamn idiotic truckload of sh*t": when it breaks

This is a stupid (but extremely fast) directory content manager. It
doesn't do a whole lot, but what it 'does' do is track directory
contents efficiently.

There are two object abstractions: the "object database", and the
"current directory cache" aka "index".

The Object Database
~~~~~~~~~~~~~~~~~~~
The object database is literally just a content-addressable collection
of objects. All objects are named by their content, which is
approximated by the SHA1 hash of the object itself. Objects may refer
to other objects (by referencing their SHA1 hash), and so you can
build up a hierarchy of objects.

All objects have a statically determined "type" aka "tag", which is
determined at object creation time, and which identifies the format of
the object (i.e. how it is used, and how it can refer to other
objects). There are currently four different object types: "blob",
"tree", "commit" and "tag".

A "blob" object cannot refer to any other object, and is, like the tag
implies, a pure storage object containing some user data. It is used to
actually store the file data, i.e. a blob object is associated with some
particular version of some file.

A "tree" object is an object that ties one or more "blob" objects into a
directory structure. In addition, a tree object can refer to other tree
objects, thus creating a directory hierarchy.

A "commit" object ties such directory hierarchies together into
a DAG of revisions - each "commit" is associated with exactly one tree
(the directory hierarchy at the time of the commit). In addition, a
"commit" refers to one or more "parent" commit objects that describe the
history of how we arrived at that directory hierarchy.

As a special case, a commit object with no parents is called the "root"
object, and is the point of an initial project commit. Each project
must have at least one root, and while you can tie several different
root objects together into one project by creating a commit object which
has two or more separate roots as its ultimate parents, that's probably
just going to confuse people. So aim for the notion of "one root object
per project", even if git itself does not enforce that.

A "tag" object symbolically identifies and can be used to sign other
objects. It contains the identifier and type of another object, a
symbolic name (of course!) and, optionally, a signature.

Regardless of object type, all objects share the following
characteristics: they are all deflated with zlib, and have a header
that not only specifies their tag, but also provides size information
about the data in the object. It's worth noting that the SHA1 hash
that is used to name the object is the hash of the original data
plus this header, so `sha1sum` 'file' does not match the object name
for 'file'.
(Historical note: in the dawn of the age of git the hash
was the sha1 of the 'compressed' object.)
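To make the naming rule concrete, here is a minimal Python sketch (not git's own code). It assumes the header layout spelled out in the validation rules of this document: the ASCII tag, a space, the decimal size and a NUL byte. The function name `object_name` is made up for this example.

```python
import hashlib


def object_name(obj_type: str, data: bytes) -> str:
    """Hex SHA1 of '<tag> <size>\\0' followed by the raw data (assumed layout)."""
    header = f"{obj_type} {len(data)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()


data = b"hello world\n"
name = object_name("blob", data)         # the name the object database would use
plain = hashlib.sha1(data).hexdigest()   # what `sha1sum` would print for the file
print(name)
print(plain)
print(name != plain)
```

The two printed names differ because only the first one hashes the header along with the data.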

As a result, the general consistency of an object can always be tested
independently of the contents or the type of the object: all objects can
be validated by verifying that (a) their hashes match the content of the
file and (b) the object successfully inflates to a stream of bytes that
forms a sequence of <ascii tag without space> + <space> + <ascii decimal
size> + <byte\0> + <binary object data>.
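The two checks can be sketched in a few lines of Python. This is an illustration under the framing described above, not the code git uses; `store_object` and `check_object` are hypothetical helpers, and restricting the tag to ASCII letters is an extra assumption.

```python
import hashlib
import zlib


def store_object(obj_type: str, data: bytes):
    """Return (name, deflated bytes) for an object, using the framing above."""
    raw = f"{obj_type} {len(data)}".encode("ascii") + b"\0" + data
    return hashlib.sha1(raw).hexdigest(), zlib.compress(raw)


def check_object(name: str, stored: bytes) -> bool:
    """Check (a) the hash and (b) the '<tag> <size>\\0<data>' framing."""
    try:
        raw = zlib.decompress(stored)
    except zlib.error:
        return False                      # does not inflate at all
    if hashlib.sha1(raw).hexdigest() != name:
        return False                      # (a) name does not match the content
    header, nul, body = raw.partition(b"\0")
    tag, space, size = header.partition(b" ")
    if not (nul and space and tag.isalpha() and size.isdigit()):
        return False                      # (b) malformed header
    return int(size) == len(body)         # (b) size must match the data


name, stored = store_object("blob", b"some file contents\n")
ok = check_object(name, stored)
wrong_name = check_object("0" * 40, stored)
bad_raw = b"blob 999\0short"              # header lies about the size
bad_size = check_object(hashlib.sha1(bad_raw).hexdigest(), zlib.compress(bad_raw))
print(ok, wrong_name, bad_size)
```

Note that nothing here depends on the object type: the same check applies to blobs, trees, commits and tags.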

The structured objects can further have their structure and
connectivity to other objects verified. This is generally done with
the `git-fsck-cache` program, which generates a full dependency graph
of all objects, and verifies their internal consistency (in addition
to just verifying their superficial consistency through the hash).

The object types in some more detail:

Blob Object
~~~~~~~~~~~
A "blob" object is nothing but a binary blob of data, and doesn't
refer to anything else. There is no signature or any other
verification of the data, so while the object is consistent (it 'is'
indexed by its sha1 hash, so the data itself is certainly correct), it
has absolutely no other attributes. No name associations, no
permissions. It is purely a blob of data (i.e. normally "file
contents").

In particular, since the blob is entirely defined by its data, if two
files in a directory tree (or in multiple different versions of the
repository) have the same contents, they will share the same blob
object. The object is totally independent of its location in the
directory tree, and renaming a file does not change the object that
file is associated with in any way.

A blob is typically created when link:git-update-cache.html[git-update-cache]
is run, and its data can be accessed by link:git-cat-file.html[git-cat-file].

Tree Object
~~~~~~~~~~~
The next hierarchical object type is the "tree" object. A tree object
is a list of mode/name/blob data, sorted by name. Alternatively, the
mode data may specify a directory mode, in which case instead of
naming a blob, that name is associated with another TREE object.
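As an illustration, the sketch below serializes such a list and names the result. The byte layout it uses (octal mode, space, name, NUL, then the 20 raw SHA1 bytes) is an assumption that this text does not spell out, and the plain byte-wise sort ignores any special ordering rule for directory names. `tree_payload` and `object_name` are names invented for the example.

```python
import hashlib


def object_name(obj_type: str, data: bytes) -> str:
    header = f"{obj_type} {len(data)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()


def tree_payload(entries) -> bytes:
    """entries: iterable of (mode, name, sha1_hex); returns the tree data."""
    out = b""
    for mode, name, sha1_hex in sorted(entries, key=lambda e: e[1].encode()):
        out += f"{mode:o} {name}".encode() + b"\0" + bytes.fromhex(sha1_hex)
    return out


blob = object_name("blob", b"hello world\n")
sub = object_name("tree", tree_payload([(0o100644, "hello.txt", blob)]))
entries = [(0o040000, "sub", sub), (0o100644, "README", blob)]

top_a = object_name("tree", tree_payload(entries))
top_b = object_name("tree", tree_payload(reversed(entries)))
print(top_a == top_b)   # same set of entries gives the same tree name
```

Because the entries are sorted before hashing, the order in which they were collected does not affect the name of the tree.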

Like the "blob" object, a tree object is uniquely determined by the
set contents, and so two separate but identical trees will always
share the exact same object. This is true at all levels, i.e. it's
true for a "leaf" tree (which does not refer to any other trees, only
blobs) as well as for a whole subdirectory.

For that reason a "tree" object is just a pure data abstraction: it
has no history, no signatures, no verification of validity, except
that since the contents are again protected by the hash itself, we can
trust that the tree is immutable and its contents never change.

So you can trust the contents of a tree to be valid, the same way you
can trust the contents of a blob, but you don't know where those
contents 'came' from.

Side note on trees: since a "tree" object is a sorted list of
"filename+content", you can create a diff between two trees without
actually having to unpack two trees. Just ignore all common parts,
and your diff will look right. In other words, you can effectively
(and efficiently) tell the difference between any two random trees by
O(n) where "n" is the size of the difference, rather than the size of
the tree.
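The trick works because equal names mean equal contents, so a whole subtree can be skipped without ever opening it. A minimal sketch, using an in-memory dict in place of the object database; the `store` layout, the placeholder object names and `diff_trees` are all made up for illustration.

```python
def diff_trees(store, a, b, prefix=""):
    """List (path, old_name, new_name) for every blob that differs.

    store maps a tree name to {entry name: (kind, object name)},
    where kind is "blob" or "tree". a or b may be None (missing side).
    """
    if a == b:
        return []                          # identical subtree: never opened
    ea = store[a] if a else {}
    eb = store[b] if b else {}
    changes = []
    for name in sorted(set(ea) | set(eb)):
        old, new = ea.get(name), eb.get(name)
        if old == new:
            continue                       # common part, ignore it
        path = prefix + name
        old_is_tree = old is not None and old[0] == "tree"
        new_is_tree = new is not None and new[0] == "tree"
        if old_is_tree or new_is_tree:
            changes += diff_trees(store,
                                  old[1] if old_is_tree else None,
                                  new[1] if new_is_tree else None,
                                  path + "/")
            if old is not None and not old_is_tree:
                changes.append((path, old[1], None))
            if new is not None and not new_is_tree:
                changes.append((path, None, new[1]))
        else:
            changes.append((path, old[1] if old else None, new[1] if new else None))
    return changes


# "T_doc" is deliberately absent from the store: it is shared by both
# versions, so the diff never needs to look inside it.
store = {
    "T_root1": {"README": ("blob", "b1"), "doc": ("tree", "T_doc"), "src": ("tree", "T_src1")},
    "T_root2": {"README": ("blob", "b1"), "doc": ("tree", "T_doc"), "src": ("tree", "T_src2")},
    "T_src1": {"main.c": ("blob", "b2")},
    "T_src2": {"main.c": ("blob", "b3"), "util.c": ("blob", "b4")},
}
changes = diff_trees(store, "T_root1", "T_root2")
print(changes)
```

Only the `src` subtree is read here, which is why the cost tracks the size of the difference rather than the size of the tree.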

Side note 2 on trees: since the name of a "blob" depends entirely and
exclusively on its contents (i.e. there are no names or permissions
involved), you can see trivial renames or permission changes by
noticing that the blob stayed the same. However, renames with data
changes need a smarter "diff" implementation.

A tree is created with link:git-write-tree.html[git-write-tree] and
its data can be accessed by link:git-ls-tree.html[git-ls-tree].
Two trees can be compared with link:git-diff-tree.html[git-diff-tree].

Commit Object
~~~~~~~~~~~~~
The "commit" object is an object that introduces the notion of
history into the picture. In contrast to the other objects, it
doesn't just describe the physical state of a tree, it describes how
we got there, and why.

A "commit" is defined by the tree-object that it results in, the
parent commits (zero, one or more) that led up to that point, and a
comment on what happened. Again, a commit is not trusted per se:
the contents are well-defined and "safe" due to the cryptographically
strong signatures at all levels, but there is no reason to believe
that the tree is "good" or that the merge information makes sense.
The parents do not have to actually have any relationship with the
result, for example.

Note on commits: unlike real SCM's, commits do not contain
rename information or file mode change information. All of that is
implicit in the trees involved (the result tree, and the result trees
of the parents), and describing that makes no sense in this idiotic
file manager.

A commit is created with link:git-commit-tree.html[git-commit-tree] and
its data can be accessed by link:git-cat-file.html[git-cat-file].

Trust
~~~~~
An aside on the notion of "trust". Trust is really outside the scope
of "git", but it's worth noting a few things. First off, since
everything is hashed with SHA1, you 'can' trust that an object is
intact and has not been messed with by external sources. So the name
of an object uniquely identifies a known state - just not a state that
you may want to trust.

Furthermore, since the SHA1 signature of a commit refers to the
SHA1 signatures of the tree it is associated with and the signatures
of the parent, a single named commit specifies uniquely a whole set
of history, with full contents. You can't later fake any step of the
way once you have the name of a commit.
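A small sketch of why one name pins down the whole history. It uses a stripped-down commit body holding only `tree` and `parent` lines plus the message; that simplified layout and the helper `commit_name` are assumptions for illustration, not the full commit format.

```python
import hashlib


def object_name(obj_type: str, data: bytes) -> str:
    header = f"{obj_type} {len(data)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()


def commit_name(tree: str, parents, message: str) -> str:
    """Name of a simplified commit: tree line, parent lines, blank line, message."""
    lines = [f"tree {tree}"] + [f"parent {p}" for p in parents]
    body = "\n".join(lines) + "\n\n" + message
    return object_name("commit", body.encode())


tree = object_name("tree", b"")                     # name of an empty tree
root = commit_name(tree, [], "initial import\n")
tip = commit_name(tree, [root], "second commit\n")

# Tamper with the first commit and rebuild the same second commit on top.
forged_root = commit_name(tree, [], "initial import (tampered)\n")
forged_tip = commit_name(tree, [forged_root], "second commit\n")
print(tip != forged_tip)
```

Changing anything in an ancestor changes its name, which changes the `parent` line of every descendant, so the name of the tip changes as well.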

So to introduce some real trust in the system, the only thing you need
to do is to digitally sign just 'one' special note, which includes the
name of a top-level commit. Your digital signature shows others
that you trust that commit, and the immutability of the history of
commits tells others that they can trust the whole history.

In other words, you can easily validate a whole archive by just
sending out a single email that tells the people the name (SHA1 hash)
of the top commit, and digitally sign that email using something
like GPG/PGP.

To assist in this, git also provides the tag object...

Tag Object
~~~~~~~~~~
Git provides the "tag" object to simplify creating, managing and
exchanging symbolic and signed tokens. The "tag" object at its
simplest simply symbolically identifies another object by containing
the sha1, type and symbolic name.

However it can optionally contain additional signature information
(which git doesn't care about as long as there's less than 8k of
it). This can then be verified externally to git.

Note that despite the tag features, "git" itself only handles content
integrity; the trust framework (and signature provision and
verification) has to come from outside.

A tag is created with link:git-mktag.html[git-mktag],
its data can be accessed by link:git-cat-file.html[git-cat-file],
and the signature can be verified by
link:git-verify-tag-script.html[git-verify-tag].


The "index" aka "Current Directory Cache"
-----------------------------------------
The index is a simple binary file, which contains an efficient
representation of a virtual directory content at some random time. It
does so by a simple array that associates a set of names, dates,
permissions and content (aka "blob") objects together. The cache is
always kept ordered by name, and names are unique (with a few very
specific rules) at any point in time, but the cache has no long-term
meaning, and can be partially updated at any time.

In particular, the index certainly does not need to be consistent with
the current directory contents (in fact, most operations will depend on
different ways to make the index 'not' be consistent with the directory
hierarchy), but it has three very important attributes:

'(a) it can re-generate the full state it caches (not just the
directory structure: it contains pointers to the "blob" objects so
that it can regenerate the data too)'

As a special case, there is a clear and unambiguous one-way mapping
from a current directory cache to a "tree object", which can be
efficiently created from just the current directory cache without
actually looking at any other data. So a directory cache at any one
time uniquely specifies one and only one "tree" object (but has
additional data to make it easy to match up that tree object with what
has happened in the directory).

'(b) it has efficient methods for finding inconsistencies between that
cached state ("tree object waiting to be instantiated") and the
current state.'

'(c) it can additionally efficiently represent information about merge
conflicts between different tree objects, allowing each pathname to be
associated with sufficient information about the trees involved that
you can create a three-way merge between them.'

Those are the three ONLY things that the directory cache does. It's a
cache, and the normal operation is to re-generate it completely from a
known tree object, or update/compare it with a live tree that is being
developed. If you blow the directory cache away entirely, you generally
haven't lost any information as long as you have the name of the tree
that it described.

At the same time, the index is also the staging area for creating new
trees, and creating a new tree always involves a controlled
modification of the index file. In particular, the index file can have
the representation of an intermediate tree that has not yet been
instantiated. So the index can be thought of as a write-back cache,
which can contain dirty information that has not yet been written back
to the backing store.



The Workflow
------------
Generally, all "git" operations work on the index file. Some operations
work *purely* on the index file (showing the current state of the
index), but most operations move data between the index file and either
the database or the working directory. Thus there are four main
combinations:

1) working directory -> index
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

You update the index with information from the working directory with
the link:git-update-cache.html[git-update-cache] command. You
generally update the index information by just specifying the filename
you want to update, like so:

	git-update-cache filename

but to avoid common mistakes with filename globbing etc, the command
will not normally add totally new entries or remove old entries,
i.e. it will normally just update existing cache entries.

To tell git that yes, you really do realize that certain files no
longer exist in the archive, or that new files should be added, you
should use the `--remove` and `--add` flags respectively.

NOTE! A `--remove` flag does 'not' mean that subsequent filenames will
necessarily be removed: if the files still exist in your directory
structure, the index will be updated with their new status, not
removed. The only thing `--remove` means is that update-cache will be
considering a removed file to be a valid thing, and if the file really
does not exist any more, it will update the index accordingly.

As a special case, you can also do `git-update-cache --refresh`, which
will refresh the "stat" information of each index entry to match the
current stat information. It will 'not' update the object status itself,
and it will only update the fields that are used to quickly test whether
an object still matches its old backing store object.
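The idea behind `--refresh` can be sketched as follows. This is a guess at the logic rather than git's implementation: each entry remembers a couple of stat fields next to its blob name, a cheap stat comparison decides whether the file looks untouched, and a refresh only rewrites the stat fields when the content still hashes to the recorded blob. `Entry`, `looks_unchanged` and `refresh` are hypothetical names, and the choice of mtime and size as the cached fields is an assumption.

```python
import hashlib
import os
import tempfile
from dataclasses import dataclass


def blob_name(data: bytes) -> str:
    return hashlib.sha1(b"blob %d\0" % len(data) + data).hexdigest()


@dataclass
class Entry:
    path: str
    blob: str        # object name recorded in the index
    mtime_ns: int    # cached stat fields used for the quick check
    size: int


def stat_fields(path):
    st = os.stat(path)
    return st.st_mtime_ns, st.st_size


def looks_unchanged(entry: Entry) -> bool:
    """Cheap test: compare cached stat fields, never read the file."""
    return (entry.mtime_ns, entry.size) == stat_fields(entry.path)


def refresh(entry: Entry) -> bool:
    """Re-sync the cached stat data; never touches entry.blob."""
    with open(entry.path, "rb") as f:
        if blob_name(f.read()) != entry.blob:
            return False                 # content really changed: needs update
    entry.mtime_ns, entry.size = stat_fields(entry.path)
    return True


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "file.txt")
    with open(path, "wb") as f:
        f.write(b"one\n")
    entry = Entry(path, blob_name(b"one\n"), *stat_fields(path))

    # Touch the file: same content, different mtime.
    os.utime(path, ns=(entry.mtime_ns, entry.mtime_ns + 2_000_000_000))
    before = looks_unchanged(entry)      # stat no longer matches
    refreshed = refresh(entry)           # content matches, stat re-synced
    after = looks_unchanged(entry)

    with open(path, "wb") as f:
        f.write(b"something else\n")
    stale = refresh(entry)               # content differs, entry left alone

print(before, refreshed, after, stale)
```

In this sketch the recorded blob name never changes during a refresh; only the fields used for the quick comparison do.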

2) index -> object database
~~~~~~~~~~~~~~~~~~~~~~~~~~~

You write your current index file to a "tree" object with the program

	git-write-tree

that doesn't come with any options - it will just write out the
current index into the set of tree objects that describe that state,
and it will return the name of the resulting top-level tree. You can
use that tree to re-generate the index at any time by going in the
other direction:

3) object database -> index
~~~~~~~~~~~~~~~~~~~~~~~~~~~

You read a "tree" file from the object database, and use that to
populate (and overwrite - don't do this if your index contains any
unsaved state that you might want to restore later!) your current
index. Normal operation is just

	git-read-tree <sha1 of tree>

and your index file will now be equivalent to the tree that you saved
earlier. However, that is only your 'index' file: your working
directory contents have not been modified.

4) index -> working directory
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

You update your working directory from the index by "checking out"
files. This is not a very common operation, since normally you'd just
keep your files updated, and rather than write to your working
directory, you'd tell the index files about the changes in your
working directory (i.e. `git-update-cache`).

However, if you decide to jump to a new version, or check out somebody
else's version, or just restore a previous tree, you'd populate your
index file with read-tree, and then you need to check out the result
with

	git-checkout-cache filename

or, if you want to check out all of the index, use `-a`.

NOTE! git-checkout-cache normally refuses to overwrite old files, so
if you have an old version of the tree already checked out, you will
need to use the "-f" flag ('before' the "-a" flag or the filename) to
'force' the checkout.


Finally, there are a few odds and ends which are not purely moving
from one representation to the other:

5) Tying it all together
~~~~~~~~~~~~~~~~~~~~~~~~
To commit a tree you have instantiated with "git-write-tree", you'd
create a "commit" object that refers to that tree and the history
behind it - most notably the "parent" commits that preceded it in
history.

Normally a "commit" has one parent: the previous state of the tree
before a certain change was made. However, sometimes it can have two
or more parent commits, in which case we call it a "merge", due to the
fact that such a commit brings together ("merges") two or more
previous states represented by other commits.

In other words, while a "tree" represents a particular directory state
of a working directory, a "commit" represents that state in "time",
and explains how we got there.

You create a commit object by giving it the tree that describes the
state at the time of the commit, and a list of parents:

	git-commit-tree <tree> -p <parent> [-p <parent2> ..]

and then giving the reason for the commit on stdin (either through
redirection from a pipe or file, or by just typing it at the tty).

git-commit-tree will return the name of the object that represents
that commit, and you should save it away for later use. Normally,
you'd commit a new `HEAD` state, and while git doesn't care where you
save the note about that state, in practice we tend to just write the
result to the file `.git/HEAD`, so that we can always see what the
last committed state was.

6) Examining the data
~~~~~~~~~~~~~~~~~~~~~

You can examine the data represented in the object database and the
index with various helper tools. For every object, you can use
link:git-cat-file.html[git-cat-file] to examine details about the
object:

	git-cat-file -t <objectname>

shows the type of the object, and once you have the type (which is
usually implicit in where you find the object), you can use

	git-cat-file blob|tree|commit|tag <objectname>

to show its contents. NOTE! Trees have binary content, and as a result
there is a special helper for showing that content, called
`git-ls-tree`, which turns the binary content into a more easily
readable form.

It's especially instructive to look at "commit" objects, since those
tend to be small and fairly self-explanatory. In particular, if you
follow the convention of having the top commit name in `.git/HEAD`,
you can do

	git-cat-file commit $(cat .git/HEAD)

to see what the top commit was.
Linus Torvalds6ad6d3d2005-04-17 21:52:23 -0700430
David Greaves8ac866a2005-05-22 18:44:16 +01004317) Merging multiple trees
432~~~~~~~~~~~~~~~~~~~~~~~~~
Linus Torvalds6ad6d3d2005-04-17 21:52:23 -0700433
David Greaves8ac866a2005-05-22 18:44:16 +0100434Git helps you do a three-way merge, which you can expand to n-way by
435repeating the merge procedure arbitrary times until you finally
436"commit" the state. The normal situation is that you'd only do one
437three-way merge (two parents), and commit it, but if you like to, you
438can do multiple parents in one go.
Linus Torvalds6ad6d3d2005-04-17 21:52:23 -0700439
David Greaves8ac866a2005-05-22 18:44:16 +0100440To do a three-way merge, you need the two sets of "commit" objects
441that you want to merge, use those to find the closest common parent (a
442third "commit" object), and then use those commit objects to find the
443state of the directory ("tree" object) at these points.
Linus Torvalds6ad6d3d2005-04-17 21:52:23 -0700444
To get the "base" for the merge, you first look up the common parent
of two commits with

        git-merge-base <commit1> <commit2>

which will return you the commit they are both based on. You should
now look up the "tree" objects of those commits, which you can easily
do with (for example)

        git-cat-file commit <commitname> | head -1

since the tree object information is always the first line in a commit
object.
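
If you want just the tree name rather than the whole "tree <sha1>" line,
something like the following sketch may help. The helper name
`tree_of_commit` and the sample commit text (including its SHA1 values
and author line) are made up for illustration; with a real repository you
would pipe in the output of `git-cat-file commit <commitname>` instead.

```shell
# Hypothetical helper: print the tree name from a commit object on stdin.
tree_of_commit() {
    # The first line of a commit object has the form "tree <sha1>";
    # strip the "tree " prefix and print only that line.
    sed -n '1s/^tree //p'
}

# Made-up commit text, used here only so the example runs without a repository.
sample_commit='tree 0123456789abcdef0123456789abcdef01234567
parent 89abcdef0123456789abcdef0123456789abcdef
author A U Thor <author@example.com> 1112911993 -0700
committer A U Thor <author@example.com> 1112911993 -0700

Example commit message'

printf '%s\n' "$sample_commit" | tree_of_commit
```

Anchoring the substitution to line 1 means a commit message that happens
to contain a line starting with "tree " is not picked up by mistake.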

Once you know the three trees you are going to merge (the one
"original" tree, aka the common case, and the two "result" trees, aka
the branches you want to merge), you do a "merge" read into the
index. This will complain if it has to throw away your old index
contents, so you should make sure that you've committed those - in
fact you would normally always do a merge against your last commit
(which should thus match what you have in your current index anyway).

To do the merge, do

        git-read-tree -m -u <origtree> <yourtree> <targettree>

which will do all trivial merge operations for you directly in the
index file, and you can just write the result out with
`git-write-tree`.
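
What counts as "trivial" can be pictured, per path, as a comparison of
the blob names that the three trees record for that path. The sketch
below is a simplified model only, not what `git-read-tree` actually
runs: the function name `trivial_merge` and the short placeholder names
(`aaa`, `bbb`, `ccc`) are hypothetical, and it ignores cases such as a
path that is missing from one of the trees.

```shell
# Hypothetical helper: decide the outcome for one path from its three blob names.
# Arguments: blob name in <origtree>, in <yourtree>, and in <targettree>.
trivial_merge() {
    orig=$1
    yours=$2
    target=$3
    if [ "$yours" = "$target" ]; then
        echo "take $yours"      # neither side changed, or both changed the same way
    elif [ "$orig" = "$yours" ]; then
        echo "take $target"     # only the target side changed
    elif [ "$orig" = "$target" ]; then
        echo "take $yours"      # only your side changed
    else
        echo "unmerged"         # both sides changed, in different ways
    fi
}

trivial_merge aaa aaa aaa   # prints "take aaa"
trivial_merge aaa bbb aaa   # prints "take bbb"
trivial_merge aaa aaa ccc   # prints "take ccc"
trivial_merge aaa bbb ccc   # prints "unmerged"
```

Only the last case needs help from outside the index; the first three
can be settled by comparing names, without reading any file contents.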

Historical note. We did not have the `-u` facility when this
section was first written, so we used to warn that
the merge is done in the index file, not in your
working directory, and that your working directory will no longer
match your index.


8) Merging multiple trees, continued
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Sadly, many merges aren't trivial. If there are files that have
been added, moved or removed, or if both branches have modified the
same file, you will be left with an index tree that contains "merge
entries" in it. Such an index tree can 'NOT' be written out to a tree
object, and you will have to resolve any such merge clashes using
other tools before you can write out the result.

You can examine such index state with the `git-ls-files --unmerged`
command. An example:

------------------------------------------------
$ git-read-tree -m $orig HEAD $target
$ git-ls-files --unmerged
100644 263414f423d0e4d70dae8fe53fa34614ff3e2860 1 hello.c
100644 06fa6a24256dc7e560efa5687fa84b51f0263c3a 2 hello.c
100644 cc44c73eb783565da5831b4d820c962954019b69 3 hello.c
------------------------------------------------

Each line of the `git-ls-files --unmerged` output consists of
the blob mode bits, blob SHA1, 'stage number', and the
filename. The 'stage number' is git's way to say which tree it
came from: stage 1 corresponds to the `$orig` tree, stage 2 to the
`HEAD` tree, and stage 3 to the `$target` tree.
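
As a rough illustration of that layout, the following sketch labels each
line of such output with the tree its stage number refers to. The
function name `label_stages` is hypothetical, and the input is simply the
sample output shown above; the sketch splits on whitespace, so it assumes
filenames that contain none.

```shell
# Hypothetical helper: annotate `git-ls-files --unmerged` output read on stdin.
label_stages() {
    # Fields: $1 = mode bits, $2 = blob SHA1, $3 = stage number, $4 = filename.
    awk '{
        if ($3 == 1) src = "$orig"
        else if ($3 == 2) src = "HEAD"
        else if ($3 == 3) src = "$target"
        else src = "unknown"
        print $4 ": stage " $3 " (" src ") blob " $2
    }'
}

# Feed in the sample output from the example above.
label_stages <<'EOF'
100644 263414f423d0e4d70dae8fe53fa34614ff3e2860 1 hello.c
100644 06fa6a24256dc7e560efa5687fa84b51f0263c3a 2 hello.c
100644 cc44c73eb783565da5831b4d820c962954019b69 3 hello.c
EOF
```

Note that `$orig` and `$target` inside the awk program are plain string
labels, not shell variables, since the whole program is single-quoted.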

Earlier we said that trivial merges are done inside
`git-read-tree -m`. For example, if the file did not change
from `$orig` to either `HEAD` or `$target`, or if the file changed
from `$orig` to `HEAD` and `$orig` to `$target` the same way,
obviously the final outcome is what is in `HEAD`. What the
above example shows is that file `hello.c` was changed from
`$orig` to `HEAD` and `$orig` to `$target` in different ways.
You could resolve this by running your favorite 3-way merge
program, e.g. `diff3` or `merge`, on the blob objects from
these three stages yourself, like this:

------------------------------------------------
$ git-cat-file blob 263414f... >hello.c~1
$ git-cat-file blob 06fa6a2... >hello.c~2
$ git-cat-file blob cc44c73... >hello.c~3
$ merge hello.c~2 hello.c~1 hello.c~3
------------------------------------------------

This would leave the merge result in `hello.c~2` file, along
with conflict markers if there are conflicts. After verifying
the merge result makes sense, you can tell git what the final
merge result for this file is by:

        mv -f hello.c~2 hello.c
        git-update-cache hello.c

When a path is in unmerged state, running `git-update-cache` for
that path tells git to mark the path resolved.

The above is the description of a git merge at the lowest level,
to help you understand what conceptually happens under the hood.
In practice, nobody, not even git itself, uses three `git-cat-file`
for this. There is a `git-merge-cache` program that extracts the
stages to temporary files and calls a `merge` script on them:

        git-merge-cache git-merge-one-file-script hello.c

and that is what the higher-level `git resolve` is implemented with.