////////////////////////////////////////////////////////////////

	GIT - the stupid content tracker

////////////////////////////////////////////////////////////////
"git" can mean anything, depending on your mood.

 - random three-letter combination that is pronounceable, and not
   actually used by any common UNIX command. The fact that it is a
   mispronunciation of "get" may or may not be relevant.
 - stupid. contemptible and despicable. simple. Take your pick from the
   dictionary of slang.
 - "global information tracker": you're in a good mood, and it actually
   works for you. Angels sing, and a light suddenly fills the room.
 - "goddamn idiotic truckload of sh*t": when it breaks

This is a stupid (but extremely fast) directory content manager. It
doesn't do a whole lot, but what it _does_ do is track directory
contents efficiently.

There are two object abstractions: the "object database", and the
"current directory cache" aka "index".

The Object Database
~~~~~~~~~~~~~~~~~~~
The object database is literally just a content-addressable collection
of objects. All objects are named by their content, which is
approximated by the SHA1 hash of the object itself. Objects may refer
to other objects (by referencing their SHA1 hash), and so you can
build up a hierarchy of objects.

All objects have a statically determined "type" aka "tag", which is
determined at object creation time, and which identifies the format of
the object (i.e. how it is used, and how it can refer to other objects).
There are currently four different object types: "blob", "tree",
"commit" and "tag".

A "blob" object cannot refer to any other object, and is, like the tag
implies, a pure storage object containing some user data. It is used to
actually store the file data, i.e. a blob object is associated with some
particular version of some file.

A "tree" object is an object that ties one or more "blob" objects into a
directory structure. In addition, a tree object can refer to other tree
objects, thus creating a directory hierarchy.

Finally, a "commit" object ties such directory hierarchies together into
a DAG of revisions - each "commit" is associated with exactly one tree
(the directory hierarchy at the time of the commit). In addition, a
"commit" refers to zero or more "parent" commit objects that describe the
history of how we arrived at that directory hierarchy.

As a special case, a commit object with no parents is called the "root"
object, and is the point of an initial project commit. Each project
must have at least one root, and while you can tie several different
root objects together into one project by creating a commit object which
has two or more separate roots as its ultimate parents, that's probably
just going to confuse people. So aim for the notion of "one root object
per project", even if git itself does not enforce that.

A "tag" object symbolically identifies and can be used to sign other
objects. It contains the identifier and type of another object, a
symbolic name (of course!) and, optionally, a signature.

Regardless of object type, all objects share the following
characteristics: they are all deflated with zlib, and have a header
that not only specifies their tag, but also size information about the
data in the object. It's worth noting that the SHA1 hash that is used
to name the object is the hash of the original data (historical note:
in the dawn of the age of git this was the sha1 of the _compressed_
object).

As a result, the general consistency of an object can always be tested
independently of the contents or the type of the object: all objects can
be validated by verifying that (a) their hashes match the content of the
file and (b) the object successfully inflates to a stream of bytes that
forms a sequence of <ascii tag without space> + <space> + <ascii decimal
size> + <byte\0> + <binary object data>.
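
The framing and the two checks (a) and (b) can be sketched in a few
lines of Python. This is only an illustration, not git's own code: the
function names are made up, and it assumes that "the original data"
being hashed is the inflated header plus the object data.

```python
import hashlib
import zlib


def encode_object(tag: bytes, data: bytes) -> tuple[str, bytes]:
    """Frame data as <tag> <size>\\0<data>; return (name, deflated bytes)."""
    raw = tag + b" " + str(len(data)).encode("ascii") + b"\0" + data
    # Assumption: the name is the SHA1 of the inflated header plus data.
    name = hashlib.sha1(raw).hexdigest()
    return name, zlib.compress(raw)


def check_object(name: str, stored: bytes) -> tuple[bytes, bytes]:
    """Inflate a stored object, run checks (a) and (b), return (tag, data)."""
    raw = zlib.decompress(stored)
    # (a) the hash must match the content.
    if hashlib.sha1(raw).hexdigest() != name:
        raise ValueError("hash does not match content")
    # (b) the inflated bytes must look like <tag> <size>\0<data>.
    header, sep, data = raw.partition(b"\0")
    tag, space, size = header.partition(b" ")
    if not sep or not space or not tag or not size.isdigit():
        raise ValueError("malformed header")
    if int(size) != len(data):
        raise ValueError("size field does not match data length")
    return tag, data


name, stored = encode_object(b"blob", b"hello\n")
print(name, check_object(name, stored))
```

Note that neither check needs to know what a blob, tree or commit
actually means, which is the point of the paragraph above.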

The structured objects can further have their structure and
connectivity to other objects verified. This is generally done with
the "fsck-cache" program, which generates a full dependency graph of
all objects, and verifies their internal consistency (in addition to
just verifying their superficial consistency through the hash).

The object types in some more detail:

Blob Object
~~~~~~~~~~~
A "blob" object is nothing but a binary blob of data, and doesn't
refer to anything else. There is no signature or any other
verification of the data, so while the object is consistent (it _is_
indexed by its sha1 hash, so the data itself is certainly correct), it
has absolutely no other attributes. No name associations, no
permissions. It is purely a blob of data (i.e. normally "file
contents").

In particular, since the blob is entirely defined by its data, if two
files in a directory tree (or in multiple different versions of the
repository) have the same contents, they will share the same blob
object. The object is totally independent of its location in the
directory tree, and renaming a file does not change the object that
file is associated with in any way.
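
A toy model of that sharing, as a sketch only: the in-memory "store"
dictionary and the "put_blob" helper are invented for the example, and
the trees are plain path-to-name mappings rather than real tree objects.

```python
import hashlib

store: dict[str, bytes] = {}  # hypothetical in-memory object database


def put_blob(data: bytes) -> str:
    """Store data under a name derived only from its content."""
    raw = b"blob " + str(len(data)).encode("ascii") + b"\0" + data
    name = hashlib.sha1(raw).hexdigest()
    store[name] = data  # storing the same content twice is a no-op
    return name


# Two paths with identical contents map to a single blob.
tree_v1 = {"README": put_blob(b"same text\n"), "COPY": put_blob(b"same text\n")}
# A rename changes the path in the tree, not the blob it points to.
tree_v2 = {"README.txt": tree_v1["README"], "COPY": tree_v1["COPY"]}

print(len(store), tree_v1["README"] == tree_v2["README.txt"])
```

The store ends up holding one object even though three paths across two
versions refer to it.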

Tree Object
~~~~~~~~~~~
The next hierarchical object type is the "tree" object. A tree object
is a list of mode/name/blob data, sorted by name. Alternatively, the
mode data may specify a directory mode, in which case instead of
naming a blob, that name is associated with another TREE object.

Like the "blob" object, a tree object is uniquely determined by the
set contents, and so two separate but identical trees will always
share the exact same object. This is true at all levels, i.e. it's
true for a "leaf" tree (which does not refer to any other trees, only
blobs) as well as for a whole subdirectory.

For that reason a "tree" object is just a pure data abstraction: it
has no history, no signatures, no verification of validity, except
that since the contents are again protected by the hash itself, we can
trust that the tree is immutable and its contents never change.

So you can trust the contents of a tree to be valid, the same way you
can trust the contents of a blob, but you don't know where those
contents _came_ from.

Side note on trees: since a "tree" object is a sorted list of
"filename+content", you can create a diff between two trees without
actually having to unpack two trees. Just ignore all common parts,
and your diff will look right. In other words, you can effectively
(and efficiently) tell the difference between any two random trees in
O(n) where "n" is the size of the difference, rather than the size of
the tree.
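
One way to picture the "ignore all common parts" trick is a merge-style
walk over two sorted entry lists that never descends into subtrees
whose names are equal. The sketch below assumes a made-up in-memory
layout (a dict from tree name to entries, with short placeholder
strings instead of SHA1 names); it is not how any git tool is actually
written.

```python
# Hypothetical tree store: tree name -> sorted list of
# (entry name, kind, object name).
Tree = list[tuple[str, str, str]]


def diff_trees(trees: dict[str, Tree], a: str, b: str, prefix: str = "") -> list[str]:
    """List paths that differ between trees a and b, skipping equal subtrees."""
    if a == b:  # same name means same contents: nothing below can differ
        return []
    left, right = trees[a], trees[b]
    out: list[str] = []
    i = j = 0
    while i < len(left) or j < len(right):
        if j == len(right) or (i < len(left) and left[i][0] < right[j][0]):
            out.append("-" + prefix + left[i][0])  # only in a
            i += 1
        elif i == len(left) or right[j][0] < left[i][0]:
            out.append("+" + prefix + right[j][0])  # only in b
            j += 1
        else:
            name, lkind, lobj = left[i]
            _, rkind, robj = right[j]
            if lkind == rkind == "tree":
                out += diff_trees(trees, lobj, robj, prefix + name + "/")
            elif (lkind, lobj) != (rkind, robj):
                out.append("*" + prefix + name)  # same path, new content
            i += 1
            j += 1
    return out


trees = {
    "t-src-1": [("main.c", "blob", "b1"), ("util.c", "blob", "b2")],
    "t-src-2": [("main.c", "blob", "b1"), ("util.c", "blob", "b3")],
    "t-doc": [("README", "blob", "b4")],
    "t-root-1": [("doc", "tree", "t-doc"), ("src", "tree", "t-src-1")],
    "t-root-2": [("doc", "tree", "t-doc"), ("new.txt", "blob", "b5"),
                 ("src", "tree", "t-src-2")],
}
print(diff_trees(trees, "t-root-1", "t-root-2"))
```

The "doc" subtree is never opened because both roots refer to the same
object name for it; only "src" and the new file cost any work.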

Side note 2 on trees: since the name of a "blob" depends entirely and
exclusively on its contents (i.e. there are no names or permissions
involved), you can see trivial renames or permission changes by
noticing that the blob stayed the same. However, renames with data
changes need a smarter "diff" implementation.


Changeset Object
~~~~~~~~~~~~~~~~
The "changeset" object (the "commit" object introduced above) is an
object that introduces the notion of history into the picture. In
contrast to the other objects, it doesn't just describe the physical
state of a tree, it describes how we got there, and why.

A "changeset" is defined by the tree-object that it results in, the
parent changesets (zero, one or more) that led up to that point, and a
comment on what happened. Again, a changeset is not trusted per se:
the contents are well-defined and "safe" due to the cryptographically
strong signatures at all levels, but there is no reason to believe
that the tree is "good" or that the merge information makes sense.
The parents do not have to actually have any relationship with the
result, for example.

Note on changesets: unlike real SCMs, changesets do not contain
rename information or file mode change information. All of that is
implicit in the trees involved (the result tree, and the result trees
of the parents), and describing that makes no sense in this idiotic
file manager.

Trust Object
~~~~~~~~~~~~
The notion of "trust" is really outside the scope of "git", but it's
worth noting a few things. First off, since everything is hashed with
SHA1, you _can_ trust that an object is intact and has not been messed
with by external sources. So the name of an object uniquely
identifies a known state - just not a state that you may want to
trust.

Furthermore, since the SHA1 signature of a changeset refers to the
SHA1 signatures of the tree it is associated with and the signatures
of the parent, a single named changeset specifies uniquely a whole set
of history, with full contents. You can't later fake any step of the
way once you have the name of a changeset.
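
The chaining argument can be tried out with a toy model. The text
layout hashed below is invented for the example (it is not the real
changeset format), and the tree names are placeholder strings; the only
thing it is meant to show is that a change anywhere in the history
changes the name at the top.

```python
import hashlib


def commit_name(tree: str, parents: list[str], message: str) -> str:
    """Name a toy changeset by hashing its tree, parents and comment."""
    body = "tree " + tree + "\n"
    body += "".join("parent " + p + "\n" for p in parents)
    body += "\n" + message + "\n"
    return hashlib.sha1(body.encode("utf-8")).hexdigest()


def history(first_tree: str) -> str:
    """Build a three-changeset chain and return the name of its top."""
    root = commit_name(first_tree, [], "initial import")
    second = commit_name("tree-2", [root], "fix a bug")
    return commit_name("tree-3", [second], "add a feature")


honest = history("tree-1")
forged = history("tree-1-tampered")  # only the very first tree differs
print(honest != forged)
```

Tampering with the root changes its name, which changes the "parent"
line of the second changeset, and so on up to the top.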

So to introduce some real trust in the system, the only thing you need
to do is to digitally sign just _one_ special note, which includes the
name of a top-level changeset. Your digital signature shows others
that you trust that changeset, and the immutability of the history of
changesets tells others that they can trust the whole history.

In other words, you can easily validate a whole archive by just
sending out a single email that tells the people the name (SHA1 hash)
of the top changeset, and digitally sign that email using something
like GPG/PGP.

In particular, you can also have a separate archive of "trust points"
or tags, which document your (and other people's) trust. You may, of
course, archive these "certificates of trust" using "git" itself, but
it's not something "git" does for you.

Another way of saying the last point: "git" itself only handles
content integrity, the trust has to come from outside.




The "index" aka "Current Directory Cache"
-----------------------------------------
The index is a simple binary file, which contains an efficient
representation of a virtual directory content at some random time. It
does so by a simple array that associates a set of names, dates,
permissions and content (aka "blob") objects together. The cache is
always kept ordered by name, and names are unique (with a few very
specific rules) at any point in time, but the cache has no long-term
meaning, and can be partially updated at any time.

In particular, the index certainly does not need to be consistent with
the current directory contents (in fact, most operations will depend on
different ways to make the index _not_ be consistent with the directory
hierarchy), but it has three very important attributes:

'(a) it can re-generate the full state it caches (not just the
directory structure: it contains pointers to the "blob" objects so
that it can regenerate the data too)'

As a special case, there is a clear and unambiguous one-way mapping
from a current directory cache to a "tree object", which can be
efficiently created from just the current directory cache without
actually looking at any other data. So a directory cache at any one
time uniquely specifies one and only one "tree" object (but has
additional data to make it easy to match up that tree object with what
has happened in the directory).

'(b) it has efficient methods for finding inconsistencies between that
cached state ("tree object waiting to be instantiated") and the
current state.'

'(c) it can additionally efficiently represent information about merge
conflicts between different tree objects, allowing each pathname to be
associated with sufficient information about the trees involved that
you can create a three-way merge between them.'

Those are the three ONLY things that the directory cache does. It's a
cache, and the normal operation is to re-generate it completely from a
known tree object, or update/compare it with a live tree that is being
developed. If you blow the directory cache away entirely, you generally
haven't lost any information as long as you have the name of the tree
that it described.

At the same time, the index is also the staging area for creating new
trees, and creating a new tree always involves a controlled
modification of the index file. In particular, the index file can have
the representation of an intermediate tree that has not yet been
instantiated. So the index can be thought of as a write-back cache,
which can contain dirty information that has not yet been written back
to the backing store.



The Workflow
------------
Generally, all "git" operations work on the index file. Some operations
work *purely* on the index file (showing the current state of the
index), but most operations move data to and from the index file, either
from the database or from the working directory. Thus there are four
main combinations:

1) working directory -> index
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

You update the index with information from the working directory with
the "update-cache" command. You generally update the index
information by just specifying the filename you want to update, like
so:

	update-cache filename

but to avoid common mistakes with filename globbing etc, the command
will not normally add totally new entries or remove old entries,
i.e. it will normally just update existing cache entries.

To tell git that yes, you really do realize that certain files no
longer exist in the archive, or that new files should be added, you
should use the "--remove" and "--add" flags respectively.

NOTE! A "--remove" flag does _not_ mean that subsequent filenames will
necessarily be removed: if the files still exist in your directory
structure, the index will be updated with their new status, not
removed. The only thing "--remove" means is that update-cache will be
considering a removed file to be a valid thing, and if the file really
does not exist any more, it will update the index accordingly.

As a special case, you can also do "update-cache --refresh", which
will refresh the "stat" information of each index entry to match the
current stat information. It will _not_ update the object status
itself, and it will only update the fields that are used to quickly
test whether an object still matches its old backing store object.
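
The idea behind such a quick test can be sketched as follows. The
choice of stat fields and the helper names are assumptions made for the
example; the text above does not say which fields the index really
records.

```python
import os
import tempfile

# Hypothetical choice of stat fields; the text does not list the real ones.
FIELDS = ("st_mtime_ns", "st_size", "st_ino", "st_mode")


def snapshot(path: str) -> tuple:
    """Record the stat fields that stand in for 'has this file changed?'."""
    st = os.stat(path)
    return tuple(getattr(st, f) for f in FIELDS)


def maybe_changed(path: str, cached: tuple) -> bool:
    """Cheap test: True when the stat data no longer matches the cache."""
    return snapshot(path) != cached


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "file.txt")
    with open(path, "w") as f:
        f.write("one\n")
    cached = snapshot(path)
    unchanged = maybe_changed(path, cached)
    with open(path, "w") as f:
        f.write("one, two\n")  # different size, so the stat data differs
    changed = maybe_changed(path, cached)

print(unchanged, changed)
```

Only when the cheap comparison fails does anybody need to read the file
and hash its contents, which is what keeps the common case fast.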

2) index -> object database
~~~~~~~~~~~~~~~~~~~~~~~~~~~

You write your current index file to a "tree" object with the program

	write-tree

that doesn't come with any options - it will just write out the
current index into the set of tree objects that describe that state,
and it will return the name of the resulting top-level tree. You can
use that tree to re-generate the index at any time by going in the
other direction:

3) object database -> index
~~~~~~~~~~~~~~~~~~~~~~~~~~~

You read a "tree" file from the object database, and use that to
populate (and overwrite - don't do this if your index contains any
unsaved state that you might want to restore later!) your current
index. Normal operation is just

	read-tree <sha1 of tree>

and your index file will now be equivalent to the tree that you saved
earlier. However, that is only your _index_ file: your working
directory contents have not been modified.

4) index -> working directory
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

You update your working directory from the index by "checking out"
files. This is not a very common operation, since normally you'd just
keep your files updated, and rather than write to your working
directory, you'd tell the index files about the changes in your
working directory (i.e. "update-cache").

However, if you decide to jump to a new version, or check out somebody
else's version, or just restore a previous tree, you'd populate your
index file with read-tree, and then you need to check out the result
with

	checkout-cache filename

or, if you want to check out all of the index, use "-a".

NOTE! checkout-cache normally refuses to overwrite old files, so if
you have an old version of the tree already checked out, you will need
to use the "-f" flag (_before_ the "-a" flag or the filename) to
_force_ the checkout.


Finally, there are a few odds and ends which are not purely moving
from one representation to the other:

5) Tying it all together
~~~~~~~~~~~~~~~~~~~~~~~~

To commit a tree you have instantiated with "write-tree", you'd create
a "commit" object that refers to that tree and the history behind it -
most notably the "parent" commits that preceded it in history.

Normally a "commit" has one parent: the previous state of the tree
before a certain change was made. However, sometimes it can have two
or more parent commits, in which case we call it a "merge", due to the
fact that such a commit brings together ("merges") two or more
previous states represented by other commits.

In other words, while a "tree" represents a particular directory state
of a working directory, a "commit" represents that state in "time",
and explains how we got there.

You create a commit object by giving it the tree that describes the
state at the time of the commit, and a list of parents:

	commit-tree <tree> -p <parent> [-p <parent2> ..]

and then giving the reason for the commit on stdin (either through
redirection from a pipe or file, or by just typing it at the tty).

commit-tree will return the name of the object that represents that
commit, and you should save it away for later use. Normally, you'd
commit a new "HEAD" state, and while git doesn't care where you save
the note about that state, in practice we tend to just write the
result to the file ".git/HEAD", so that we can always see what the
last committed state was.

6) Examining the data
~~~~~~~~~~~~~~~~~~~~~

You can examine the data represented in the object database and the
index with various helper tools. For every object, you can use
"cat-file" to examine details about the object:

	cat-file -t <objectname>

shows the type of the object, and once you have the type (which is
usually implicit in where you find the object), you can use

	cat-file blob|tree|commit <objectname>

to show its contents. NOTE! Trees have binary content, and as a result
there is a special helper for showing that content, called "ls-tree",
which turns the binary content into a more easily readable form.

It's especially instructive to look at "commit" objects, since those
tend to be small and fairly self-explanatory. In particular, if you
follow the convention of having the top commit name in ".git/HEAD",
you can do

	cat-file commit $(cat .git/HEAD)

to see what the top commit was.

7) Merging multiple trees
~~~~~~~~~~~~~~~~~~~~~~~~~

Git helps you do a three-way merge, which you can expand to n-way by
repeating the merge procedure an arbitrary number of times until you
finally "commit" the state. The normal situation is that you'd only do
one three-way merge (two parents), and commit it, but if you like to,
you can do multiple parents in one go.

To do a three-way merge, you need the two sets of "commit" objects
that you want to merge, use those to find the closest common parent (a
third "commit" object), and then use those commit objects to find the
state of the directory ("tree" object) at these points.

To get the "base" for the merge, you first look up the common parent
of two commits with

	merge-base <commit1> <commit2>

which will return you the commit they are both based on. You should
now look up the "tree" objects of those commits, which you can easily
do with (for example)

	cat-file commit <commitname> | head -1

since the tree object information is always the first line in a commit
object.
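
To make "closest common parent" a bit more concrete, here is a small
sketch over a made-up history. It is not a description of how
merge-base picks its answer: the "parents" table, the function names
and the notion of "closest" (fewest parent links from the second
commit) are all assumptions chosen to keep the example short.

```python
from collections import deque
from typing import Optional

# Hypothetical history: commit name -> list of parent names.
parents = {
    "root": [],
    "a1": ["root"],
    "a2": ["a1"],
    "b1": ["a1"],
    "b2": ["b1"],
}


def ancestors(start: str) -> set[str]:
    """Return start plus every commit reachable through parent links."""
    seen = {start}
    queue = deque([start])
    while queue:
        for p in parents[queue.popleft()]:
            if p not in seen:
                seen.add(p)
                queue.append(p)
    return seen


def common_parent(one: str, two: str) -> Optional[str]:
    """Breadth-first walk from 'two'; first hit inside ancestors(one) wins."""
    reachable = ancestors(one)
    seen = {two}
    queue = deque([two])
    while queue:
        current = queue.popleft()
        if current in reachable:
            return current
        for p in parents[current]:
            if p not in seen:
                seen.add(p)
                queue.append(p)
    return None  # the two commits share no history at all


print(common_parent("a2", "b2"))
```

In this history both branches forked from "a1", so that is the commit
whose tree would serve as the "original" in the merge.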

Once you know the three trees you are going to merge (the one
"original" tree, aka the common case, and the two "result" trees, aka
the branches you want to merge), you do a "merge" read into the
index. This will throw away your old index contents, so you should
make sure that you've committed those - in fact you would normally
always do a merge against your last commit (which should thus match
what you have in your current index anyway).

To do the merge, do

	read-tree -m <origtree> <target1tree> <target2tree>

which will do all trivial merge operations for you directly in the
index file, and you can just write the result out with "write-tree".

NOTE! Because the merge is done in the index file, and not in your
working directory, your working directory will no longer match your
index. You can use "checkout-cache -f -a" to make the effect of the
merge be seen in your working directory.

NOTE2! Sadly, many merges aren't trivial. If there are files that have
been added, moved or removed, or if both branches have modified the
same file, you will be left with an index tree that contains "merge
entries" in it. Such an index tree can _NOT_ be written out to a tree
object, and you will have to resolve any such merge clashes using
other tools before you can write out the result.
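
The split between trivial and non-trivial cases can be sketched per
path. The rules below are a simplified reading of the two notes above,
not the actual logic of read-tree: trees are flattened into made-up
path-to-blob dictionaries, and anything added, removed or changed on
both sides is simply reported as unresolved.

```python
from typing import Optional

Tree = dict[str, str]  # hypothetical flat tree: path -> blob name


def merge_path(base: Optional[str], ours: Optional[str],
               theirs: Optional[str]) -> Optional[str]:
    """Return the merged blob for one path, or None if it is not trivial."""
    if base is None or ours is None or theirs is None:
        return None  # added or removed somewhere: left as a merge entry
    if ours == base:
        return theirs  # only the second branch touched it (or nobody did)
    if theirs == base:
        return ours  # only the first branch touched it
    return None  # both branches modified the file


def merge_trees(base: Tree, ours: Tree, theirs: Tree) -> tuple[Tree, list[str]]:
    """Split all paths into trivially merged entries and unresolved ones."""
    merged: Tree = {}
    unresolved: list[str] = []
    for path in sorted(set(base) | set(ours) | set(theirs)):
        blob = merge_path(base.get(path), ours.get(path), theirs.get(path))
        if blob is None:
            unresolved.append(path)
        else:
            merged[path] = blob
    return merged, unresolved


base = {"a": "1", "b": "1", "c": "1"}
ours = {"a": "2", "b": "1", "c": "2"}
theirs = {"a": "1", "b": "1", "c": "3", "d": "1"}
print(merge_trees(base, ours, theirs))
```

Here "a" and "b" merge trivially, while "c" (changed on both sides) and
"d" (added on one side) are the kind of entries that would keep the
index from being written out as a tree.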


[ fixme: talk about resolving merges here ]