////////////////////////////////////////////////////////////////

	GIT - the stupid content tracker

////////////////////////////////////////////////////////////////
"git" can mean anything, depending on your mood.

 - random three-letter combination that is pronounceable, and not
   actually used by any common UNIX command. The fact that it is a
   mispronunciation of "get" may or may not be relevant.
 - stupid. contemptible and despicable. simple. Take your pick from the
   dictionary of slang.
 - "global information tracker": you're in a good mood, and it actually
   works for you. Angels sing, and a light suddenly fills the room.
 - "goddamn idiotic truckload of sh*t": when it breaks

This is a stupid (but extremely fast) directory content manager. It
doesn't do a whole lot, but what it _does_ do is track directory
contents efficiently.

There are two object abstractions: the "object database", and the
"current directory cache" aka "index".

The Object Database
~~~~~~~~~~~~~~~~~~~
The object database is literally just a content-addressable collection
of objects. All objects are named by their content, which is
approximated by the SHA1 hash of the object itself. Objects may refer
to other objects (by referencing their SHA1 hash), and so you can
build up a hierarchy of objects.

All objects have a statically determined "type" aka "tag", which is
determined at object creation time, and which identifies the format of
the object (i.e. how it is used, and how it can refer to other
objects). There are currently four different object types: "blob",
"tree", "commit" and "tag".

A "blob" object cannot refer to any other object, and is, like the tag
implies, a pure storage object containing some user data. It is used to
actually store the file data, i.e. a blob object is associated with some
particular version of some file.

A "tree" object is an object that ties one or more "blob" objects into a
directory structure. In addition, a tree object can refer to other tree
objects, thus creating a directory hierarchy.

A "commit" object ties such directory hierarchies together into
a DAG of revisions - each "commit" is associated with exactly one tree
(the directory hierarchy at the time of the commit). In addition, a
"commit" refers to one or more "parent" commit objects that describe the
history of how we arrived at that directory hierarchy.

As a special case, a commit object with no parents is called the "root"
object, and is the point of an initial project commit. Each project
must have at least one root, and while you can tie several different
root objects together into one project by creating a commit object which
has two or more separate roots as its ultimate parents, that's probably
just going to confuse people. So aim for the notion of "one root object
per project", even if git itself does not enforce that.

A "tag" object symbolically identifies and can be used to sign other
objects. It contains the identifier and type of another object, a
symbolic name (of course!) and, optionally, a signature.

Regardless of object type, all objects share the following
characteristics: they are all deflated with zlib, and have a header
that not only specifies their tag, but also provides size information
about the data in the object. It's worth noting that the SHA1 hash
that is used to name the object is the hash of the original
(uncompressed) data (historical note: in the dawn of the age of git
this was the sha1 of the _compressed_ object).

As a result, the general consistency of an object can always be tested
independently of the contents or the type of the object: all objects can
be validated by verifying that (a) their hashes match the content of the
file and (b) the object successfully inflates to a stream of bytes that
forms a sequence of <ascii tag without space> + <space> + <ascii decimal
size> + <byte\0> + <binary object data>.

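As a rough illustration of that layout (a sketch in Python, not git's
actual C code), the following names, deflates and re-validates an
object. It assumes the hash is taken over the header plus the data, as
current versions of git do; the function names are made up for this
example.

```python
import hashlib
import zlib


def encode_object(tag, data):
    """Return the uncompressed object: tag, space, decimal size, NUL, data."""
    header = tag.encode("ascii") + b" " + str(len(data)).encode("ascii") + b"\0"
    return header + data


def store_object(tag, data):
    """Return (name, deflated bytes); the name is the SHA1 of the uncompressed object."""
    raw = encode_object(tag, data)
    return hashlib.sha1(raw).hexdigest(), zlib.compress(raw)


def validate_object(name, deflated):
    """Check (a) the hash and (b) the header layout; return (tag, data)."""
    raw = zlib.decompress(deflated)
    if hashlib.sha1(raw).hexdigest() != name:
        raise ValueError("hash does not match content")
    header, nul, data = raw.partition(b"\0")
    tag, space, size = header.partition(b" ")
    if not nul or not space or not tag or not size.isdigit():
        raise ValueError("malformed header")
    if int(size) != len(data):
        raise ValueError("size does not match data")
    return tag.decode("ascii"), data


name, deflated = store_object("blob", b"hello\n")
print(name, validate_object(name, deflated))
```
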
The structured objects can further have their structure and
connectivity to other objects verified. This is generally done with
the "git-fsck-cache" program, which generates a full dependency graph
of all objects, and verifies their internal consistency (in addition
to just verifying their superficial consistency through the hash).

The object types in some more detail:

Blob Object
~~~~~~~~~~~
A "blob" object is nothing but a binary blob of data, and doesn't
refer to anything else. There is no signature or any other
verification of the data, so while the object is consistent (it _is_
indexed by its sha1 hash, so the data itself is certainly correct), it
has absolutely no other attributes. No name associations, no
permissions. It is purely a blob of data (i.e. normally "file
contents").

In particular, since the blob is entirely defined by its data, if two
files in a directory tree (or in multiple different versions of the
repository) have the same contents, they will share the same blob
object. The object is totally independent of its location in the
directory tree, and renaming a file does not change the object that
file is associated with in any way.

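To make the sharing concrete, here is a minimal sketch of a
content-addressed store in Python. The `ObjectStore` class and its
method are hypothetical, and the "blob <size>\0" header is assumed to
be part of what gets hashed; the point is only that identical contents
end up under one name no matter which file they came from.

```python
import hashlib


def blob_name(data):
    """Name a blob by the SHA1 of an assumed 'blob <size>\\0' header plus the data."""
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()


class ObjectStore:
    """Hypothetical in-memory stand-in for the object database."""

    def __init__(self):
        self.objects = {}

    def write_blob(self, data):
        name = blob_name(data)
        # Storing the same contents twice is a no-op: the name already exists.
        self.objects.setdefault(name, data)
        return name


store = ObjectStore()
readme = store.write_blob(b"same contents\n")       # say, README
docs_readme = store.write_blob(b"same contents\n")  # say, docs/README
other = store.write_blob(b"other contents\n")
print(readme == docs_readme, len(store.objects))
```
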
A blob is created with link:git-write-blob.html[git-write-blob] and
its data can be accessed by link:git-cat-file.html[git-cat-file].

Tree Object
~~~~~~~~~~~
The next hierarchical object type is the "tree" object. A tree object
is a list of mode/name/blob data, sorted by name. Alternatively, the
mode data may specify a directory mode, in which case instead of
naming a blob, that name is associated with another TREE object.

Like the "blob" object, a tree object is uniquely determined by its
contents, and so two separate but identical trees will always
share the exact same object. This is true at all levels, i.e. it's
true for a "leaf" tree (which does not refer to any other trees, only
blobs) as well as for a whole subdirectory.

For that reason a "tree" object is just a pure data abstraction: it
has no history, no signatures, no verification of validity, except
that since the contents are again protected by the hash itself, we can
trust that the tree is immutable and its contents never change.

So you can trust the contents of a tree to be valid, the same way you
can trust the contents of a blob, but you don't know where those
contents _came_ from.

Side note on trees: since a "tree" object is a sorted list of
"filename+content", you can create a diff between two trees without
actually having to unpack two trees. Just ignore all common parts,
and your diff will look right. In other words, you can effectively
(and efficiently) tell the difference between any two random trees in
O(n) where "n" is the size of the difference, rather than the size of
the tree.

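The idea in that side note can be sketched as a merge walk over two
sorted entry lists that skips anything whose object name is the same on
both sides. This is an illustration only: the in-memory `store`, the
entry tuples and the short object names are invented for the example,
and git's real tree ordering has extra rules for directories that are
ignored here.

```python
def diff_trees(store, old_id, new_id, prefix=""):
    """Return [(path, old_blob, new_blob)] for blobs that differ between two trees.

    `store` maps a tree name to a name-sorted list of (name, kind, object_name).
    """
    if old_id == new_id:
        return []  # same name means same contents: skip the whole subtree
    old = store[old_id] if old_id else []
    new = store[new_id] if new_id else []
    changes = []
    i = j = 0
    while i < len(old) or j < len(new):
        # Both lists are sorted by name, so always take the smaller name next.
        if j == len(new) or (i < len(old) and old[i][0] < new[j][0]):
            name, a, b = old[i][0], old[i][1:], None
            i += 1
        elif i == len(old) or new[j][0] < old[i][0]:
            name, a, b = new[j][0], None, new[j][1:]
            j += 1
        else:
            name, a, b = old[i][0], old[i][1:], new[j][1:]
            i += 1
            j += 1
        if a == b:
            continue  # common part, ignore it
        path = prefix + name
        a_tree = a[1] if a and a[0] == "tree" else None
        b_tree = b[1] if b and b[0] == "tree" else None
        a_blob = a[1] if a and a[0] == "blob" else None
        b_blob = b[1] if b and b[0] == "blob" else None
        if a_tree or b_tree:
            changes += diff_trees(store, a_tree, b_tree, path + "/")
        if a_blob or b_blob:
            changes.append((path, a_blob, b_blob))
    return changes


# Hypothetical object names; real ones would be SHA1 hashes.
store = {
    "T_doc": [("README", "blob", "B3")],
    "T_src_v1": [("main.c", "blob", "B1"), ("util.c", "blob", "B2")],
    "T_src_v2": [("main.c", "blob", "B1x"), ("util.c", "blob", "B2")],
    "T_root_v1": [("Makefile", "blob", "B4"), ("doc", "tree", "T_doc"),
                  ("src", "tree", "T_src_v1")],
    "T_root_v2": [("Makefile", "blob", "B4"), ("doc", "tree", "T_doc"),
                  ("src", "tree", "T_src_v2")],
}
print(diff_trees(store, "T_root_v1", "T_root_v2"))
```
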
Side note 2 on trees: since the name of a "blob" depends entirely and
exclusively on its contents (i.e. there are no names or permissions
involved), you can see trivial renames or permission changes by
noticing that the blob stayed the same. However, renames with data
changes need a smarter "diff" implementation.

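A trivial rename shows up as a path that vanished and a path that
appeared, both pointing at the same blob. The following Python sketch
pairs them up; the function name and the flat "path to blob name"
dictionaries are assumptions made for the example, not something git
provides.

```python
def find_trivial_renames(old_files, new_files):
    """Pair removed and added paths that share a blob name.

    Both arguments map path -> blob name. Returns [(old_path, new_path)].
    """
    removed = {p: b for p, b in old_files.items() if p not in new_files}
    added = {p: b for p, b in new_files.items() if p not in old_files}
    by_blob = {}
    for path in sorted(removed):
        by_blob.setdefault(removed[path], []).append(path)
    renames = []
    for path in sorted(added):
        candidates = by_blob.get(added[path])
        if candidates:
            renames.append((candidates.pop(0), path))
    return renames


old = {"README": "b1", "hello.c": "b2"}
new = {"README.txt": "b1", "hello.c": "b2x"}
print(find_trivial_renames(old, new))
```
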
A tree is created with link:git-write-tree.html[git-write-tree] and
its data can be accessed by link:git-ls-tree.html[git-ls-tree].

Commit Object
~~~~~~~~~~~~~
The "commit" object is an object that introduces the notion of
history into the picture. In contrast to the other objects, it
doesn't just describe the physical state of a tree, it describes how
we got there, and why.

A "commit" is defined by the tree-object that it results in, the
parent commits (zero, one or more) that led up to that point, and a
comment on what happened. Again, a commit is not trusted per se:
the contents are well-defined and "safe" due to the cryptographically
strong signatures at all levels, but there is no reason to believe
that the tree is "good" or that the merge information makes sense.
The parents do not have to actually have any relationship with the
result, for example.

Note on commits: unlike real SCMs, commits do not contain
rename information or file mode change information. All of that is
implicit in the trees involved (the result tree, and the result trees
of the parents), and describing that makes no sense in this idiotic
file manager.

A commit is created with link:git-commit-tree.html[git-commit-tree] and
its data can be accessed by link:git-cat-file.html[git-cat-file].

Trust
~~~~~
An aside on the notion of "trust". Trust is really outside the scope
of "git", but it's worth noting a few things. First off, since
everything is hashed with SHA1, you _can_ trust that an object is
intact and has not been messed with by external sources. So the name
of an object uniquely identifies a known state - just not a state that
you may want to trust.

Furthermore, since the SHA1 signature of a commit refers to the
SHA1 signatures of the tree it is associated with and the signatures
of its parents, a single named commit specifies uniquely a whole set
of history, with full contents. You can't later fake any step of the
way once you have the name of a commit.

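The chaining can be shown with a deliberately simplified commit format.
In this Python sketch a commit is just a "tree" line, "parent" lines
and a message (real commits carry more fields, such as author and
committer), and a blob name stands in for a real tree name. Those
simplifications and all function names are assumptions; the point is
that changing anything early in the history changes every later name.

```python
import hashlib


def name_of(tag, body):
    """SHA1 over an assumed '<tag> <size>\\0' header plus the body."""
    header = tag.encode("ascii") + b" " + str(len(body)).encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def commit_name(tree, parents, message):
    """Name a simplified commit made of a tree line, parent lines and a message."""
    lines = ["tree " + tree] + ["parent " + p for p in parents]
    body = ("\n".join(lines) + "\n\n" + message + "\n").encode("utf-8")
    return name_of("commit", body)


def two_commit_history(first_contents):
    """Build root -> second, where the root's contents are `first_contents`."""
    tree = name_of("blob", first_contents)  # stand-in for a real tree name
    root = commit_name(tree, [], "initial")
    second = commit_name(tree, [root], "second")
    return root, second


honest = two_commit_history(b"original\n")
forged = two_commit_history(b"tampered\n")
print(honest[1] != forged[1])
```
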
So to introduce some real trust in the system, the only thing you need
to do is to digitally sign just _one_ special note, which includes the
name of a top-level commit. Your digital signature shows others
that you trust that commit, and the immutability of the history of
commits tells others that they can trust the whole history.

In other words, you can easily validate a whole archive by just
sending out a single email that tells the people the name (SHA1 hash)
of the top commit, and digitally signing that email using something
like GPG/PGP.

To assist in this, git also provides the tag object...

Tag Object
~~~~~~~~~~
Git provides the "tag" object to simplify creating, managing and
exchanging symbolic and signed tokens. The "tag" object at its
simplest simply symbolically identifies another object by containing
the sha1, type and symbolic name.

However it can optionally contain additional signature information
(which git doesn't care about as long as there's less than 8k of
it). This can then be verified externally to git.

Note that despite the tag features, "git" itself only handles content
integrity; the trust framework (and signature provision and
verification) has to come from outside.

A tag is created with link:git-mktag.html[git-mktag] and
its data can be accessed by link:git-cat-file.html[git-cat-file].

The "index" aka "Current Directory Cache"
-----------------------------------------
The index is a simple binary file, which contains an efficient
representation of a virtual directory content at some random time. It
does so by a simple array that associates a set of names, dates,
permissions and content (aka "blob") objects together. The cache is
always kept ordered by name, and names are unique (with a few very
specific rules) at any point in time, but the cache has no long-term
meaning, and can be partially updated at any time.

In particular, the index certainly does not need to be consistent with
the current directory contents (in fact, most operations will depend on
different ways to make the index _not_ be consistent with the directory
hierarchy), but it has three very important attributes:

'(a) it can re-generate the full state it caches (not just the
directory structure: it contains pointers to the "blob" objects so
that it can regenerate the data too)'

As a special case, there is a clear and unambiguous one-way mapping
from a current directory cache to a "tree object", which can be
efficiently created from just the current directory cache without
actually looking at any other data. So a directory cache at any one
time uniquely specifies one and only one "tree" object (but has
additional data to make it easy to match up that tree object with what
has happened in the directory).

'(b) it has efficient methods for finding inconsistencies between that
cached state ("tree object waiting to be instantiated") and the
current state.'

'(c) it can additionally efficiently represent information about merge
conflicts between different tree objects, allowing each pathname to be
associated with sufficient information about the trees involved that
you can create a three-way merge between them.'

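The one-way mapping from the cache to a tree mentioned under (a) can be
sketched in Python as follows. The flat entry list, the `write_tree`
function and the `objects` dictionary are made up for this example, and
the tree layout used (octal mode, space, name, NUL, 20 raw hash bytes,
with "40000" for directories) is the one current git uses, which is
assumed rather than taken from this document. Stat information is left
out entirely.

```python
import hashlib


def name_of(tag, body):
    """SHA1 over an assumed '<tag> <size>\\0' header plus the body."""
    header = tag.encode("ascii") + b" " + str(len(body)).encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def write_tree(entries, objects):
    """Turn flat (path, mode, blob_name) entries into nested tree objects.

    Every tree written is recorded in `objects`; the top-level name is returned.
    """
    files, subdirs = [], {}
    for path, mode, blob in entries:
        head, slash, rest = path.partition("/")
        if slash:
            subdirs.setdefault(head, []).append((rest, mode, blob))
        else:
            files.append((head, mode, blob))
    # Each subdirectory becomes its own tree, referenced by name from here.
    items = files + [(name, "40000", write_tree(sub, objects))
                     for name, sub in subdirs.items()]
    body = b"".join(
        (mode + " " + name).encode("utf-8") + b"\0" + bytes.fromhex(oid)
        for name, mode, oid in sorted(items)
    )
    tree = name_of("tree", body)
    objects[tree] = body
    return tree


blob = name_of("blob", b"hello\n")
index = [("Makefile", "100644", blob),
         ("src/a.c", "100644", blob),
         ("src/b.c", "100644", blob)]
objects = {}
root = write_tree(index, objects)
print(root, len(objects))
```
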
Those are the three ONLY things that the directory cache does. It's a
cache, and the normal operation is to re-generate it completely from a
known tree object, or update/compare it with a live tree that is being
developed. If you blow the directory cache away entirely, you generally
haven't lost any information as long as you have the name of the tree
that it described.

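Comparing the cache against a live tree does not require reading file
contents if the cache remembers some stat information per path. The
Python sketch below shows the general idea only; which stat fields git
actually records is not stated here, so the choice of mtime, size and
mode is an assumption, as are the function names.

```python
import os


def stat_key(path):
    """The stat fields this sketch remembers for a path (an assumed selection)."""
    st = os.stat(path)
    return (st.st_mtime_ns, st.st_size, st.st_mode)


def changed_paths(cache):
    """Return the cached paths whose current stat data differs or that are gone.

    `cache` maps path -> the stat key recorded when the entry was last updated.
    """
    changed = []
    for path in sorted(cache):
        try:
            current = stat_key(path)
        except FileNotFoundError:
            current = None
        if current != cache[path]:
            changed.append(path)
    return changed
```
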
At the same time, the directory index is also the
staging area for creating new trees, and creating a new tree always
involves a controlled modification of the index file. In particular,
the index file can have the representation of an intermediate tree that
has not yet been instantiated. So the index can be thought of as a
write-back cache, which can contain dirty information that has not yet
been written back to the backing store.



The Workflow
------------
Generally, all "git" operations work on the index file. Some operations
work *purely* on the index file (showing the current state of the
index), but most operations move data to and from the index file, either
from the database or from the working directory. Thus there are four
main combinations:

1) working directory -> index
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

You update the index with information from the working directory with
the link:git-update-cache.html[git-update-cache] command. You
generally update the index information by just specifying the filename
you want to update, like so:

	git-update-cache filename

but to avoid common mistakes with filename globbing etc, the command
will not normally add totally new entries or remove old entries,
i.e. it will normally just update existing cache entries.

To tell git that yes, you really do realize that certain files no
longer exist in the archive, or that new files should be added, you
should use the "--remove" and "--add" flags respectively.

NOTE! A "--remove" flag does _not_ mean that subsequent filenames will
necessarily be removed: if the files still exist in your directory
structure, the index will be updated with their new status, not
removed. The only thing "--remove" means is that update-cache will be
considering a removed file to be a valid thing, and if the file really
does not exist any more, it will update the index accordingly.

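The rules for "--add" and "--remove" are easy to get backwards, so here
is a small Python model of them. It is a sketch of the behaviour
described above, not the real command: the index and working directory
are plain dictionaries, and the function name and error messages are
invented.

```python
def update_cache(index, worktree, path, add=False, remove=False):
    """Model of updating one path; both dicts map path -> content id."""
    if path in worktree:
        if path not in index and not add:
            raise ValueError(path + ": not in the index (would need --add)")
        # The file exists, so its new status is recorded, even with remove=True.
        index[path] = worktree[path]
    else:
        if not remove:
            raise ValueError(path + ": no such file (would need --remove)")
        # The file is really gone and the caller said that is a valid thing.
        index.pop(path, None)
    return index


index = {"a": "1"}
worktree = {"a": "2", "b": "3"}
update_cache(index, worktree, "a")               # plain update of a known path
update_cache(index, worktree, "b", add=True)     # new entry needs add=True
update_cache(index, worktree, "a", remove=True)  # still exists: updated, not removed
print(index)
```
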
As a special case, you can also do "git-update-cache --refresh", which
will refresh the "stat" information of each index entry to match the
current stat information. It will _not_ update the object status itself,
and it will only update the fields that are used to quickly test whether
an object still matches its old backing store object.

2) index -> object database
~~~~~~~~~~~~~~~~~~~~~~~~~~~

You write your current index file to a "tree" object with the program

	git-write-tree

that doesn't come with any options - it will just write out the
current index into the set of tree objects that describe that state,
and it will return the name of the resulting top-level tree. You can
use that tree to re-generate the index at any time by going in the
other direction:

3) object database -> index
~~~~~~~~~~~~~~~~~~~~~~~~~~~

You read a "tree" file from the object database, and use that to
populate (and overwrite - don't do this if your index contains any
unsaved state that you might want to restore later!) your current
index. Normal operation is just

	git-read-tree <sha1 of tree>

and your index file will now be equivalent to the tree that you saved
earlier. However, that is only your _index_ file: your working
directory contents have not been modified.

4) index -> working directory
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

You update your working directory from the index by "checking out"
files. This is not a very common operation, since normally you'd just
keep your files updated, and rather than write to your working
directory, you'd tell the index files about the changes in your
working directory (i.e. "git-update-cache").

However, if you decide to jump to a new version, or check out somebody
else's version, or just restore a previous tree, you'd populate your
index file with read-tree, and then you need to check out the result
with

	git-checkout-cache filename

or, if you want to check out all of the index, use "-a".

NOTE! git-checkout-cache normally refuses to overwrite old files, so
if you have an old version of the tree already checked out, you will
need to use the "-f" flag (_before_ the "-a" flag or the filename) to
_force_ the checkout.


Finally, there are a few odds and ends which are not purely moving
from one representation to the other:

5) Tying it all together
~~~~~~~~~~~~~~~~~~~~~~~~
To commit a tree you have instantiated with "git-write-tree", you'd
create a "commit" object that refers to that tree and the history
behind it - most notably the "parent" commits that preceded it in
history.

Normally a "commit" has one parent: the previous state of the tree
before a certain change was made. However, sometimes it can have two
or more parent commits, in which case we call it a "merge", due to the
fact that such a commit brings together ("merges") two or more
previous states represented by other commits.

In other words, while a "tree" represents a particular directory state
of a working directory, a "commit" represents that state in "time",
and explains how we got there.

You create a commit object by giving it the tree that describes the
state at the time of the commit, and a list of parents:

	git-commit-tree <tree> -p <parent> [-p <parent2> ..]

and then giving the reason for the commit on stdin (either through
redirection from a pipe or file, or by just typing it at the tty).

git-commit-tree will return the name of the object that represents
that commit, and you should save it away for later use. Normally,
you'd commit a new "HEAD" state, and while git doesn't care where you
save the note about that state, in practice we tend to just write the
result to the file ".git/HEAD", so that we can always see what the
last committed state was.

6) Examining the data
~~~~~~~~~~~~~~~~~~~~~

You can examine the data represented in the object database and the
index with various helper tools. For every object, you can use
link:git-cat-file.html[git-cat-file] to examine details about the
object:

	git-cat-file -t <objectname>

shows the type of the object, and once you have the type (which is
usually implicit in where you find the object), you can use

	git-cat-file blob|tree|commit <objectname>

to show its contents. NOTE! Trees have binary content, and as a result
there is a special helper for showing that content, called
"git-ls-tree", which turns the binary content into a more easily
readable form.

It's especially instructive to look at "commit" objects, since those
tend to be small and fairly self-explanatory. In particular, if you
follow the convention of having the top commit name in ".git/HEAD",
you can do

	git-cat-file commit $(cat .git/HEAD)

to see what the top commit was.

7) Merging multiple trees
~~~~~~~~~~~~~~~~~~~~~~~~~

Git helps you do a three-way merge, which you can expand to n-way by
repeating the merge procedure an arbitrary number of times until you
finally "commit" the state. The normal situation is that you'd only do
one three-way merge (two parents), and commit it, but if you like to,
you can do multiple parents in one go.

To do a three-way merge, you need the two sets of "commit" objects
that you want to merge, use those to find the closest common parent (a
third "commit" object), and then use those commit objects to find the
state of the directory ("tree" object) at these points.

To get the "base" for the merge, you first look up the common parent
of two commits with

	git-merge-base <commit1> <commit2>

which will return you the commit they are both based on. You should
now look up the "tree" objects of those commits, which you can easily
do with (for example)

	git-cat-file commit <commitname> | head -1

since the tree object information is always the first line in a commit
object.

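Both lookups can be modelled in a few lines of Python. This is only a
sketch: the `parents` dictionary stands in for the commit graph, the
breadth-first search is one plausible way to find a common ancestor
and is not claimed to be what git-merge-base does internally, and the
commit text passed to `tree_of` is abbreviated.

```python
from collections import deque


def ancestors(parents, commit):
    """All commits reachable from `commit`, itself included."""
    seen, queue = {commit}, deque([commit])
    while queue:
        for p in parents[queue.popleft()]:
            if p not in seen:
                seen.add(p)
                queue.append(p)
    return seen


def merge_base(parents, commit1, commit2):
    """First commit found breadth-first from commit2 that commit1 is based on."""
    reachable = ancestors(parents, commit1)
    seen, queue = {commit2}, deque([commit2])
    while queue:
        c = queue.popleft()
        if c in reachable:
            return c
        for p in parents[c]:
            if p not in seen:
                seen.add(p)
                queue.append(p)
    return None  # the two commits share no history


def tree_of(commit_text):
    """Return the tree name from the first line of a commit object's text."""
    kind, _, name = commit_text.split("\n", 1)[0].partition(" ")
    if kind != "tree":
        raise ValueError("first line is not a tree line")
    return name


# Hypothetical history: root <- base <- a1 <- a2, and base <- b1.
parents = {"root": [], "base": ["root"], "a1": ["base"], "a2": ["a1"], "b1": ["base"]}
print(merge_base(parents, "a2", "b1"))
print(tree_of("tree abc123\nparent def456\n\nsome message\n"))
```
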
Once you know the three trees you are going to merge (the one
"original" tree, aka the common case, and the two "result" trees, aka
the branches you want to merge), you do a "merge" read into the
index. This will throw away your old index contents, so you should
make sure that you've committed those - in fact you would normally
always do a merge against your last commit (which should thus match
what you have in your current index anyway).

To do the merge, do

	git-read-tree -m <origtree> <target1tree> <target2tree>

which will do all trivial merge operations for you directly in the
index file, and you can just write the result out with
"git-write-tree".

NOTE! Because the merge is done in the index file, and not in your
working directory, your working directory will no longer match your
index. You can use "git-checkout-cache -f -a" to make the effect of
the merge be seen in your working directory.

NOTE2! Sadly, many merges aren't trivial. If there are files that have
been added, moved or removed, or if both branches have modified the
same file, you will be left with an index tree that contains "merge
entries" in it. Such an index tree can _NOT_ be written out to a tree
object, and you will have to resolve any such merge clashes using
other tools before you can write out the result.


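What counts as "trivial" can be written down per path. The Python
sketch below follows the description above: a path merges trivially
only if it exists in all three trees and at most one side changed it;
anything added, removed or changed on both sides is kept as a merge
entry. The function name and the flat "path to blob name" dictionaries
are assumptions made for the example.

```python
def read_tree_merge(base, ours, theirs):
    """Three-way merge of path -> blob name maps. Returns (merged, conflicts)."""
    merged, conflicts = {}, {}
    for path in sorted(set(base) | set(ours) | set(theirs)):
        o, a, b = base.get(path), ours.get(path), theirs.get(path)
        present_everywhere = o is not None and a is not None and b is not None
        if present_everywhere and (a == o or b == o):
            # Changed on at most one side: take whichever side moved.
            merged[path] = b if a == o else a
        else:
            # Added, removed, or modified on both sides: keep all three
            # versions as a "merge entry" for other tools to resolve.
            conflicts[path] = (o, a, b)
    return merged, conflicts


base = {"a": "1", "b": "1", "c": "1"}
ours = {"a": "1", "b": "2", "c": "2", "d": "9"}
theirs = {"a": "1", "b": "1", "c": "3"}
merged, conflicts = read_tree_merge(base, ours, theirs)
print(merged)
print(conflicts)
```
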
[ fixme: talk about resolving merges here ]