////////////////////////////////////////////////////////////////

GIT - the stupid content tracker

////////////////////////////////////////////////////////////////
"git" can mean anything, depending on your mood.

- random three-letter combination that is pronounceable, and not
actually used by any common UNIX command. The fact that it is a
mispronunciation of "get" may or may not be relevant.
- stupid. contemptible and despicable. simple. Take your pick from the
dictionary of slang.
- "global information tracker": you're in a good mood, and it actually
works for you. Angels sing, and a light suddenly fills the room.
- "goddamn idiotic truckload of sh*t": when it breaks

This is a stupid (but extremely fast) directory content manager. It
doesn't do a whole lot, but what it 'does' do is track directory
contents efficiently.

There are two object abstractions: the "object database", and the
"current directory cache" aka "index".

The Object Database
~~~~~~~~~~~~~~~~~~~
The object database is literally just a content-addressable collection
of objects. All objects are named by their content, which is
approximated by the SHA1 hash of the object itself. Objects may refer
to other objects (by referencing their SHA1 hash), and so you can
build up a hierarchy of objects.

All objects have a statically determined "type" aka "tag", which is
determined at object creation time, and which identifies the format of
the object (i.e. how it is used, and how it can refer to other
objects). There are currently four different object types: "blob",
"tree", "commit" and "tag".

A "blob" object cannot refer to any other object, and is, as the type
name implies, a pure storage object containing some user data. It is
used to actually store the file data, i.e. a blob object is associated
with some particular version of some file.

A "tree" object is an object that ties one or more "blob" objects into a
directory structure. In addition, a tree object can refer to other tree
objects, thus creating a directory hierarchy.

A "commit" object ties such directory hierarchies together into
a DAG of revisions: each "commit" is associated with exactly one tree
(the directory hierarchy at the time of the commit). In addition, a
"commit" normally refers to one or more "parent" commit objects that
describe the history of how we arrived at that directory hierarchy.

As a special case, a commit object with no parents is called the "root"
commit, and is the starting point of a project's history. Each project
must have at least one root, and while you can tie several different
root objects together into one project by creating a commit object which
has two or more separate roots as its ultimate parents, that's probably
just going to confuse people. So aim for the notion of "one root object
per project", even if git itself does not enforce that.

A "tag" object symbolically identifies and can be used to sign other
objects. It contains the identifier and type of another object, a
symbolic name (of course!) and, optionally, a signature.

Regardless of object type, all objects share the following
characteristics: they are all deflated with zlib, and have a header
that not only specifies their type, but also provides size information
about the data in the object. It's worth noting that the SHA1 hash
that is used to name the object is the hash of the original data
plus this header, so `sha1sum` 'file' does not match the object name
for 'file'.
(Historical note: in the dawn of the age of git the hash
was the sha1 of the 'compressed' object.)
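Here is a minimal Python sketch of how an object name falls out of these rules. It assumes the header layout spelled out in the next paragraph (type, a space, the decimal size, a NUL byte) and hashes the uncompressed bytes, as current git does. `object_name` and `plain_sha1` are made-up helper names, not git interfaces.

```python
import hashlib


def object_name(obj_type: str, data: bytes) -> str:
    """Name of an object: SHA1 over "<type> <size>\\0" plus the raw data."""
    header = f"{obj_type} {len(data)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()


def plain_sha1(data: bytes) -> str:
    """What `sha1sum` would print for the same bytes (no header)."""
    return hashlib.sha1(data).hexdigest()


if __name__ == "__main__":
    contents = b"hello world\n"
    print(object_name("blob", contents))
    print(plain_sha1(contents))  # a different hash for the same bytes
```

The two printed hashes differ because only the first one includes the header, which is exactly why `sha1sum` on a file does not give you its object name.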

As a result, the general consistency of an object can always be tested
independently of the contents or the type of the object: all objects can
be validated by verifying that (a) the SHA1 hash of the inflated content
matches the object's name and (b) the object successfully inflates to a
stream of bytes that forms a sequence of <ascii type name without space>
+ <space> + <ascii decimal size> + <NUL byte> + <binary object data>.
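Both checks can be sketched as follows. The sketch assumes zlib-deflated storage and a SHA1 computed over the inflated header plus data. `validate_object` is a hypothetical helper, and it covers only this superficial consistency, not the structural checks described next.

```python
import hashlib
import zlib


def validate_object(expected_name: str, stored: bytes):
    """Check one stored object; return (type, data) or raise ValueError."""
    # Check (b), part 1: the stored bytes must inflate.
    try:
        raw = zlib.decompress(stored)
    except zlib.error as exc:
        raise ValueError(f"does not inflate: {exc}") from exc

    # Check (a): the SHA1 of the inflated header+data must equal the name.
    if hashlib.sha1(raw).hexdigest() != expected_name:
        raise ValueError("hash does not match the object name")

    # Check (b), part 2: "<type> <decimal size>", a NUL byte, then the data.
    nul = raw.find(b"\0")
    if nul < 0:
        raise ValueError("no NUL byte after the header")
    header, data = raw[:nul], raw[nul + 1:]
    obj_type, space, size = header.partition(b" ")
    if not space or not obj_type or not size.isdigit():
        raise ValueError("malformed header")
    if int(size) != len(data):
        raise ValueError("header size does not match the data length")
    return obj_type.decode("ascii"), data
```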

The structured objects can further have their structure and
connectivity to other objects verified. This is generally done with
the `git-fsck-cache` program, which generates a full dependency graph
of all objects, and verifies their internal consistency (in addition
to just verifying their superficial consistency through the hash).

The object types in some more detail:

Blob Object
~~~~~~~~~~~
A "blob" object is nothing but a binary blob of data, and doesn't
refer to anything else. There is no signature or any other
verification of the data, so while the object is consistent (it 'is'
indexed by its sha1 hash, so the data itself is certainly correct), it
has absolutely no other attributes. No name associations, no
permissions. It is purely a blob of data (i.e. normally "file
contents").

In particular, since the blob is entirely defined by its data, if two
files in a directory tree (or in multiple different versions of the
repository) have the same contents, they will share the same blob
object. The object is totally independent of its location in the
directory tree, and renaming a file does not change the object that
file is associated with in any way.

A blob is typically created when link:git-update-cache.html[git-update-cache]
is run, and its data can be accessed by link:git-cat-file.html[git-cat-file].

Tree Object
~~~~~~~~~~~
The next hierarchical object type is the "tree" object. A tree object
is a list of mode/name/blob data, sorted by name. Alternatively, the
mode data may specify a directory mode, in which case instead of
naming a blob, that name is associated with another TREE object.
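The sketch below shows one way such a list might be serialized. It assumes the entry layout that current git uses: the mode as ASCII octal, a space, the name, a NUL byte, then the 20-byte binary SHA1. It also applies one rule that goes beyond "sorted by name": git orders a subdirectory as if its name ended in "/". The function names are hypothetical.

```python
import hashlib


def object_name(obj_type: str, data: bytes) -> str:
    header = f"{obj_type} {len(data)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()


def tree_sort_key(entry):
    mode, name, _sha = entry
    # A subdirectory ("40000") sorts as if its name ended in "/".
    return name + "/" if mode == "40000" else name


def serialize_tree(entries):
    """Tree body for (mode, name, hex_sha) entries, e.g. ("100644", "README", sha).

    Each entry becomes b"<mode> <name>\\0" followed by the 20-byte SHA1.
    """
    return b"".join(
        f"{mode} {name}".encode() + b"\0" + bytes.fromhex(sha)
        for mode, name, sha in sorted(entries, key=tree_sort_key)
    )
```

The same entries produce the same bytes whatever order they are supplied in, and therefore the same name. That is what lets identical trees share one object.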

Like the "blob" object, a tree object is uniquely determined by its
contents, and so two separate but identical trees will always
share the exact same object. This is true at all levels, i.e. it's
true for a "leaf" tree (which does not refer to any other trees, only
blobs) as well as for a whole subdirectory.

For that reason a "tree" object is just a pure data abstraction: it
has no history, no signatures, no verification of validity, except
that since the contents are again protected by the hash itself, we can
trust that the tree is immutable and its contents never change.

So you can trust the contents of a tree to be valid, the same way you
can trust the contents of a blob, but you don't know where those
contents 'came' from.

Side note on trees: since a "tree" object is a sorted list of
"filename+content", you can create a diff between two trees without
having to unpack either of them completely. Walk the two sorted lists
in step and ignore all common parts (an entry whose name and hash are
the same on both sides, including a whole identical subtree, can be
skipped without looking inside it), and your diff will look right. In
other words, you can effectively (and efficiently) tell the difference
between any two random trees in time that is roughly O(n), where "n" is
the size of the difference, rather than the size of the tree.
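Here is a sketch of that walk. It assumes a hypothetical in-memory `store` that maps a tree's name to its entries as (name, mode, sha) tuples, sorted by plain name (a simplification of the real ordering). Entries that compare equal are skipped. A subtree whose name is the same on both sides is never even looked up, and that is where the savings come from.

```python
TREE_MODE = "40000"


def diff_trees(store, old, new, prefix=""):
    """Return (path, old_entry, new_entry) for each differing file.

    store maps a tree name to a list of (name, mode, sha) tuples sorted
    by name; old/new are tree names, or None for "no tree here". Each
    file entry in the result is a (mode, sha) pair, or None if absent.
    """
    changes = []
    if old == new:
        return changes  # identical (sub)trees: never opened
    a = store[old] if old else []
    b = store[new] if new else []
    i = j = 0
    while i < len(a) or j < len(b):
        # Walk both sorted lists in step, pairing entries by name.
        if j == len(b) or (i < len(a) and a[i][0] < b[j][0]):
            left, right = a[i], None
            i += 1
        elif i == len(a) or b[j][0] < a[i][0]:
            left, right = None, b[j]
            j += 1
        else:
            left, right = a[i], b[j]
            i += 1
            j += 1
        if left == right:
            continue  # common part: same name, mode and hash
        path = prefix + (left or right)[0]
        # Descend only into subtrees whose names differ.
        old_sub = left[2] if left and left[1] == TREE_MODE else None
        new_sub = right[2] if right and right[1] == TREE_MODE else None
        if old_sub or new_sub:
            changes += diff_trees(store, old_sub, new_sub, path + "/")
        old_file = (left[1], left[2]) if left and left[1] != TREE_MODE else None
        new_file = (right[1], right[2]) if right and right[1] != TREE_MODE else None
        if old_file != new_file:
            changes.append((path, old_file, new_file))
    return changes
```

Short placeholder names such as "b1" or "S1" stand in for real SHA1 names in the usage below.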

Side note 2 on trees: since the name of a "blob" depends entirely and
exclusively on its contents (i.e. there are no names or permissions
involved), you can see trivial renames or permission changes by
noticing that the blob stayed the same. However, renames with data
changes need a smarter "diff" implementation.
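This sketch spots those trivial cases. It assumes both trees have already been flattened into hypothetical `path -> (mode, blob_sha)` dictionaries. It pairs a vanished path with an added path that has the very same blob, and it reports paths whose blob is unchanged but whose mode differs. Renames combined with data changes are, as said, left to a smarter diff.

```python
def find_trivial_changes(old, new):
    """old/new map path -> (mode, blob_sha) for two flattened trees.

    Returns (renames, mode_changes):
      renames: (old_path, new_path) pairs whose blob is identical
      mode_changes: (path, old_mode, new_mode) where only the mode changed
    """
    removed = {p: v for p, v in old.items() if p not in new}
    added = {p: v for p, v in new.items() if p not in old}

    # Group the added paths by blob name for quick lookup.
    added_by_blob = {}
    for path, (_mode, sha) in sorted(added.items()):
        added_by_blob.setdefault(sha, []).append(path)

    renames = []
    for path, (_mode, sha) in sorted(removed.items()):
        candidates = added_by_blob.get(sha)
        if candidates:  # same contents under a new name
            renames.append((path, candidates.pop(0)))

    mode_changes = [
        (p, old[p][0], new[p][0])
        for p in sorted(old.keys() & new.keys())
        if old[p][1] == new[p][1] and old[p][0] != new[p][0]
    ]
    return renames, mode_changes
```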

A tree is created with link:git-write-tree.html[git-write-tree] and
its data can be accessed by link:git-ls-tree.html[git-ls-tree].
Two trees can be compared with link:git-diff-tree.html[git-diff-tree].

Commit Object
~~~~~~~~~~~~~
The "commit" object is an object that introduces the notion of
history into the picture. In contrast to the other objects, it
doesn't just describe the physical state of a tree, it describes how
we got there, and why.

A "commit" is defined by the tree-object that it results in, the
parent commits (zero, one or more) that led up to that point, and a
comment on what happened. Again, a commit is not trusted per se:
the contents are well-defined and "safe" due to the cryptographically
strong hashes at all levels, but there is no reason to believe
that the tree is "good" or that the merge information makes sense.
The parents do not have to actually have any relationship with the
result, for example.
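The README does not spell out a commit's byte layout, so the sketch below assumes the one current git writes. That layout is a `tree` line, one `parent` line per parent (none for a root commit), `author` and `committer` lines, a blank line, and then the comment. The helper names and the identity string are hypothetical.

```python
import hashlib


def object_name(obj_type: str, data: bytes) -> str:
    header = f"{obj_type} {len(data)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()


def serialize_commit(tree, parents, author, committer, message):
    """Commit body: header lines, one blank line, then the comment.

    author/committer are preformatted "Name <email> <seconds> <tz>"
    strings; parents may be empty (a root commit) or hold several names
    (a merge).
    """
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]
    lines += [f"author {author}", f"committer {committer}", ""]
    return ("\n".join(lines) + "\n" + message).encode("utf-8")
```

Nothing here checks that the parents are related to the tree, which matches the point above: the format guarantees integrity, not meaning.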

Note on commits: unlike real SCMs, commits do not contain
rename information or file mode change information. All of that is
implicit in the trees involved (the result tree, and the result trees
of the parents), and describing that makes no sense in this idiotic
file manager.

A commit is created with link:git-commit-tree.html[git-commit-tree] and
its data can be accessed by link:git-cat-file.html[git-cat-file].

Trust
~~~~~
An aside on the notion of "trust". Trust is really outside the scope
of "git", but it's worth noting a few things. First off, since
everything is hashed with SHA1, you 'can' trust that an object is
intact and has not been messed with by external sources. So the name
of an object uniquely identifies a known state, just not necessarily a
state that you may want to trust.

Furthermore, since a commit records the SHA1 names of the tree it is
associated with and of its parents, and each of those in turn records
the names of everything it refers to, a single named commit uniquely
specifies a whole set of history, with full contents. You can't later
fake any step of the way once you have the name of a commit.
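This small sketch shows why a single commit name pins down everything below it. It uses a hypothetical helper that builds one blob, a one-entry tree and a commit, all with simplified, assumed formats. Changing one byte of the file changes the blob name. That changes the tree name, which changes the commit name, which in turn changes the name of every commit that lists it as a parent.

```python
import hashlib


def object_name(obj_type: str, data: bytes) -> str:
    header = f"{obj_type} {len(data)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()


def commit_over(file_data, parent=None):
    """Name of a commit whose tree holds one file, README (hypothetical)."""
    blob = object_name("blob", file_data)
    tree = object_name("tree", b"100644 README\0" + bytes.fromhex(blob))
    lines = [f"tree {tree}"]
    if parent is not None:
        lines.append(f"parent {parent}")
    who = "A U Thor <author@example.com> 0 +0000"  # fixed, made-up identity
    lines += [f"author {who}", f"committer {who}", "", "update README"]
    return object_name("commit", ("\n".join(lines) + "\n").encode())


if __name__ == "__main__":
    root = commit_over(b"version 1\n")
    head = commit_over(b"version 2\n", parent=root)
    forged_root = commit_over(b"version 1, quietly edited\n")
    # Rewriting old history cannot keep the name of the later commit.
    print(head == commit_over(b"version 2\n", parent=forged_root))  # False
```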

So to introduce some real trust in the system, the only thing you need
to do is to digitally sign just 'one' special note, which includes the
name of a top-level commit. Your digital signature shows others
that you trust that commit, and the immutability of the history of
commits tells others that they can trust the whole history.

In other words, you can easily validate a whole archive by just
sending out a single email that tells people the name (SHA1 hash)
of the top commit, and digitally sign that email using something
like GPG/PGP.

To assist in this, git also provides the tag object...

Tag Object
~~~~~~~~~~
Git provides the "tag" object to simplify creating, managing and
exchanging symbolic and signed tokens. At its simplest, a "tag" object
just symbolically identifies another object by containing its sha1,
its type and a symbolic name.

However, it can optionally contain additional signature information
(which git doesn't care about as long as there's less than 8k of
it). This can then be verified externally to git.

Note that despite the tag features, "git" itself only handles content
integrity; the trust framework (and signature provision and
verification) has to come from outside.

A tag is created with link:git-mktag.html[git-mktag],
its data can be accessed by link:git-cat-file.html[git-cat-file],
and the signature can be verified by
link:git-verify-tag-script.html[git-verify-tag].
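Here is a sketch of a tag object. It assumes `object`, `type` and `tag` header lines, followed by an optional signature that git treats as opaque. Real tags normally also carry a `tagger` line, which this sketch leaves out. The 8192-byte cap reads "8k" literally and is an assumption, as are the helper names.

```python
MAX_TAG_SIZE = 8192  # "less than 8k", taken literally (an assumption)


def serialize_tag(target, target_type, name, signature=b""):
    """Tag body: object/type/tag lines, then an optional opaque signature."""
    body = (f"object {target}\n"
            f"type {target_type}\n"
            f"tag {name}\n").encode("ascii")
    if signature:
        body += b"\n" + signature  # git does not interpret this part
    if len(body) >= MAX_TAG_SIZE:
        raise ValueError("tag object too large")
    return body


def parse_tag(body):
    """Split a tag body into its header fields and the trailing signature."""
    header, _, signature = body.partition(b"\n\n")
    fields = {}
    for line in header.splitlines():
        key, _, value = line.partition(b" ")
        fields[key.decode("ascii")] = value.decode("ascii")
    return fields, signature
```

Verifying the signature part is deliberately not attempted here: as noted above, that has to come from outside git.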


The "index" aka "Current Directory Cache"
-----------------------------------------
The index is a simple binary file, which contains an efficient
representation of the contents of a virtual directory at some arbitrary
point in time. It does so with a simple array that associates a set of
names, dates, permissions and content (aka "blob") objects together.
The cache is always kept ordered by name, and names are unique (with a
few very specific rules) at any point in time, but the cache has no
long-term meaning, and can be partially updated at any time.
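Here is a minimal Python sketch of that array, kept sorted with binary search so that names stay unique. The `IndexEntry` fields are a hypothetical subset of "names, dates, permissions and content", not the real on-disk layout. The "very specific rules" that allow repeated names are ignored.

```python
import bisect
from dataclasses import dataclass


@dataclass
class IndexEntry:
    name: str    # path, the sort key
    mtime: int   # "dates" (hypothetical subset of the stat data)
    mode: int    # "permissions"
    blob: str    # name of the "blob" object holding the contents


class Index:
    """Entries kept ordered by name, with at most one entry per name."""

    def __init__(self):
        self._names = []    # sorted list of names, for bisect
        self._entries = []  # entries in the same order as _names

    def _position(self, name):
        i = bisect.bisect_left(self._names, name)
        found = i < len(self._names) and self._names[i] == name
        return i, found

    def put(self, entry):
        """Insert a new entry or replace the existing one with that name."""
        i, found = self._position(entry.name)
        if found:
            self._entries[i] = entry
        else:
            self._names.insert(i, entry.name)
            self._entries.insert(i, entry)

    def get(self, name):
        i, found = self._position(name)
        return self._entries[i] if found else None

    def remove(self, name):
        i, found = self._position(name)
        if found:
            del self._names[i]
            del self._entries[i]

    def __iter__(self):
        return iter(self._entries)
```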

In particular, the index certainly does not need to be consistent with
the current directory contents (in fact, most operations will depend on
different ways to make the index 'not' be consistent with the directory
hierarchy), but it has three very important attributes:

'(a) it can re-generate the full state it caches (not just the
directory structure: it contains pointers to the "blob" objects so
that it can regenerate the data too)'

As a special case, there is a clear and unambiguous one-way mapping
from a current directory cache to a "tree object", which can be
efficiently created from just the current directory cache without
actually looking at any other data. So a directory cache at any one
time uniquely specifies one and only one "tree" object (but has
additional data to make it easy to match up that tree object with what
has happened in the directory).
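Here is a sketch of that one-way mapping. It assumes index entries are flat `(path, mode, blob_sha)` tuples. It also assumes current git's tree-entry encoding: ASCII octal mode, name, NUL, binary SHA1, with subdirectories ordered as if their name ended in "/". `write_tree` is a hypothetical stand-in for what git-write-tree does. It reads nothing but the entries it is given, and every input order produces the same top-level tree name.

```python
import hashlib

TREE_MODE = "40000"


def object_name(obj_type: str, data: bytes) -> str:
    header = f"{obj_type} {len(data)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()


def write_tree(entries, objects):
    """Build tree objects from flat (path, mode, blob_sha) index entries.

    Each tree body is stored in `objects` under its name, and the name
    of the top-level tree is returned. Nothing but `entries` is read.
    """
    here = []      # (mode, name, sha) directly in this directory
    subdirs = {}   # subdirectory name -> entries relative to it
    for path, mode, sha in entries:
        first, slash, rest = path.partition("/")
        if slash:
            subdirs.setdefault(first, []).append((rest, mode, sha))
        else:
            here.append((mode, first, sha))
    for name, sub_entries in subdirs.items():
        here.append((TREE_MODE, name, write_tree(sub_entries, objects)))
    # Git's tree order: compare names as if subdirectories ended in "/".
    here.sort(key=lambda e: e[1] + "/" if e[0] == TREE_MODE else e[1])
    body = b"".join(f"{mode} {name}".encode() + b"\0" + bytes.fromhex(sha)
                    for mode, name, sha in here)
    name = object_name("tree", body)
    objects[name] = body
    return name
```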

'(b) it has efficient methods for finding inconsistencies between that
cached state ("tree object waiting to be instantiated") and the
current state.'

'(c) it can additionally efficiently represent information about merge
conflicts between different tree objects, allowing each pathname to be
associated with sufficient information about the trees involved that
you can create a three-way merge between them.'

Those are the ONLY three things that the directory cache does. It's a
cache, and the normal operation is to re-generate it completely from a
known tree object, or update/compare it with a live tree that is being
developed. If you blow the directory cache away entirely, you generally
haven't lost any information as long as you have the name of the tree
that it described.

At the same time, the index is also the staging area for creating new
trees, and creating a new tree always involves a controlled
modification of the index file. In particular, the index file can have
the representation of an intermediate tree that has not yet been
instantiated. So the index can be thought of as a write-back cache,
which can contain dirty information that has not yet been written back
to the backing store.


The Workflow
------------
Generally, all "git" operations work on the index file. Some operations
work purely on the index file (showing the current state of the
index), but most operations move data to and from the index file,
either from the database or from the working directory. Thus there are
four main combinations:

1) working directory -> index
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

You update the index with information from the working directory with
the link:git-update-cache.html[git-update-cache] command. You
generally update the index information by just specifying the filename
you want to update, like so:

	git-update-cache filename

but to avoid common mistakes with filename globbing etc., the command
will not normally add totally new entries or remove old entries,
i.e. it will normally just update existing cache entries.

To tell git that yes, you really do realize that certain files no
longer exist in the archive, or that new files should be added, you
should use the `--remove` and `--add` flags respectively.

NOTE! A `--remove` flag does 'not' mean that subsequent filenames will
necessarily be removed: if the files still exist in your directory
structure, the index will be updated with their new status, not
removed. The only thing `--remove` means is that update-cache will
consider a removed file to be a valid thing, and if the file really
does not exist any more, it will update the index accordingly.
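Here is a rough model of those rules. It uses a hypothetical `update_cache` function and reduces the index to a `path -> blob name` dictionary, with all stat data omitted. Refusals are modelled as exceptions; how the real tool reports them is not described here.

```python
import hashlib
import os


def blob_name(data: bytes) -> str:
    return hashlib.sha1(b"blob %d\0" % len(data) + data).hexdigest()


def update_cache(index, paths, add=False, remove=False, root="."):
    """Hypothetical model of the update rules; index maps path -> blob name."""
    for path in paths:
        full = os.path.join(root, path)
        if not os.path.isfile(full):
            # A vanished file is only acceptable with remove=True.
            if not remove:
                raise ValueError(f"{path}: file is gone and remove is not set")
            index.pop(path, None)
        elif path not in index and not add:
            # Without add=True only existing entries may be updated.
            raise ValueError(f"{path}: not in the index and add is not set")
        else:
            # The file exists: record its current contents, even if
            # remove=True was given.
            with open(full, "rb") as f:
                index[path] = blob_name(f.read())
    return index
```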

As a special case, you can also do `git-update-cache --refresh`, which
will refresh the "stat" information of each index entry to match the
current stat information. It will 'not' update the object status
itself, and it will only update the fields that are used to quickly
test whether an object still matches its old backing store object.
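Here is a sketch of what a refresh amounts to, assuming a hypothetical entry type that caches only the mtime and size. A file whose cached stat data matches is assumed unchanged. Otherwise the file is re-hashed, and the stat fields are rewritten only if the content still matches the recorded blob. The blob name itself is never touched, and paths that really changed are just reported.

```python
import hashlib
import os
from dataclasses import dataclass


@dataclass
class Entry:
    blob: str       # blob name recorded in the index (never changed here)
    mtime_ns: int   # cached stat fields: a hypothetical minimal subset
    size: int


def blob_name(data: bytes) -> str:
    return hashlib.sha1(b"blob %d\0" % len(data) + data).hexdigest()


def refresh(index, root="."):
    """Refresh cached stat data in place; return paths that really changed."""
    needs_update = []
    for path, entry in index.items():
        full = os.path.join(root, path)
        try:
            st = os.stat(full)
        except FileNotFoundError:
            needs_update.append(path)
            continue
        if entry.mtime_ns == st.st_mtime_ns and entry.size == st.st_size:
            continue  # quick test passes: assume the file is unchanged
        with open(full, "rb") as f:
            same_content = blob_name(f.read()) == entry.blob
        if same_content:
            entry.mtime_ns, entry.size = st.st_mtime_ns, st.st_size
        else:
            needs_update.append(path)
    return needs_update
```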

2) index -> object database
~~~~~~~~~~~~~~~~~~~~~~~~~~~

You write your current index file to a "tree" object with the program

	git-write-tree

that doesn't come with any options: it will just write out the
current index into the set of tree objects that describe that state,
and it will return the name of the resulting top-level tree. You can
use that tree to re-generate the index at any time by going in the
other direction:

3) object database -> index
~~~~~~~~~~~~~~~~~~~~~~~~~~~

You read a "tree" object from the object database, and use that to
populate (and overwrite: don't do this if your index contains any
unsaved state that you might want to restore later!) your current
index. Normal operation is just

	git-read-tree <sha1 of tree>

and your index file will now be equivalent to the tree that you saved
earlier. However, that is only your 'index' file: your working
directory contents have not been modified.

4) index -> working directory
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

You update your working directory from the index by "checking out"
files. This is not a very common operation, since normally you'd just
keep your files updated, and rather than write to your working
directory, you'd tell the index file about the changes in your
working directory (i.e. `git-update-cache`).

However, if you decide to jump to a new version, or check out somebody
else's version, or just restore a previous tree, you'd populate your
index file with read-tree, and then you need to check out the result
with

	git-checkout-cache filename

or, if you want to check out all of the index, use `-a`.

NOTE! git-checkout-cache normally refuses to overwrite old files, so
if you have an old version of the tree already checked out, you will
need to use the "-f" flag ('before' the "-a" flag or the filename) to
'force' the checkout.
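Here is a sketch of that overwrite rule. It uses a hypothetical `checkout` function, reduces the index to `path -> blob name`, and reduces the object database to a dictionary of blob contents. Existing files are skipped and reported unless `force` is set. The real tool's exact behaviour and messages are not modelled.

```python
import os


def checkout(index, objects, paths, force=False, root="."):
    """Write index entries into the working directory (hypothetical model).

    index maps path -> blob name and objects maps blob name -> contents.
    Files that already exist are skipped and returned unless force=True.
    """
    skipped = []
    for path in paths:
        full = os.path.join(root, path)
        if os.path.exists(full) and not force:
            skipped.append(path)  # refuse to overwrite an old file
            continue
        os.makedirs(os.path.dirname(full) or ".", exist_ok=True)
        with open(full, "wb") as f:
            f.write(objects[index[path]])
    return skipped
```

Checking out the whole index, the analogue of `-a`, is then `checkout(index, objects, sorted(index))`.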
Linus Torvalds	6ad6d3d	2005-04-17 21:52:23 -0700	[diff] [blame]	365
				366
David Greaves	8ac866a	2005-05-22 18:44:16 +0100	[diff] [blame]	367	Finally, there are a few odds and ends which are not purely moving
				368	from one representation to the other:
Linus Torvalds	6ad6d3d	2005-04-17 21:52:23 -0700	[diff] [blame]	369
David Greaves	8ac866a	2005-05-22 18:44:16 +0100	[diff] [blame]	370	5) Tying it all together
				371	~~~~~~~~~~~~~~~~~~~~~~~~
David Greaves	7096a64	2005-05-22 18:44:17 +0100	[diff] [blame]	372	To commit a tree you have instantiated with "git-write-tree", you'd
				373	create a "commit" object that refers to that tree and the history
				374	behind it - most notably the "parent" commits that preceded it in
				375	history.
Linus Torvalds	6ad6d3d	2005-04-17 21:52:23 -0700	[diff] [blame]	376
David Greaves	8ac866a	2005-05-22 18:44:16 +0100	[diff] [blame]	377	Normally a "commit" has one parent: the previous state of the tree
				378	before a certain change was made. However, sometimes it can have two
				379	or more parent commits, in which case we call it a "merge", due to the
				380	fact that such a commit brings together ("merges") two or more
				381	previous states represented by other commits.
Linus Torvalds	6ad6d3d	2005-04-17 21:52:23 -0700	[diff] [blame]	382
David Greaves	8ac866a	2005-05-22 18:44:16 +0100	[diff] [blame]	383	In other words, while a "tree" represents a particular directory state
				384	of a working directory, a "commit" represents that state in "time",
				385	and explains how we got there.
Linus Torvalds	6ad6d3d	2005-04-17 21:52:23 -0700	[diff] [blame]	386
David Greaves	8ac866a	2005-05-22 18:44:16 +0100	[diff] [blame]	387	You create a commit object by giving it the tree that describes the
				388	state at the time of the commit, and a list of parents:
Linus Torvalds	6ad6d3d	2005-04-17 21:52:23 -0700	[diff] [blame]	389
David Greaves	7096a64	2005-05-22 18:44:17 +0100	[diff] [blame]	390	git-commit-tree <tree> -p <parent> [-p <parent2> ..]
Linus Torvalds	6ad6d3d	2005-04-17 21:52:23 -0700	[diff] [blame]	391
David Greaves	8ac866a	2005-05-22 18:44:16 +0100	[diff] [blame]	392	and then giving the reason for the commit on stdin (either through
				393	redirection from a pipe or file, or by just typing it at the tty).
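
The Python sketch below is a rough illustration of what git-commit-tree produces. It builds the text of a commit object and derives the object's name. The text starts with the "tree" line, followed by one "parent" line per parent, then the author, the committer, a blank line and the message. The sketch assumes git's object naming scheme: a SHA-1 over a "<type> <size>" header, a NUL byte and the uncompressed payload. The helper names `hash_object` and `make_commit_body` and the author identity are hypothetical. Nothing is written to the object database.

```python
import hashlib


def hash_object(obj_type: str, payload: bytes) -> str:
    """Return the 40-hex-digit SHA-1 name of an object.

    The hash covers a "<type> <size>" header, a NUL byte and the raw
    payload; zlib compression (used only for storage) is not involved.
    """
    header = f"{obj_type} {len(payload)}".encode() + b"\0"
    return hashlib.sha1(header + payload).hexdigest()


def make_commit_body(tree: str, parents: list, ident: str, message: str) -> bytes:
    """Build commit text: the tree line first, then one line per parent.

    `ident` is a hypothetical "Name <email> timestamp timezone" string,
    reused here for both the author and the committer.
    """
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]
    lines += [f"author {ident}", f"committer {ident}", "", message]
    return ("\n".join(lines) + "\n").encode()


# Hypothetical usage: a root commit (no parents) of the empty tree.
empty_tree = hash_object("tree", b"")
body = make_commit_body(
    empty_tree, [], "A U Thor <author@example.com> 1116780000 +0100", "Initial revision"
)
commit_name = hash_object("commit", body)
print(commit_name)  # the value you would save away, e.g. in .git/HEAD
```

The same tree, parents, identity and message always hash to the same name. This is why the name works as a stable handle for the commit.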

git-commit-tree will return the name of the object that represents
that commit, and you should save it away for later use. Normally,
you'd commit a new `HEAD` state, and while git doesn't care where you
save the note about that state, in practice we tend to just write the
result to the file `.git/HEAD`, so that we can always see what the
last committed state was.

6) Examining the data
~~~~~~~~~~~~~~~~~~~~~

You can examine the data represented in the object database and the
index with various helper tools. For every object, you can use
link:git-cat-file.html[git-cat-file] to examine details about the
object:

------------------------------------------------
git-cat-file -t <objectname>
------------------------------------------------

shows the type of the object, and once you have the type (which is
usually implicit in where you find the object), you can use

------------------------------------------------
git-cat-file blob|tree|commit|tag <objectname>
------------------------------------------------

to show its contents. NOTE! Trees have binary content, and as a result
there is a special helper for showing that content, called
`git-ls-tree`, which turns the binary content into a more easily
readable form.
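
The binary layout that `git-ls-tree` decodes can be sketched in a few lines of Python. This assumes git's tree entry format:

- an ASCII octal mode,
- a space,
- the entry name,
- a NUL byte,
- the 20-byte binary SHA-1 of the entry's object.

`decode_tree` is a hypothetical helper, and its output only approximates what `git-ls-tree` prints.

```python
def decode_tree(data: bytes) -> list:
    """Decode raw tree content into (mode, sha1-hex, name) tuples."""
    entries = []
    pos = 0
    while pos < len(data):
        nul = data.index(b"\0", pos)           # end of "<mode> <name>"
        mode, _, name = data[pos:nul].partition(b" ")
        sha1 = data[nul + 1 : nul + 21].hex()  # 20 raw bytes -> 40 hex digits
        entries.append((mode.decode(), sha1, name.decode()))
        pos = nul + 21                         # next entry starts after the SHA-1
    return entries


# Hypothetical tree holding a single empty file called hello.c.
raw_tree = b"100644 hello.c\0" + bytes.fromhex("e69de29bb2d1d6434b8b29ae775ad8c2e48c5391")
for mode, sha1, name in decode_tree(raw_tree):
    print(f"{mode} {sha1}\t{name}")
```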

It's especially instructive to look at "commit" objects, since those
tend to be small and fairly self-explanatory. In particular, if you
follow the convention of having the top commit name in `.git/HEAD`,
you can do

------------------------------------------------
git-cat-file commit $(cat .git/HEAD)
------------------------------------------------

to see what the top commit was.
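
Commit objects are plain text, so their structure is easy to pull apart. The Python sketch below splits a commit, as printed by `git-cat-file commit`, into its header lines and message. It collects the possibly repeated "parent" lines, so a commit with two or more parents can be recognized as a merge. The `parse_commit` name and the sample object, including its placeholder parent names, are hypothetical.

```python
def parse_commit(raw: str) -> dict:
    """Split raw commit text into header fields, parents and message.

    Headers run up to the first blank line. "parent" may repeat, so parents
    are gathered into a list; multi-line header values are not handled here.
    """
    header, _, message = raw.partition("\n\n")
    fields = {"parents": [], "message": message}
    for line in header.splitlines():
        key, _, value = line.partition(" ")
        if key == "parent":
            fields["parents"].append(value)
        else:
            fields[key] = value
    return fields


sample = "\n".join([
    "tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904",
    "parent " + "1" * 40,  # placeholder parent names
    "parent " + "2" * 40,
    "author A U Thor <author@example.com> 1116780000 +0100",
    "committer A U Thor <author@example.com> 1116780000 +0100",
    "",
    "Merge two lines of development",
]) + "\n"

info = parse_commit(sample)
is_merge = len(info["parents"]) >= 2
print(info["tree"], is_merge)
```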

7) Merging multiple trees
~~~~~~~~~~~~~~~~~~~~~~~~~

Git helps you do a three-way merge, which you can expand to n-way by
repeating the merge procedure an arbitrary number of times until you
finally "commit" the state. The normal situation is that you'd only do
one three-way merge (two parents) and commit it, but if you like, you
can do multiple parents in one go.

To do a three-way merge, you need the two "commit" objects that you
want to merge, use those to find the closest common ancestor (a third
"commit" object), and then use those commit objects to find the state
of the directory ("tree" object) at these points.

To get the "base" for the merge, you first look up the common ancestor
of two commits with

------------------------------------------------
git-merge-base <commit1> <commit2>
------------------------------------------------

which will return the commit they are both based on. You should
now look up the "tree" objects of those commits, which you can easily
do with (for example)

------------------------------------------------
git-cat-file commit <commitname> | head -1
------------------------------------------------

since the tree object information is always the first line in a commit
object.
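
The "base" of a merge can be described as a small graph computation. The Python sketch below treats history as a dictionary mapping each commit to its list of parents. It returns the common ancestors of two commits that are not themselves ancestors of another common ancestor. This only writes the definition out by brute force; it is not the algorithm `git-merge-base` actually uses. The single-letter commit names are placeholders.

```python
from collections import deque


def ancestors(parents: dict, start: str) -> set:
    """Return `start` plus every commit reachable through parent links."""
    seen = {start}
    queue = deque([start])
    while queue:
        for parent in parents.get(queue.popleft(), []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def merge_bases(parents: dict, a: str, b: str) -> set:
    """Common ancestors of a and b that no other common ancestor descends from."""
    common = ancestors(parents, a) & ancestors(parents, b)
    return {
        c for c in common
        if not any(c != other and c in ancestors(parents, other) for other in common)
    }


# Placeholder history: A <- B <- C on one branch, and B <- D on another.
history = {"A": [], "B": ["A"], "C": ["B"], "D": ["B"]}
print(merge_bases(history, "C", "D"))
```

In "criss-cross" histories this definition can yield more than one commit, so the result is a set rather than a single name.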

Once you know the three trees you are going to merge (the one
"original" tree, aka the common ancestor, and the two "result" trees,
aka the branches you want to merge), you do a "merge" read into the
index. This will complain if it has to throw away your old index
contents, so you should make sure that you've committed those - in
fact you would normally always do a merge against your last commit
(which should thus match what you have in your current index anyway).

To do the merge, do

------------------------------------------------
git-read-tree -m -u <origtree> <yourtree> <targettree>
------------------------------------------------

which will do all trivial merge operations for you directly in the
index file (and, because of `-u`, update the affected files in your
working directory), and you can just write the result out with
`git-write-tree`.
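
The per-path rule behind these "trivial" merges can be sketched as follows. This is a simplified model. It assumes each tree contributes one blob name per path, or `None` when the path is absent. A path is decided when both sides agree, or when only one side changed relative to the original. Anything else is left as a merge entry holding all three versions. The sketch also leaves every added or removed path unresolved. `trivial_merge` is a hypothetical name, not part of git.

```python
def trivial_merge(orig, ours, theirs):
    """Decide one path of a three-way merge, or leave it unmerged.

    Each argument is the path's blob name in that tree, or None if the
    path is missing there. Returns ("resolved", blob) or
    ("unmerged", (orig, ours, theirs)) with the three stages in order.
    """
    if None in (orig, ours, theirs):
        return ("unmerged", (orig, ours, theirs))  # additions/removals: not trivial here
    if ours == theirs:
        return ("resolved", ours)    # unchanged, or changed the same way on both sides
    if orig == ours:
        return ("resolved", theirs)  # only the target side changed
    if orig == theirs:
        return ("resolved", ours)    # only our side changed
    return ("unmerged", (orig, ours, theirs))


print(trivial_merge("263414f", "06fa6a2", "cc44c73"))  # both sides changed differently
```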

Historical note. We did not have the `-u` facility when this
section was first written, so we used to warn that the merge is
done in the index file, not in your working directory, and your
working directory will no longer match your index.


8) Merging multiple trees, continued
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Sadly, many merges aren't trivial. If there are files that have
been added, moved or removed, or if both branches have modified the
same file, you will be left with an index tree that contains "merge
entries" in it. Such an index tree can 'NOT' be written out to a tree
object, and you will have to resolve any such merge clashes using
other tools before you can write out the result.

You can examine such index state with the `git-ls-files --unmerged`
command. An example:

------------------------------------------------
$ git-read-tree -m $orig HEAD $target
$ git-ls-files --unmerged
100644 263414f423d0e4d70dae8fe53fa34614ff3e2860 1 hello.c
100644 06fa6a24256dc7e560efa5687fa84b51f0263c3a 2 hello.c
100644 cc44c73eb783565da5831b4d820c962954019b69 3 hello.c
------------------------------------------------

Each line of the `git-ls-files --unmerged` output contains, in
order, the blob mode bits, the blob SHA1, the 'stage number', and
the filename. The 'stage number' is git's way to say which tree it
came from: stage 1 corresponds to the `$orig` tree, stage 2 to the
`HEAD` tree, and stage 3 to the `$target` tree.
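
Reading that output programmatically amounts to grouping lines by filename. The Python sketch below maps each path to a `{stage: SHA1}` dictionary, using the sample output shown above. The helper name `parse_unmerged` is hypothetical.

```python
from collections import defaultdict


def parse_unmerged(output: str) -> dict:
    """Group `git-ls-files --unmerged` lines into {path: {stage: sha1}}.

    Each line holds the mode bits, blob SHA1, stage number and filename;
    splitting at most three times keeps filenames containing spaces intact.
    """
    stages = defaultdict(dict)
    for line in output.splitlines():
        _mode, sha1, stage, path = line.split(maxsplit=3)
        stages[path][int(stage)] = sha1
    return dict(stages)


sample = """\
100644 263414f423d0e4d70dae8fe53fa34614ff3e2860 1 hello.c
100644 06fa6a24256dc7e560efa5687fa84b51f0263c3a 2 hello.c
100644 cc44c73eb783565da5831b4d820c962954019b69 3 hello.c
"""
for path, by_stage in parse_unmerged(sample).items():
    # stage 1 = $orig, stage 2 = HEAD, stage 3 = $target
    print(path, by_stage[1], by_stage[2], by_stage[3])
```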

Earlier we said that trivial merges are done inside
`git-read-tree -m`. For example, if the file did not change
from `$orig` to `HEAD` nor `$target`, or if the file changed
from `$orig` to `HEAD` and `$orig` to `$target` the same way,
obviously the final outcome is what is in `HEAD`. What the
above example shows is that file `hello.c` was changed from
`$orig` to `HEAD` and from `$orig` to `$target` in different ways.
You could resolve this by running your favorite 3-way merge
program, e.g. `diff3` or `merge`, on the blob objects from
these three stages yourself, like this:

------------------------------------------------
$ git-cat-file blob 263414f... >hello.c~1
$ git-cat-file blob 06fa6a2... >hello.c~2
$ git-cat-file blob cc44c73... >hello.c~3
$ merge hello.c~2 hello.c~1 hello.c~3
------------------------------------------------

This would leave the merge result in the `hello.c~2` file, along
with conflict markers if there are conflicts. After verifying
that the merge result makes sense, you can tell git what the final
merge result for this file is by:

------------------------------------------------
$ mv -f hello.c~2 hello.c
$ git-update-cache hello.c
------------------------------------------------

When a path is in unmerged state, running `git-update-cache` for
that path tells git to mark the path resolved.
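
A toy model makes the role of the stage numbers clearer. In the Python sketch below, the index is assumed to be a dictionary keyed by `(path, stage)`, which is not git's on-disk format. `resolve` stands in for what `git-update-cache` does to an unmerged path. `can_write_tree` captures why an index with merge entries cannot be written out. The function names and the merged blob name are hypothetical.

```python
def resolve(index: dict, path: str, blob: str) -> None:
    """Replace a path's stage 1-3 entries with a single stage-0 entry."""
    for stage in (1, 2, 3):
        index.pop((path, stage), None)
    index[(path, 0)] = blob


def can_write_tree(index: dict) -> bool:
    """An index can become a tree only when every entry is at stage 0."""
    return all(stage == 0 for (_path, stage) in index)


index = {
    ("README", 0): "e69de29",   # an already merged path
    ("hello.c", 1): "263414f",  # $orig
    ("hello.c", 2): "06fa6a2",  # HEAD
    ("hello.c", 3): "cc44c73",  # $target
}
print(can_write_tree(index))           # False: hello.c is still unmerged
resolve(index, "hello.c", "abc1234")   # hypothetical name of the merged blob
print(can_write_tree(index))           # True
```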

The above is the description of a git merge at the lowest level,
to help you understand what conceptually happens under the hood.
In practice, nobody, not even git itself, runs `git-cat-file` three
times for this. There is a `git-merge-cache` program that extracts the
stages to temporary files and calls a `merge` script on them:

------------------------------------------------
git-merge-cache git-merge-one-file-script hello.c
------------------------------------------------

and that is what the higher-level `git resolve` is implemented with.
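
The loop such a program performs can be sketched in Python. This is a simplified model, not `git-merge-cache` itself, and it makes three assumptions:

- The stage contents are already available as bytes, standing in for the blobs.
- Paths lacking any of the three stages are skipped.
- `merge_file` is a hypothetical callback playing the part of the merge script.

```python
import os
import tempfile


def merge_unmerged(stages: dict, merge_file) -> dict:
    """Run merge_file(orig, ours, theirs) on temp copies of each unmerged path.

    `stages` maps path -> {stage: content bytes}; the callback receives the
    three temporary filenames and returns the merged content.
    """
    results = {}
    for path, by_stage in stages.items():
        if not all(s in by_stage for s in (1, 2, 3)):
            continue  # added/removed paths need special handling; skipped here
        with tempfile.TemporaryDirectory() as tmp:
            names = []
            for s in (1, 2, 3):
                name = os.path.join(tmp, f"stage{s}")
                with open(name, "wb") as f:
                    f.write(by_stage[s])
                names.append(name)
            results[path] = merge_file(*names)
    return results


def take_ours(orig: str, ours: str, theirs: str) -> bytes:
    """Trivial stand-in merge program: keep the stage-2 (HEAD) version."""
    with open(ours, "rb") as f:
        return f.read()


merged = merge_unmerged(
    {"hello.c": {1: b"base\n", 2: b"ours\n", 3: b"theirs\n"}}, take_ours
)
print(merged)
```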