Blame - README - jrn/git

blob: d49aa24f0810ac5a8ad9647460bf07f55788ce47 [file] [log] [blame]

David Greaves	8ac866a	2005-05-22 18:44:16 +0100	[diff] [blame]	1	////////////////////////////////////////////////////////////////
Linus Torvalds	6ad6d3d	2005-04-17 21:52:23 -0700	[diff] [blame]	2
Linus Torvalds	e83c516	2005-04-07 15:13:13 -0700	[diff] [blame]	3	GIT - the stupid content tracker
				4
David Greaves	8ac866a	2005-05-22 18:44:16 +0100	[diff] [blame]	5	////////////////////////////////////////////////////////////////
Linus Torvalds	e83c516	2005-04-07 15:13:13 -0700	[diff] [blame]	6	"git" can mean anything, depending on your mood.
				7
				8	- random three-letter combination that is pronounceable, and not
				9	actually used by any common UNIX command. The fact that it is a
Pavel Roskin	90c4851	2005-04-14 23:35:00 -0400	[diff] [blame]	10	mispronunciation of "get" may or may not be relevant.
Linus Torvalds	e83c516	2005-04-07 15:13:13 -0700	[diff] [blame]	11	- stupid. contemptible and despicable. simple. Take your pick from the
				12	dictionary of slang.
				13	- "global information tracker": you're in a good mood, and it actually
				14	works for you. Angels sing, and a light suddenly fills the room.
				15	- "goddamn idiotic truckload of sh*t": when it breaks.
				16
				17	This is a stupid (but extremely fast) directory content manager. It
				18	doesn't do a whole lot, but what it _does_ do is track directory
				19	contents efficiently.
				20
				21	There are two object abstractions: the "object database", and the
Linus Torvalds	6ad6d3d	2005-04-17 21:52:23 -0700	[diff] [blame]	22	"current directory cache" aka "index".
				23
David Greaves	8ac866a	2005-05-22 18:44:16 +0100	[diff] [blame]	24	The Object Database
				25	~~~~~~~~~~~~~~~~~~~
Linus Torvalds	e83c516	2005-04-07 15:13:13 -0700	[diff] [blame]	26	The object database is literally just a content-addressable collection
				27	of objects. All objects are named by their content, which is
				28	approximated by the SHA1 hash of the object itself. Objects may refer
David Greaves	8ac866a	2005-05-22 18:44:16 +0100	[diff] [blame]	29	to other objects (by referencing their SHA1 hash), and so you can
				30	build up a hierarchy of objects.
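
The idea of a content-addressable collection can be sketched in a few lines of Python. This is only an illustration: the `ObjectStore` class and its `put`/`get` methods are hypothetical names, not part of git, and the store here hashes the raw bytes with no header or compression.

```python
import hashlib


class ObjectStore:
    """Toy in-memory content-addressable store (hypothetical, not git's API)."""

    def __init__(self):
        self._objects = {}

    def put(self, data: bytes) -> str:
        # The name of an object is derived purely from its content.
        name = hashlib.sha1(data).hexdigest()
        self._objects[name] = data
        return name

    def get(self, name: str) -> bytes:
        return self._objects[name]

    def __len__(self) -> int:
        return len(self._objects)


store = ObjectStore()
leaf_a = store.put(b"hello\n")
leaf_b = store.put(b"hello\n")   # same content, so same name and same object

# An object refers to another one simply by embedding its name.
parent = store.put(("refers-to " + leaf_a).encode("ascii"))
```

Storing the same bytes twice yields one object, and the `parent` object forms the first level of a hierarchy by naming `leaf_a`.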
Linus Torvalds	e83c516	2005-04-07 15:13:13 -0700	[diff] [blame]	31
Linus Torvalds	6ad6d3d	2005-04-17 21:52:23 -0700	[diff] [blame]	32	All objects have a statically determined "type" aka "tag", which is
				33	determined at object creation time, and which identifies the format of
Pavel Roskin	90c4851	2005-04-14 23:35:00 -0400	[diff] [blame]	34	the object (i.e. how it is used, and how it can refer to other objects).
Linus Torvalds	6ad6d3d	2005-04-17 21:52:23 -0700	[diff] [blame]	35	There are currently four different object types: "blob", "tree",
				36	"commit" and "tag".
Linus Torvalds	e83c516	2005-04-07 15:13:13 -0700	[diff] [blame]	37
Linus Torvalds	6ad6d3d	2005-04-17 21:52:23 -0700	[diff] [blame]	38	A "blob" object cannot refer to any other object, and is, like the tag
				39	implies, a pure storage object containing some user data. It is used to
Pavel Roskin	90c4851	2005-04-14 23:35:00 -0400	[diff] [blame]	40	actually store the file data, i.e. a blob object is associated with some
Linus Torvalds	6ad6d3d	2005-04-17 21:52:23 -0700	[diff] [blame]	41	particular version of some file.
				42
				43	A "tree" object is an object that ties one or more "blob" objects into a
				44	directory structure. In addition, a tree object can refer to other tree
				45	objects, thus creating a directory hierarchy.
				46
				47	Finally, a "commit" object ties such directory hierarchies together into
				48	a DAG of revisions - each "commit" is associated with exactly one tree
				49	(the directory hierarchy at the time of the commit). In addition, a
				50	"commit" refers to one or more "parent" commit objects that describe the
				51	history of how we arrived at that directory hierarchy.
				52
				53	As a special case, a commit object with no parents is called the "root"
				54	object, and is the point of an initial project commit. Each project
				55	must have at least one root, and while you can tie several different
				56	root objects together into one project by creating a commit object which
				57	has two or more separate roots as its ultimate parents, that's probably
				58	just going to confuse people. So aim for the notion of "one root object
				59	per project", even if git itself does not enforce that.
				60
David Greaves	8ac866a	2005-05-22 18:44:16 +0100	[diff] [blame]	61	A "tag" object symbolically identifies and can be used to sign other
				62	objects. It contains the identifier and type of another object, a
				63	symbolic name (of course!) and, optionally, a signature.
				64
Linus Torvalds	6ad6d3d	2005-04-17 21:52:23 -0700	[diff] [blame]	65	Regardless of object type, all objects share the following
				66	characteristics: they are all deflated with zlib, and have a header
				67	that not only specifies their tag, but also size information about the
				68	data in the object. It's worth noting that the SHA1 hash that is used
David Greaves	8ac866a	2005-05-22 18:44:16 +0100	[diff] [blame]	69	to name the object is the hash of the original data (historical note:
				70	in the dawn of the age of git this was the sha1 of the _compressed_
				71	object).
Linus Torvalds	6ad6d3d	2005-04-17 21:52:23 -0700	[diff] [blame]	72
				73	As a result, the general consistency of an object can always be tested
Linus Torvalds	e83c516	2005-04-07 15:13:13 -0700	[diff] [blame]	74	independently of the contents or the type of the object: all objects can
				75	be validated by verifying that (a) their hashes match the content of the
				76	file and (b) the object successfully inflates to a stream of bytes that
				77	forms a sequence of <ascii tag without space> + <space> + <ascii decimal
				78	size> + <byte\0> + <binary object data>.
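
The two checks above can be sketched in Python. The function names `encode_object` and `validate_object` are made up for this illustration. The sketch also assumes that the "original data" being hashed is the uncompressed header plus the object data, which the text does not spell out.

```python
import hashlib
import zlib


def encode_object(tag: str, data: bytes) -> tuple[str, bytes]:
    """Return (name, deflated bytes) for an object of the given type."""
    # <ascii tag> + <space> + <ascii decimal size> + <byte\0> + <data>
    raw = tag.encode("ascii") + b" " + str(len(data)).encode("ascii") + b"\0" + data
    # Assumption: the name is the SHA1 of the uncompressed header plus data.
    return hashlib.sha1(raw).hexdigest(), zlib.compress(raw)


def validate_object(name: str, stored: bytes) -> tuple[str, bytes]:
    """Check (a) the hash and (b) the header layout; return (tag, data)."""
    raw = zlib.decompress(stored)          # raises zlib.error if it cannot inflate
    if hashlib.sha1(raw).hexdigest() != name:
        raise ValueError("hash does not match content")
    header, nul, data = raw.partition(b"\0")
    if not nul:
        raise ValueError("missing NUL byte after header")
    tag, space, size = header.partition(b" ")
    if not tag or not space or not size.isdigit():
        raise ValueError("malformed header")
    if int(size) != len(data):
        raise ValueError("size field does not match data length")
    return tag.decode("ascii"), data


name, stored = encode_object("blob", b"hello\n")
tag, data = validate_object(name, stored)
```

Note that the check needs no knowledge of what a "blob" is: any object type passes or fails on the hash and the header alone.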
				79
David Greaves	8ac866a	2005-05-22 18:44:16 +0100	[diff] [blame]	80	The structured objects can further have their structure and
				81	connectivity to other objects verified. This is generally done with
				82	the "fsck-cache" program, which generates a full dependency graph of
				83	all objects, and verifies their internal consistency (in addition to
				84	just verifying their superficial consistency through the hash).
Linus Torvalds	e83c516	2005-04-07 15:13:13 -0700	[diff] [blame]	85
Linus Torvalds	6ad6d3d	2005-04-17 21:52:23 -0700	[diff] [blame]	86	The object types in some more detail:
Linus Torvalds	e83c516	2005-04-07 15:13:13 -0700	[diff] [blame]	87
David Greaves	8ac866a	2005-05-22 18:44:16 +0100	[diff] [blame]	88	Blob Object
				89	~~~~~~~~~~~
				90	A "blob" object is nothing but a binary blob of data, and doesn't
				91	refer to anything else. There is no signature or any other
				92	verification of the data, so while the object is consistent (it _is_
				93	indexed by its sha1 hash, so the data itself is certainly correct), it
				94	has absolutely no other attributes. No name associations, no
				95	permissions. It is purely a blob of data (i.e. normally "file
				96	contents").
Linus Torvalds	e83c516	2005-04-07 15:13:13 -0700	[diff] [blame]	97
David Greaves	8ac866a	2005-05-22 18:44:16 +0100	[diff] [blame]	98	In particular, since the blob is entirely defined by its data, if two
				99	files in a directory tree (or in multiple different versions of the
				100	repository) have the same contents, they will share the same blob
				101	object. The object is totally independent of its location in the
				102	directory tree, and renaming a file does not change the object that
				103	file is associated with in any way.
Linus Torvalds	e83c516	2005-04-07 15:13:13 -0700	[diff] [blame]	104
David Greaves	8ac866a	2005-05-22 18:44:16 +0100	[diff] [blame]	105	Tree Object
				106	~~~~~~~~~~~
				107	The next hierarchical object type is the "tree" object. A tree object
				108	is a list of mode/name/blob data, sorted by name. Alternatively, the
				109	mode data may specify a directory mode, in which case instead of
				110	naming a blob, that name is associated with another TREE object.
Linus Torvalds	6ad6d3d	2005-04-17 21:52:23 -0700	[diff] [blame]	111
David Greaves	8ac866a	2005-05-22 18:44:16 +0100	[diff] [blame]	112	Like the "blob" object, a tree object is uniquely determined by the
				113	set contents, and so two separate but identical trees will always
				114	share the exact same object. This is true at all levels, i.e. it's
				115	true for a "leaf" tree (which does not refer to any other trees, only
				116	blobs) as well as for a whole subdirectory.
Linus Torvalds	6ad6d3d	2005-04-17 21:52:23 -0700	[diff] [blame]	117
David Greaves	8ac866a	2005-05-22 18:44:16 +0100	[diff] [blame]	118	For that reason a "tree" object is just a pure data abstraction: it
				119	has no history, no signatures, no verification of validity, except
				120	that since the contents are again protected by the hash itself, we can
				121	trust that the tree is immutable and its contents never change.
Linus Torvalds	6ad6d3d	2005-04-17 21:52:23 -0700	[diff] [blame]	122
David Greaves	8ac866a	2005-05-22 18:44:16 +0100	[diff] [blame]	123	So you can trust the contents of a tree to be valid, the same way you
				124	can trust the contents of a blob, but you don't know where those
				125	contents _came_ from.
Linus Torvalds	6ad6d3d	2005-04-17 21:52:23 -0700	[diff] [blame]	126
David Greaves	8ac866a	2005-05-22 18:44:16 +0100	[diff] [blame]	127	Side note on trees: since a "tree" object is a sorted list of
				128	"filename+content", you can create a diff between two trees without
				129	actually having to unpack two trees. Just ignore all common parts,
				130	and your diff will look right. In other words, you can effectively
				131	(and efficiently) tell the difference between any two random trees in
				132	O(n) where "n" is the size of the difference, rather than the size of
				133	the tree.
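
A rough Python sketch of that side note follows. It is not git's tree format: trees are modelled as sorted lists of `(name, kind, id)` tuples kept in a plain dict, and `write_tree` and `diff_trees` are hypothetical helpers. The point is that a subtree whose name is unchanged is skipped without being read.

```python
import hashlib


def write_tree(tree: dict, store: dict) -> str:
    """Store a nested {name: bytes | dict} structure; return the tree's name."""
    entries = []
    for name in sorted(tree):
        value = tree[name]
        if isinstance(value, dict):
            entries.append((name, "tree", write_tree(value, store)))
        else:
            entries.append((name, "blob", hashlib.sha1(value).hexdigest()))
    tree_id = hashlib.sha1(repr(entries).encode("utf-8")).hexdigest()
    store[tree_id] = entries
    return tree_id


def diff_trees(old_id, new_id, store, prefix=""):
    """Yield (path, old_blob, new_blob) for every blob that differs."""
    if old_id == new_id:
        return                       # identical name: identical contents, skip
    old = {n: (k, i) for n, k, i in store[old_id]} if old_id else {}
    new = {n: (k, i) for n, k, i in store[new_id]} if new_id else {}
    for name in sorted(old.keys() | new.keys()):
        old_kind, old_obj = old.get(name, (None, None))
        new_kind, new_obj = new.get(name, (None, None))
        if (old_kind, old_obj) == (new_kind, new_obj):
            continue                 # common part, ignore it
        path = prefix + name
        if old_kind == "blob" or new_kind == "blob":
            yield (path,
                   old_obj if old_kind == "blob" else None,
                   new_obj if new_kind == "blob" else None)
        if old_kind == "tree" or new_kind == "tree":
            yield from diff_trees(old_obj if old_kind == "tree" else None,
                                  new_obj if new_kind == "tree" else None,
                                  store, path + "/")


store = {}
old_id = write_tree({"README": b"v1",
                     "src": {"a.c": b"int a;", "b.c": b"int b;"},
                     "doc": {"x": b"x"}}, store)
new_id = write_tree({"README": b"v1",
                     "src": {"a.c": b"int a = 1;", "b.c": b"int b;"},
                     "doc": {"x": b"x"}}, store)
changes = list(diff_trees(old_id, new_id, store))
```

Here only the top-level tree and `src` are opened; `doc` is never looked at because its name is the same on both sides.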
Linus Torvalds	6ad6d3d	2005-04-17 21:52:23 -0700	[diff] [blame]	134
David Greaves	8ac866a	2005-05-22 18:44:16 +0100	[diff] [blame]	135	Side note 2 on trees: since the name of a "blob" depends entirely and
				136	exclusively on its contents (i.e. there are no names or permissions
				137	involved), you can see trivial renames or permission changes by
				138	noticing that the blob stayed the same. However, renames with data
				139	changes need a smarter "diff" implementation.
Linus Torvalds	e83c516	2005-04-07 15:13:13 -0700	[diff] [blame]	140
Linus Torvalds	e83c516	2005-04-07 15:13:13 -0700	[diff] [blame]	141
David Greaves	8ac866a	2005-05-22 18:44:16 +0100	[diff] [blame]	142	Changeset Object
				143	~~~~~~~~~~~~~~~~
				144	The "changeset" object (the "commit" object described above) is an
				145	object that introduces the notion of history into the picture. In
				146	contrast to the other objects, it doesn't just describe the physical
				147	state of a tree, it describes how we got there, and why.
Linus Torvalds	e83c516	2005-04-07 15:13:13 -0700	[diff] [blame]	148
David Greaves	8ac866a	2005-05-22 18:44:16 +0100	[diff] [blame]	149	A "changeset" is defined by the tree-object that it results in, the
				150	parent changesets (zero, one or more) that led up to that point, and a
				151	comment on what happened. Again, a changeset is not trusted per se:
				152	the contents are well-defined and "safe" due to the cryptographically
				153	strong signatures at all levels, but there is no reason to believe
				154	that the tree is "good" or that the merge information makes sense.
				155	The parents do not have to actually have any relationship with the
				156	result, for example.
Linus Torvalds	e83c516	2005-04-07 15:13:13 -0700	[diff] [blame]	157
David Greaves	8ac866a	2005-05-22 18:44:16 +0100	[diff] [blame]	158	Note on changesets: unlike real SCMs, changesets do not contain
				159	rename information or file mode change information. All of that is
				160	implicit in the trees involved (the result tree, and the result trees
				161	of the parents), and describing that makes no sense in this idiotic
				162	file manager.
Linus Torvalds	e83c516	2005-04-07 15:13:13 -0700	[diff] [blame]	163
David Greaves	8ac866a	2005-05-22 18:44:16 +0100	[diff] [blame]	164	Trust
				165	~~~~~
				166	The notion of "trust" is really outside the scope of "git", but it's
				167	worth noting a few things. First off, since everything is hashed with
				168	SHA1, you _can_ trust that an object is intact and has not been messed
				169	with by external sources. So the name of an object uniquely
				170	identifies a known state - just not a state that you may want to
				171	trust.
Linus Torvalds	e83c516	2005-04-07 15:13:13 -0700	[diff] [blame]	172
David Greaves	8ac866a	2005-05-22 18:44:16 +0100	[diff] [blame]	173	Furthermore, since the SHA1 signature of a changeset refers to the
				174	SHA1 signatures of the tree it is associated with and the signatures
				175	of the parent, a single named changeset specifies uniquely a whole set
				176	of history, with full contents. You can't later fake any step of the
				177	way once you have the name of a changeset.
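
Why a single name pins down the whole history can be shown with a toy hash chain in Python. The layout used by `commit_name` below is a simplification invented for this sketch, not the real changeset format; all that matters is that each name is computed over the tree name and the parent names.

```python
import hashlib


def commit_name(tree_id: str, parent_ids: list[str], comment: str) -> str:
    """Name a toy changeset by hashing its tree, its parents and its comment."""
    lines = ["tree " + tree_id]
    lines += ["parent " + parent for parent in parent_ids]
    lines += ["", comment]
    return hashlib.sha1("\n".join(lines).encode("utf-8")).hexdigest()


def tip_of_history(tree_ids: list[str]) -> str:
    """Chain one changeset per tree and return the name of the last one."""
    parents: list[str] = []
    for number, tree_id in enumerate(tree_ids):
        parents = [commit_name(tree_id, parents, f"change {number}")]
    return parents[0]


trees = [hashlib.sha1(content).hexdigest() for content in (b"v1", b"v2", b"v3")]
honest_tip = tip_of_history(trees)

# Tamper with the oldest tree only: every later name changes with it.
forged = [hashlib.sha1(b"evil").hexdigest()] + trees[1:]
forged_tip = tip_of_history(forged)
```

Anyone holding `honest_tip` would notice the forgery, because `forged_tip` differs even though only the first step was altered.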
Linus Torvalds	e83c516	2005-04-07 15:13:13 -0700	[diff] [blame]	178
David Greaves	8ac866a	2005-05-22 18:44:16 +0100	[diff] [blame]	179	So to introduce some real trust in the system, the only thing you need
				180	to do is to digitally sign just _one_ special note, which includes the
				181	name of a top-level changeset. Your digital signature shows others
				182	that you trust that changeset, and the immutability of the history of
				183	changesets tells others that they can trust the whole history.
Linus Torvalds	e83c516	2005-04-07 15:13:13 -0700	[diff] [blame]	184
David Greaves	8ac866a	2005-05-22 18:44:16 +0100	[diff] [blame]	185	In other words, you can easily validate a whole archive by just
				186	sending out a single email that tells the people the name (SHA1 hash)
				187	of the top changeset, and digitally sign that email using something
				188	like GPG/PGP.
Linus Torvalds	e83c516	2005-04-07 15:13:13 -0700	[diff] [blame]	189
David Greaves	8ac866a	2005-05-22 18:44:16 +0100	[diff] [blame]	190	In particular, you can also have a separate archive of "trust points"
				191	or tags, which document your (and other people's) trust. You may, of
				192	course, archive these "certificates of trust" using "git" itself, but
				193	it's not something "git" does for you.
				194
				195	Another way of saying the last point: "git" itself only handles
				196	content integrity, the trust has to come from outside.
Linus Torvalds	e83c516	2005-04-07 15:13:13 -0700	[diff] [blame]	197
Linus Torvalds	e83c516	2005-04-07 15:13:13 -0700	[diff] [blame]	198
Linus Torvalds	e83c516	2005-04-07 15:13:13 -0700	[diff] [blame]	199
Linus Torvalds	6ad6d3d	2005-04-17 21:52:23 -0700	[diff] [blame]	200
David Greaves	8ac866a	2005-05-22 18:44:16 +0100	[diff] [blame]	201	The "index" aka "Current Directory Cache"
				202	-----------------------------------------
Linus Torvalds	6ad6d3d	2005-04-17 21:52:23 -0700	[diff] [blame]	203	The index is a simple binary file, which contains an efficient
				204	representation of a virtual directory content at some random time. It
				205	does so by a simple array that associates a set of names, dates,
				206	permissions and content (aka "blob") objects together. The cache is
				207	always kept ordered by name, and names are unique (with a few very
				208	specific rules) at any point in time, but the cache has no long-term
David Greaves	8ac866a	2005-05-22 18:44:16 +0100	[diff] [blame]	209	meaning, and can be partially updated at any time.
Linus Torvalds	6ad6d3d	2005-04-17 21:52:23 -0700	[diff] [blame]	210
				211	In particular, the index certainly does not need to be consistent with
				212	the current directory contents (in fact, most operations will depend on
				213	different ways to make the index _not_ be consistent with the directory
				214	hierarchy), but it has three very important attributes:
Linus Torvalds	e83c516	2005-04-07 15:13:13 -0700	[diff] [blame]	215
David Greaves	8ac866a	2005-05-22 18:44:16 +0100	[diff] [blame]	216	'(a) it can re-generate the full state it caches (not just the
				217	directory structure: it contains pointers to the "blob" objects so
				218	that it can regenerate the data too)'
Linus Torvalds	e83c516	2005-04-07 15:13:13 -0700	[diff] [blame]	219
David Greaves	8ac866a	2005-05-22 18:44:16 +0100	[diff] [blame]	220	As a special case, there is a clear and unambiguous one-way mapping
				221	from a current directory cache to a "tree object", which can be
				222	efficiently created from just the current directory cache without
				223	actually looking at any other data. So a directory cache at any one
				224	time uniquely specifies one and only one "tree" object (but has
				225	additional data to make it easy to match up that tree object with what
				226	has happened in the directory).
Linus Torvalds	e83c516	2005-04-07 15:13:13 -0700	[diff] [blame]	227
David Greaves	8ac866a	2005-05-22 18:44:16 +0100	[diff] [blame]	228	'(b) it has efficient methods for finding inconsistencies between that
				229	cached state ("tree object waiting to be instantiated") and the
				230	current state.'
Linus Torvalds	e83c516	2005-04-07 15:13:13 -0700	[diff] [blame]	231
David Greaves	8ac866a	2005-05-22 18:44:16 +0100	[diff] [blame]	232	'(c) it can additionally efficiently represent information about merge
				233	conflicts between different tree objects, allowing each pathname to be
				234	associated with sufficient information about the trees involved that
				235	you can create a three-way merge between them.'
Linus Torvalds	6ad6d3d	2005-04-17 21:52:23 -0700	[diff] [blame]	236
				237	Those are the three ONLY things that the directory cache does. It's a
Linus Torvalds	e83c516	2005-04-07 15:13:13 -0700	[diff] [blame]	238	cache, and the normal operation is to re-generate it completely from a
				239	known tree object, or update/compare it with a live tree that is being
Linus Torvalds	6ad6d3d	2005-04-17 21:52:23 -0700	[diff] [blame]	240	developed. If you blow the directory cache away entirely, you generally
				241	haven't lost any information as long as you have the name of the tree
				242	that it described.
Linus Torvalds	e83c516	2005-04-07 15:13:13 -0700	[diff] [blame]	243
Linus Torvalds	6ad6d3d	2005-04-17 21:52:23 -0700	[diff] [blame]	244	At the same time, the directory index is also the
				245	staging area for creating new trees, and creating a new tree always
				246	involves a controlled modification of the index file. In particular,
				247	the index file can have the representation of an intermediate tree that
				248	has not yet been instantiated. So the index can be thought of as a
				249	write-back cache, which can contain dirty information that has not yet
David Greaves	8ac866a	2005-05-22 18:44:16 +0100	[diff] [blame]	250	been written back to the backing store.
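
As a rough illustration of attribute (b), the Python sketch below models the index as a name-ordered list of entries and compares cached "stat" data against the current state. The `IndexEntry` fields and the `possibly_changed` helper are assumptions made for this example; the real index file is binary and holds more fields.

```python
from typing import NamedTuple


class IndexEntry(NamedTuple):
    name: str      # path, unique within the index
    mode: int      # permission bits
    size: int      # cached file size
    mtime: int     # cached modification time
    blob_id: str   # name of the blob holding the content


def possibly_changed(index, working_stat):
    """Return names whose cached stat data no longer matches the directory."""
    suspects = []
    for entry in index:
        current = working_stat.get(entry.name)
        if current is None:
            suspects.append(entry.name)        # file has disappeared
        elif current != (entry.mode, entry.size, entry.mtime):
            suspects.append(entry.name)        # stat data differs
    return suspects


# The index is kept ordered by name; tuples sort on their first field.
index = sorted([
    IndexEntry("main.c", 0o644, 900, 1000, "c" * 40),
    IndexEntry("Makefile", 0o644, 120, 1000, "a" * 40),
    IndexEntry("README", 0o644, 300, 1000, "b" * 40),
])
working_stat = {
    "Makefile": (0o644, 120, 1000),    # untouched
    "README": (0o644, 310, 2000),      # edited since it was cached
    # main.c has been removed from the working directory
}
suspects = possibly_changed(index, working_stat)
```

No file content is read here: the comparison uses only the cached fields, which is what makes the check cheap.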
Linus Torvalds	6ad6d3d	2005-04-17 21:52:23 -0700	[diff] [blame]	251
				252
				253
David Greaves	8ac866a	2005-05-22 18:44:16 +0100	[diff] [blame]	254	The Workflow
				255	------------
Linus Torvalds	6ad6d3d	2005-04-17 21:52:23 -0700	[diff] [blame]	256	Generally, all "git" operations work on the index file. Some operations
David Greaves	8ac866a	2005-05-22 18:44:16 +0100	[diff] [blame]	257	work purely on the index file (showing the current state of the
Linus Torvalds	6ad6d3d	2005-04-17 21:52:23 -0700	[diff] [blame]	258	index), but most operations move data to and from the index file, either
				259	from the database or from the working directory. Thus there are four
				260	main combinations:
				261
David Greaves	8ac866a	2005-05-22 18:44:16 +0100	[diff] [blame]	262	1) working directory -> index
				263	~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Linus Torvalds	6ad6d3d	2005-04-17 21:52:23 -0700	[diff] [blame]	264
David Greaves	8ac866a	2005-05-22 18:44:16 +0100	[diff] [blame]	265	You update the index with information from the working directory with
				266	the "update-cache" command. You generally update the index
				267	information by just specifying the filename you want to update, like
				268	so:
Linus Torvalds	6ad6d3d	2005-04-17 21:52:23 -0700	[diff] [blame]	269
				270	update-cache filename
				271
David Greaves	8ac866a	2005-05-22 18:44:16 +0100	[diff] [blame]	272	but to avoid common mistakes with filename globbing etc, the command
				273	will not normally add totally new entries or remove old entries,
				274	i.e. it will normally just update existing cache entries.
Linus Torvalds	6ad6d3d	2005-04-17 21:52:23 -0700	[diff] [blame]	275
David Greaves	8ac866a	2005-05-22 18:44:16 +0100	[diff] [blame]	276	To tell git that yes, you really do realize that certain files no
				277	longer exist in the archive, or that new files should be added, you
				278	should use the "--remove" and "--add" flags respectively.
Linus Torvalds	6ad6d3d	2005-04-17 21:52:23 -0700	[diff] [blame]	279
David Greaves	8ac866a	2005-05-22 18:44:16 +0100	[diff] [blame]	280	NOTE! A "--remove" flag does _not_ mean that subsequent filenames will
				281	necessarily be removed: if the files still exist in your directory
				282	structure, the index will be updated with their new status, not
				283	removed. The only thing "--remove" means is that update-cache will be
				284	considering a removed file to be a valid thing, and if the file really
				285	does not exist any more, it will update the index accordingly.
Linus Torvalds	6ad6d3d	2005-04-17 21:52:23 -0700	[diff] [blame]	286
David Greaves	8ac866a	2005-05-22 18:44:16 +0100	[diff] [blame]	287	As a special case, you can also do "update-cache --refresh", which
				288	will refresh the "stat" information of each index entry to match the
				289	current stat information. It will _not_ update the object status
				290	itself, and it will only update the fields that are used to quickly
				291	test whether an object still matches its old backing store object.
Linus Torvalds	6ad6d3d	2005-04-17 21:52:23 -0700	[diff] [blame]	292
David Greaves	8ac866a	2005-05-22 18:44:16 +0100	[diff] [blame]	293	2) index -> object database
				294	~~~~~~~~~~~~~~~~~~~~~~~~~~~
Linus Torvalds	6ad6d3d	2005-04-17 21:52:23 -0700	[diff] [blame]	295
David Greaves	8ac866a	2005-05-22 18:44:16 +0100	[diff] [blame]	296	You write your current index file to a "tree" object with the program
Linus Torvalds	6ad6d3d	2005-04-17 21:52:23 -0700	[diff] [blame]	297
				298	write-tree
				299
David Greaves	8ac866a	2005-05-22 18:44:16 +0100	[diff] [blame]	300	that doesn't come with any options - it will just write out the
				301	current index into the set of tree objects that describe that state,
				302	and it will return the name of the resulting top-level tree. You can
				303	use that tree to re-generate the index at any time by going in the
				304	other direction:
Linus Torvalds	6ad6d3d	2005-04-17 21:52:23 -0700	[diff] [blame]	305
David Greaves	8ac866a	2005-05-22 18:44:16 +0100	[diff] [blame]	306	3) object database -> index
				307	~~~~~~~~~~~~~~~~~~~~~~~~~~~
Linus Torvalds	6ad6d3d	2005-04-17 21:52:23 -0700	[diff] [blame]	308
David Greaves	8ac866a	2005-05-22 18:44:16 +0100	[diff] [blame]	309	You read a "tree" file from the object database, and use that to
				310	populate (and overwrite - don't do this if your index contains any
				311	unsaved state that you might want to restore later!) your current
				312	index. Normal operation is just
Linus Torvalds	6ad6d3d	2005-04-17 21:52:23 -0700	[diff] [blame]	313
				314	read-tree <sha1 of tree>
				315
David Greaves	8ac866a	2005-05-22 18:44:16 +0100	[diff] [blame]	316	and your index file will now be equivalent to the tree that you saved
				317	earlier. However, that is only your _index_ file: your working
				318	directory contents have not been modified.
Linus Torvalds	6ad6d3d	2005-04-17 21:52:23 -0700	[diff] [blame]	319
David Greaves	8ac866a	2005-05-22 18:44:16 +0100	[diff] [blame]	320	4) index -> working directory
				321	~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Linus Torvalds	6ad6d3d	2005-04-17 21:52:23 -0700	[diff] [blame]	322
David Greaves	8ac866a	2005-05-22 18:44:16 +0100	[diff] [blame]	323	You update your working directory from the index by "checking out"
				324	files. This is not a very common operation, since normally you'd just
				325	keep your files updated, and rather than write to your working
				326	directory, you'd tell the index files about the changes in your
				327	working directory (i.e. "update-cache").
Linus Torvalds	6ad6d3d	2005-04-17 21:52:23 -0700	[diff] [blame]	328
David Greaves	8ac866a	2005-05-22 18:44:16 +0100	[diff] [blame]	329	However, if you decide to jump to a new version, or check out somebody
				330	else's version, or just restore a previous tree, you'd populate your
				331	index file with read-tree, and then you need to check out the result
				332	with
Linus Torvalds	6ad6d3d	2005-04-17 21:52:23 -0700	[diff] [blame]	333
				334	checkout-cache filename
				335
David Greaves	8ac866a	2005-05-22 18:44:16 +0100	[diff] [blame]	336	or, if you want to check out all of the index, use "-a".
Linus Torvalds	6ad6d3d	2005-04-17 21:52:23 -0700	[diff] [blame]	337
David Greaves	8ac866a	2005-05-22 18:44:16 +0100	[diff] [blame]	338	NOTE! checkout-cache normally refuses to overwrite old files, so if
				339	you have an old version of the tree already checked out, you will need
				340	to use the "-f" flag (_before_ the "-a" flag or the filename) to
				341	_force_ the checkout.
Linus Torvalds	6ad6d3d	2005-04-17 21:52:23 -0700	[diff] [blame]	342
				343
David Greaves	8ac866a	2005-05-22 18:44:16 +0100	[diff] [blame]	344	Finally, there are a few odds and ends which are not purely moving
				345	from one representation to the other:
Linus Torvalds	6ad6d3d	2005-04-17 21:52:23 -0700	[diff] [blame]	346
David Greaves	8ac866a	2005-05-22 18:44:16 +0100	[diff] [blame]	347	5) Tying it all together
				348	~~~~~~~~~~~~~~~~~~~~~~~~
Linus Torvalds	6ad6d3d	2005-04-17 21:52:23 -0700	[diff] [blame]	349
David Greaves	8ac866a	2005-05-22 18:44:16 +0100	[diff] [blame]	350	To commit a tree you have instantiated with "write-tree", you'd create
				351	a "commit" object that refers to that tree and the history behind it -
				352	most notably the "parent" commits that preceded it in history.
Linus Torvalds	6ad6d3d	2005-04-17 21:52:23 -0700	[diff] [blame]	353
David Greaves	8ac866a	2005-05-22 18:44:16 +0100	[diff] [blame]	354	Normally a "commit" has one parent: the previous state of the tree
				355	before a certain change was made. However, sometimes it can have two
				356	or more parent commits, in which case we call it a "merge", due to the
				357	fact that such a commit brings together ("merges") two or more
				358	previous states represented by other commits.
Linus Torvalds	6ad6d3d	2005-04-17 21:52:23 -0700	[diff] [blame]	359
David Greaves	8ac866a	2005-05-22 18:44:16 +0100	[diff] [blame]	360	In other words, while a "tree" represents a particular directory state
				361	of a working directory, a "commit" represents that state in "time",
				362	and explains how we got there.
Linus Torvalds	6ad6d3d	2005-04-17 21:52:23 -0700	[diff] [blame]	363
David Greaves	8ac866a	2005-05-22 18:44:16 +0100	[diff] [blame]	364	You create a commit object by giving it the tree that describes the
				365	state at the time of the commit, and a list of parents:
Linus Torvalds	6ad6d3d	2005-04-17 21:52:23 -0700	[diff] [blame]	366
				367	commit-tree <tree> -p <parent> [-p <parent2> ..]
				368
David Greaves	8ac866a	2005-05-22 18:44:16 +0100	[diff] [blame]	369	and then giving the reason for the commit on stdin (either through
				370	redirection from a pipe or file, or by just typing it at the tty).
Linus Torvalds	6ad6d3d	2005-04-17 21:52:23 -0700	[diff] [blame]	371
David Greaves	8ac866a	2005-05-22 18:44:16 +0100	[diff] [blame]	372	commit-tree will return the name of the object that represents that
				373	commit, and you should save it away for later use. Normally, you'd
				374	commit a new "HEAD" state, and while git doesn't care where you save
				375	the note about that state, in practice we tend to just write the
				376	result to the file ".git/HEAD", so that we can always see what the
				377	last committed state was.
Linus Torvalds	6ad6d3d	2005-04-17 21:52:23 -0700	[diff] [blame]	378
David Greaves	8ac866a	2005-05-22 18:44:16 +0100	[diff] [blame]	379	6) Examining the data
				380	~~~~~~~~~~~~~~~~~~~~~
Linus Torvalds	6ad6d3d	2005-04-17 21:52:23 -0700	[diff] [blame]	381
David Greaves	8ac866a	2005-05-22 18:44:16 +0100	[diff] [blame]	382	You can examine the data represented in the object database and the
				383	index with various helper tools. For every object, you can use
				384	"cat-file" to examine details about the object:
Linus Torvalds	6ad6d3d	2005-04-17 21:52:23 -0700	[diff] [blame]	385
				386	cat-file -t <objectname>
				387
David Greaves	8ac866a	2005-05-22 18:44:16 +0100	[diff] [blame]	388	shows the type of the object, and once you have the type (which is
				389	usually implicit in where you find the object), you can use
Linus Torvalds	6ad6d3d	2005-04-17 21:52:23 -0700	[diff] [blame]	390
				391	cat-file blob\|tree\|commit <objectname>
				392
David Greaves	8ac866a	2005-05-22 18:44:16 +0100	[diff] [blame]	393	to show its contents. NOTE! Trees have binary content, and as a result
				394	there is a special helper for showing that content, called "ls-tree",
				395	which turns the binary content into a more easily readable form.
Linus Torvalds	6ad6d3d	2005-04-17 21:52:23 -0700	[diff] [blame]	396
David Greaves	8ac866a	2005-05-22 18:44:16 +0100	[diff] [blame]	397	It's especially instructive to look at "commit" objects, since those
				398	tend to be small and fairly self-explanatory. In particular, if you
				399	follow the convention of having the top commit name in ".git/HEAD",
				400	you can do
Linus Torvalds	6ad6d3d	2005-04-17 21:52:23 -0700	[diff] [blame]	401
				402	cat-file commit $(cat .git/HEAD)
				403
David Greaves	8ac866a	2005-05-22 18:44:16 +0100	[diff] [blame]	404	to see what the top commit was.
Linus Torvalds	6ad6d3d	2005-04-17 21:52:23 -0700	[diff] [blame]	405
David Greaves	8ac866a	2005-05-22 18:44:16 +0100	[diff] [blame]	406	7) Merging multiple trees
				407	~~~~~~~~~~~~~~~~~~~~~~~~~
Linus Torvalds	6ad6d3d	2005-04-17 21:52:23 -0700	[diff] [blame]	408
David Greaves	8ac866a	2005-05-22 18:44:16 +0100	[diff] [blame]	409	Git helps you do a three-way merge, which you can expand to n-way by
				410	repeating the merge procedure an arbitrary number of times until you
				411	finally "commit" the state. The normal situation is that you'd only do
				412	one three-way merge (two parents), and commit it, but if you like to,
				413	you can do multiple parents in one go.
Linus Torvalds	6ad6d3d	2005-04-17 21:52:23 -0700	[diff] [blame]	414
David Greaves	8ac866a	2005-05-22 18:44:16 +0100	[diff] [blame]	415	To do a three-way merge, you need the two sets of "commit" objects
				416	that you want to merge, use those to find the closest common parent (a
				417	third "commit" object), and then use those commit objects to find the
				418	state of the directory ("tree" object) at these points.
Linus Torvalds	6ad6d3d	2005-04-17 21:52:23 -0700	[diff] [blame]	419
David Greaves	8ac866a	2005-05-22 18:44:16 +0100	[diff] [blame]	420	To get the "base" for the merge, you first look up the common parent
				421	of two commits with
Linus Torvalds	6ad6d3d	2005-04-17 21:52:23 -0700	[diff] [blame]	422
				423	merge-base <commit1> <commit2>
				424
David Greaves	8ac866a	2005-05-22 18:44:16 +0100	[diff] [blame]	425	which will return you the commit they are both based on. You should
				426	now look up the "tree" objects of those commits, which you can easily
				427	do with (for example)
Linus Torvalds	6ad6d3d	2005-04-17 21:52:23 -0700	[diff] [blame]	428
				429	cat-file commit <commitname> \| head -1
				430
David Greaves	8ac866a	2005-05-22 18:44:16 +0100	[diff] [blame]	431	since the tree object information is always the first line in a commit
				432	object.
Linus Torvalds	6ad6d3d	2005-04-17 21:52:23 -0700	[diff] [blame]	433
David Greaves	8ac866a	2005-05-22 18:44:16 +0100	[diff] [blame]	434	Once you know the three trees you are going to merge (the one
				435	"original" tree, aka the common base, and the two "result" trees, aka
				436	the branches you want to merge), you do a "merge" read into the
				437	index. This will throw away your old index contents, so you should
				438	make sure that you've committed those - in fact you would normally
				439	always do a merge against your last commit (which should thus match
				440	what you have in your current index anyway).
				441
				442	To do the merge, do
Linus Torvalds	6ad6d3d	2005-04-17 21:52:23 -0700	[diff] [blame]	443
				444	read-tree -m <origtree> <target1tree> <target2tree>
				445
David Greaves	8ac866a	2005-05-22 18:44:16 +0100	[diff] [blame]	446	which will do all trivial merge operations for you directly in the
				447	index file, and you can just write the result out with "write-tree".
Linus Torvalds	6ad6d3d	2005-04-17 21:52:23 -0700	[diff] [blame]	448
David Greaves	8ac866a	2005-05-22 18:44:16 +0100	[diff] [blame]	449	NOTE! Because the merge is done in the index file, and not in your
				450	working directory, your working directory will no longer match your
				451	index. You can use "checkout-cache -f -a" to make the effect of the
				452	merge be seen in your working directory.
Linus Torvalds	6ad6d3d	2005-04-17 21:52:23 -0700	[diff] [blame]	453
David Greaves	8ac866a	2005-05-22 18:44:16 +0100	[diff] [blame]	454	NOTE2! Sadly, many merges aren't trivial. If there are files that have
				455	been added, moved or removed, or if both branches have modified the
				456	same file, you will be left with an index tree that contains "merge
				457	entries" in it. Such an index tree can _NOT_ be written out to a tree
				458	object, and you will have to resolve any such merge clashes using
				459	other tools before you can write out the result.
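
The split between trivial merges and "merge entries" can be sketched in Python. This is a conservative reading of the two notes above, not the actual read-tree logic: trees are flattened to hypothetical `{path: blob name}` dicts, a path is resolved only when it exists in all three trees and at most one branch changed it, and everything else is left unresolved.

```python
def merge_path(base, ours, theirs):
    """Return the merged blob name for a trivial case, or None otherwise."""
    if None in (base, ours, theirs):
        return None          # added or removed somewhere: leave a merge entry
    if ours == base:
        return theirs        # only their branch changed it (or nobody did)
    if theirs == base:
        return ours          # only our branch changed it
    return None              # both branches modified the file


def merge_trees(base, ours, theirs):
    """Split all paths into trivially resolved ones and merge entries."""
    resolved, unresolved = {}, {}
    for path in sorted(base.keys() | ours.keys() | theirs.keys()):
        stages = (base.get(path), ours.get(path), theirs.get(path))
        result = merge_path(*stages)
        if result is None:
            unresolved[path] = stages    # keep all three sides for later tools
        else:
            resolved[path] = result
    return resolved, unresolved


base = {"a": "1", "b": "1", "c": "1"}
ours = {"a": "2", "b": "1", "c": "2", "d": "1"}
theirs = {"a": "1", "b": "3", "c": "3"}
resolved, unresolved = merge_trees(base, ours, theirs)
```

In this example `a` and `b` merge trivially, while `c` (modified on both branches) and `d` (added on one branch) remain as merge entries, so the result could not yet be written out as a tree.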
Linus Torvalds	6ad6d3d	2005-04-17 21:52:23 -0700	[diff] [blame]	460
Linus Torvalds	6ad6d3d	2005-04-17 21:52:23 -0700	[diff] [blame]	461
David Greaves	8ac866a	2005-05-22 18:44:16 +0100	[diff] [blame]	462	[ fixme: talk about resolving merges here ]