| A short git tutorial |
| ==================== |
| May 2005 |
| |
| |
| Introduction |
| ------------ |
| |
| This is trying to be a short tutorial on setting up and using a git |
| archive, mainly because being hands-on and using explicit examples is |
| often the best way of explaining what is going on. |
| |
| In normal life, most people wouldn't use the "core" git programs |
| directly, but rather script around them to make them more palatable. |
| Understanding the core git stuff may help some people get those scripts |
| done, though, and it may also be instructive in helping people |
| understand what it is that the higher-level helper scripts are actually |
| doing. |
| |
| The core git is often called "plumbing", with the prettier user |
| interfaces on top of it called "porcelain". You may not want to use the |
| plumbing directly very often, but it can be good to know what the |
| plumbing does for when the porcelain isn't flushing... |
| |
| |
| Creating a git archive |
| ---------------------- |
| |
| Creating a new git archive couldn't be easier: all git archives start |
| out empty, and the only thing you need to do is find yourself a |
| subdirectory that you want to use as a working tree - either an empty |
| one for a totally new project, or an existing working tree that you want |
| to import into git. |
| |
| For our first example, we're going to start a totally new archive from |
| scratch, with no pre-existing files, and we'll call it "git-tutorial". |
| To start up, create a subdirectory for it, change into that |
| subdirectory, and initialize the git infrastructure with "git-init-db": |
| |
| mkdir git-tutorial |
| cd git-tutorial |
| git-init-db |
| |
| to which git will reply |
| |
| defaulting to local storage area |
| |
| which is just git's way of saying that you haven't been doing anything |
| strange, and that it will have created a local .git directory setup for |
| your new project. You will now have a ".git" directory, and you can |
| inspect that with "ls". For your new empty project, ls should show you |
| three entries: |
| |
| - a symlink called HEAD, pointing to "refs/heads/master" |
| |
| Don't worry about the fact that the file that the HEAD link points to |
| doesn't even exist yet - you haven't created the commit that will |
| start your HEAD development branch yet. |
| |
| - a subdirectory called "objects", which will contain all the git SHA1 |
| objects of your project. You should never have any real reason to |
| look at the objects directly, but you might want to know that these |
| objects are what contains all the real _data_ in your repository. |
| |
| - a subdirectory called "refs", which contains references to objects. |
| |
| In particular, the "refs" subdirectory will contain two other |
| subdirectories, named "heads" and "tags" respectively. They do |
| exactly what their names imply: they contain references to any number |
| of different "heads" of development (aka "branches"), and to any |
| "tags" that you have created to name specific versions of your |
| repository. |
| |
| One note: the special "master" head is the default branch, which is |
| why the .git/HEAD file was created as a symlink to it even if it |
| doesn't yet exist. Basically, the HEAD link is supposed to always |
| point to the branch you are working on right now, and you always |
| start out expecting to work on the "master" branch. |
| |
| However, this is only a convention, and you can name your branches |
| anything you want, and don't have to ever even _have_ a "master" |
| branch. A number of the git tools will assume that .git/HEAD is |
| valid, though. |
| |
| [ Implementation note: an "object" is identified by its 160-bit SHA1 |
| hash, aka "name", and a reference to an object is always the 40-byte |
| hex representation of that SHA1 name. The files in the "refs" |
| subdirectory are expected to contain these hex references (usually |
| with a final '\n' at the end), and you should thus expect to see a |
| number of 41-byte files containing these references in these refs |
| subdirectories when you actually start populating your tree ] |
| |
| You have now created your first git archive. Of course, since it's |
| empty, that's not very useful, so let's start populating it with data. |
| |
| |
| Populating a git archive |
| ------------------------ |
| |
| We'll keep this simple and stupid, so we'll start off with populating a |
| few trivial files just to get a feel for it. |
| |
| Start off with just creating any random files that you want to maintain |
| in your git archive. We'll start off with a few bad examples, just to |
| get a feel for how this works: |
| |
| echo "Hello World" >hello |
| echo "Silly example" >example |
| |
| you have now created two files in your working directory, but to |
| actually check in your hard work, you will have to go through two steps: |
| |
| - fill in the "cache" aka "index" file with the information about your |
| working directory state |
| |
| - commit that index file as an object. |
| |
| The first step is trivial: when you want to tell git about any changes |
| to your working directory, you use the "git-update-cache" program. That |
| program normally just takes a list of filenames you want to update, but |
| to avoid trivial mistakes, it refuses to add new entries to the cache |
| (or remove existing ones) unless you explicitly tell it that you're |
| adding a new entry with the "--add" flag (or removing an entry with the |
| "--remove" flag). |
| |
| So to populate the index with the two files you just created, you can do |
| |
| git-update-cache --add hello example |
| |
| and you have now told git to track those two files. |
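| |
| On current git releases the same step can be tried with "git |
| update-index", the later name of "git-update-cache". The following is |
| only a sketch under that assumption; the scratch directory made with |
| mktemp is purely illustrative and not part of the tutorial project: |
| |
```shell
# Work in a throwaway directory so nothing existing is touched.
repo=$(mktemp -d)
cd "$repo"
git init -q

echo "Hello World" >hello
echo "Silly example" >example

# Later spelling of "git-update-cache --add hello example".
git update-index --add hello example

# List what the index now tracks, with mode and object name per file.
tracked=$(git ls-files -s)
echo "$tracked"
```
| |
| Each output line names one tracked file together with the object that |
| holds its contents. |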
| |
| In fact, as you did that, if you now look into your object directory, |
| you'll notice that git will have added two new objects to the object |
| store. If you did exactly the steps above, you should now be able to do |
| |
| ls .git/objects/??/* |
| |
| and see two files: |
| |
| .git/objects/55/7db03de997c86a4a028e1ebd3a1ceb225be238 |
| .git/objects/f2/4c74a2e500f5ee1332c86b94199f52b1d1d962 |
| |
| which correspond to the objects with SHA1 names 557db... and f24c7... |
| respectively. |
| |
| If you want to, you can use "git-cat-file" to look at those objects, but |
| you'll have to use the object name, not the filename of the object: |
| |
| git-cat-file -t 557db03de997c86a4a028e1ebd3a1ceb225be238 |
| |
| where the "-t" tells git-cat-file to tell you what the "type" of the |
| object is. Git will tell you that you have a "blob" object (ie just a |
| regular file), and you can see the contents with |
| |
| git-cat-file "blob" 557db03de997c86a4a028e1ebd3a1ceb225be238 |
| |
| which will print out "Hello World". The object 557db... is nothing |
| more than the contents of your file "hello". |
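| |
| If you are curious where the name comes from, it can be reproduced by |
| hand. This sketch assumes that git names a blob by the SHA1 of a small |
| header ("blob", the size in bytes, and a NUL byte) followed by the file |
| contents, and that the "sha1sum" and "git hash-object" programs are |
| available on your system: |
| |
```shell
# "Hello World" plus the newline added by echo is 12 bytes.
by_hand=$(printf 'blob 12\0Hello World\n' | sha1sum | cut -d' ' -f1)

# Ask git for the name of the same contents, without storing anything.
by_git=$(echo "Hello World" | git hash-object --stdin)

echo "$by_hand"
echo "$by_git"
```
| |
| Both lines should show the 557db... name seen above. |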
| |
| [ Digression: don't confuse that object with the file "hello" itself. The |
| object is literally just those specific _contents_ of the file, and |
| however much you later change the contents in file "hello", the object we |
| just looked at will never change. Objects are immutable. ] |
| |
| Anyway, as we mentioned previously, you normally never actually take a |
| look at the objects themselves, and typing long 40-character hex SHA1 |
| names is not something you'd normally want to do. The above digression |
| was just to show that "git-update-cache" did something magical, and |
| actually saved away the contents of your files into the git content |
| store. |
| |
| Updating the cache did something else too: it created a ".git/index" |
| file. This is the index that describes your current working tree, and |
| something you should be very aware of. Again, you normally never worry |
| about the index file itself, but you should be aware of the fact that |
| you have not actually really "checked in" your files into git so far, |
| you've only _told_ git about them. |
| |
| However, since git knows about them, you can now start using some of the |
| most basic git commands to manipulate the files or look at their status. |
| |
| In particular, let's not even check in the two files into git yet, we'll |
| start off by adding another line to "hello" first: |
| |
| echo "It's a new day for git" >>hello |
| |
| and you can now, since you told git about the previous state of "hello", ask |
| git what has changed in the tree compared to your old index, using the |
| "git-diff-files" command: |
| |
| git-diff-files |
| |
| oops. That wasn't very readable. It just spit out its own internal |
| version of a "diff", but that internal version really just tells you |
| that it has noticed that "hello" has been modified, and that the old object |
| contents it had have been replaced with something else. |
| |
| To make it readable, we can tell git-diff-files to output the |
| differences as a patch, using the "-p" flag: |
| |
| git-diff-files -p |
| |
| which will spit out |
| |
| diff --git a/hello b/hello |
| --- a/hello |
| +++ b/hello |
| @@ -1 +1,2 @@ |
| Hello World |
| +It's a new day for git |
| |
| ie the diff of the change we caused by adding another line to "hello". |
| |
| In other words, git-diff-files always shows us the difference between |
| what is recorded in the index, and what is currently in the working |
| tree. That's very useful. |
| |
| A common shorthand for "git-diff-files -p" is to just write |
| |
| git diff |
| |
| which will do the same thing. |
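| |
| The index-versus-working-tree comparison can be replayed in isolation. |
| The sketch below assumes a current git release, where the commands are |
| spelled "git update-index" and "git diff-files", and uses a scratch |
| directory that is purely illustrative: |
| |
```shell
repo=$(mktemp -d)
cd "$repo"
git init -q

# Record the one-line version of "hello" in the index.
echo "Hello World" >hello
git update-index --add hello

# Change the working tree only; the index still holds the old contents.
echo "It's a new day for git" >>hello

# Compare the index against the working tree, as a patch.
patch=$(git diff-files -p)
echo "$patch"
```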
| |
| |
| Committing git state |
| -------------------- |
| |
| Now, we want to go to the next stage in git, which is to take the files |
| that git knows about in the index, and commit them as a real tree. We do |
| that in two phases: creating a "tree" object, and committing that "tree" |
| object as a "commit" object together with an explanation of what the |
| tree was all about, along with information of how we came to that state. |
| |
| Creating a tree object is trivial, and is done with "git-write-tree". |
| There are no options or other input: git-write-tree will take the |
| current index state, and write an object that describes that whole |
| index. In other words, we're now tying together all the different |
| filenames with their contents (and their permissions), and we're |
| creating the equivalent of a git "directory" object: |
| |
| git-write-tree |
| |
| and this will just output the name of the resulting tree, in this case |
| (if you have done exactly as I've described) it should be |
| |
| 8988da15d077d4829fc51d8544c097def6644dbb |
| |
| which is another incomprehensible object name. Again, if you want to, |
| you can use "git-cat-file -t 8988d.." to see that this time the object |
| is not a "blob" object, but a "tree" object (you can also use |
| git-cat-file to actually output the raw object contents, but you'll see |
| mainly a binary mess, so that's less interesting). |
| |
| However - normally you'd never use "git-write-tree" on its own, because |
| normally you always commit a tree into a commit object using the |
| "git-commit-tree" command. In fact, it's easier to not actually use |
| git-write-tree on its own at all, but to just pass its result in as an |
| argument to "git-commit-tree". |
| |
| "git-commit-tree" normally takes several arguments - it wants to know |
| what the _parent_ of a commit was, but since this is the first commit |
| ever in this new archive, and it has no parents, we only need to pass in |
| the tree ID. However, git-commit-tree also wants to get a commit message |
| on its standard input, and it will write out the resulting ID for the |
| commit to its standard output. |
| |
| And this is where we start using the .git/HEAD file. The HEAD file is |
| supposed to contain the reference to the top-of-tree, and since that's |
| exactly what git-commit-tree spits out, we can do this all with a simple |
| shell pipeline: |
| |
| echo "Initial commit" | git-commit-tree $(git-write-tree) > .git/HEAD |
| |
| which will say: |
| |
| Committing initial tree 8988da15d077d4829fc51d8544c097def6644dbb |
| |
| just to warn you about the fact that it created a totally new commit |
| that is not related to anything else. Normally you do this only _once_ |
| for a project ever, and all later commits will be parented on top of an |
| earlier commit, and you'll never see this "Committing initial tree" |
| message ever again. |
| |
| Again, normally you'd never actually do this by hand. There is a |
| helpful script called "git commit" that will do all of this for you. So |
| you could have just written |
| |
| git commit |
| |
| instead, and it would have done the above magic scripting for you. |
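| |
| For reference, here are the two phases spelled out one command at a |
| time. This is a sketch, not what "git commit" literally runs. It |
| assumes a current git release, where .git/HEAD is normally a small |
| text file rather than a symlink, so the commit is recorded with "git |
| update-ref" instead of a redirect into .git/HEAD. The identity in the |
| environment variables and the scratch directory are made up: |
| |
```shell
# commit-tree needs an identity; these values are placeholders.
export GIT_AUTHOR_NAME="Tutorial" GIT_AUTHOR_EMAIL="tutorial@example.com"
export GIT_COMMITTER_NAME="Tutorial" GIT_COMMITTER_EMAIL="tutorial@example.com"

repo=$(mktemp -d)
cd "$repo"
git init -q
echo "Hello World" >hello
echo "Silly example" >example
git update-index --add hello example

# Phase 1: write the index out as a tree object.
tree=$(git write-tree)

# Phase 2: wrap the tree in a commit; the message comes from standard input.
commit=$(echo "Initial commit" | git commit-tree "$tree")

# Record the new commit as the tip of the branch that HEAD refers to.
git update-ref HEAD "$commit"

echo "tree   $tree"
echo "commit $commit"
```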
| |
| |
| Making a change |
| --------------- |
| |
| Remember how we did the "git-update-cache" on file "hello" and then we |
| changed "hello" afterward, and could compare the new state of "hello" with the |
| state we saved in the index file? |
| |
| Further, remember how I said that "git-write-tree" writes the contents |
| of the _index_ file to the tree, and thus what we just committed was in |
| fact the _original_ contents of the file "hello", not the new ones. We did |
| that on purpose, to show the difference between the index state, and the |
| state in the working directory, and how they don't have to match, even |
| when we commit things. |
| |
| As before, if we do "git-diff-files -p" in our git-tutorial project, |
| we'll still see the same difference we saw last time: the index file |
| hasn't changed by the act of committing anything. However, now that we |
| have committed something, we can also learn to use a new command: |
| "git-diff-cache". |
| |
| Unlike "git-diff-files", which showed the difference between the index |
| file and the working directory, "git-diff-cache" shows the differences |
| between a committed _tree_ and either the index file or the working |
| directory. In other words, git-diff-cache wants a tree to be diffed |
| against, and before we did the commit, we couldn't do that, because we |
| didn't have anything to diff against. |
| |
| But now we can do |
| |
| git-diff-cache -p HEAD |
| |
| (where "-p" has the same meaning as it did in git-diff-files), and it |
| will show us the same difference, but for a totally different reason. |
| Now we're comparing the working directory not against the index file, |
| but against the tree we just wrote. It just so happens that those two |
| are obviously the same, so we get the same result. |
| |
| Again, because this is a common operation, you can also just shorthand |
| it with |
| |
| git diff HEAD |
| |
| which ends up doing the above for you. |
| |
| In other words, "git-diff-cache" normally compares a tree against the |
| working directory, but when given the "--cached" flag, it is told to |
| instead compare against just the index cache contents, and ignore the |
| current working directory state entirely. Since we just wrote the index |
| file to HEAD, doing "git-diff-cache --cached -p HEAD" should thus return |
| an empty set of differences, and that's exactly what it does. |
| |
| [ Digression: "git-diff-cache" really always uses the index for its |
| comparisons, and saying that it compares a tree against the working |
| directory is thus not strictly accurate. In particular, the list of |
| files to compare (the "meta-data") _always_ comes from the index file, |
| regardless of whether the --cached flag is used or not. The --cached |
| flag really only determines whether the file _contents_ to be compared |
| come from the working directory or not. |
| |
| This is not hard to understand, as soon as you realize that git simply |
| never knows (or cares) about files that it is not told about |
| explicitly. Git will never go _looking_ for files to compare, it |
| expects you to tell it what the files are, and that's what the index |
| is there for. ] |
| |
| However, our next step is to commit the _change_ we did, and again, to |
| understand what's going on, keep in mind the difference between "working |
| directory contents", "index file" and "committed tree". We have changes |
| in the working directory that we want to commit, and we always have to |
| work through the index file, so the first thing we need to do is to |
| update the index cache: |
| |
| git-update-cache hello |
| |
| (note how we didn't need the "--add" flag this time, since git knew |
| about the file already). |
| |
| Note what happens to the different git-diff-xxx versions here. After |
| we've updated "hello" in the index, "git-diff-files -p" now shows no |
| differences, but "git-diff-cache -p HEAD" still _does_ show that the |
| current state is different from the state we committed. In fact, now |
| "git-diff-cache" shows the same difference whether we use the "--cached" |
| flag or not, since now the index is coherent with the working directory. |
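| |
| The three states can be watched side by side. The sketch assumes a |
| current git release, where "git-diff-cache" is spelled "git |
| diff-index" and "git-update-cache" is spelled "git update-index"; the |
| identity and scratch directory are placeholders: |
| |
```shell
export GIT_AUTHOR_NAME="Tutorial" GIT_AUTHOR_EMAIL="tutorial@example.com"
export GIT_COMMITTER_NAME="Tutorial" GIT_COMMITTER_EMAIL="tutorial@example.com"

repo=$(mktemp -d)
cd "$repo"
git init -q
echo "Hello World" >hello
git update-index --add hello
git commit -q -m "Initial commit"

# Change the working tree only.
echo "It's a new day for git" >>hello
files_before=$(git diff-files --name-only)                 # index vs working tree
cached_before=$(git diff-index --cached --name-only HEAD)  # HEAD vs index

# Bring the index up to date with the working tree.
git update-index hello
files_after=$(git diff-files --name-only)
cached_after=$(git diff-index --cached --name-only HEAD)

echo "before: diff-files=[$files_before] diff-index --cached=[$cached_before]"
echo "after:  diff-files=[$files_after] diff-index --cached=[$cached_after]"
```
| |
| Before the update only the working tree differs from the index; after |
| it, only the index differs from the committed tree. |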
| |
| Now, since we've updated "hello" in the index, we can commit the new |
| version. We could do it by writing the tree by hand again, and |
| committing the tree (this time we'd have to use the "-p HEAD" flag to |
| tell commit that the HEAD was the _parent_ of the new commit, and that |
| this wasn't an initial commit any more), but you've done that once |
| already, so let's just use the helpful script this time: |
| |
| git commit |
| |
| which starts an editor for you to write the commit message and tells you |
| a bit about what you're doing. |
| |
| Write whatever message you want, and all the lines that start with '#' |
| will be pruned out, and the rest will be used as the commit message for |
| the change. If you decide you don't want to commit anything after all at |
| this point (you can continue to edit things and update the cache), you |
| can just leave an empty message. Otherwise git-commit-script will commit |
| the change for you. |
| |
| You've now made your first real git commit. And if you're interested in |
| looking at what git-commit-script really does, feel free to investigate: |
| it's a few very simple shell scripts to generate the helpful (?) commit |
| message headers, and a few one-liners that actually do the commit itself. |
| |
| |
| Checking it out |
| --------------- |
| |
| While creating changes is useful, it's even more useful if you can tell |
| later what changed. The most useful command for this is another of the |
| "diff" family, namely "git-diff-tree". |
| |
| git-diff-tree can be given two arbitrary trees, and it will tell you the |
| differences between them. Perhaps even more commonly, though, you can |
| give it just a single commit object, and it will figure out the parent |
| of that commit itself, and show the difference directly. Thus, to get |
| the same diff that we've already seen several times, we can now do |
| |
| git-diff-tree -p HEAD |
| |
| (again, "-p" means to show the difference as a human-readable patch), |
| and it will show what the last commit (in HEAD) actually changed. |
| |
| More interestingly, you can also give git-diff-tree the "-v" flag, which |
| tells it to also show the commit message and author and date of the |
| commit, and you can tell it to show a whole series of diffs. |
| Alternatively, you can tell it to be "silent", and not show the diffs at |
| all, but just show the actual commit message. |
| |
| In fact, together with the "git-rev-list" program (which generates a |
| list of revisions), git-diff-tree ends up being a veritable fount of |
| changes. A trivial (but very useful) script called "git-whatchanged" is |
| included with git which does exactly this, and shows a log of recent |
| activity. |
| |
| To see the whole history of our pitiful little git-tutorial project, you |
| can do |
| |
| git log |
| |
| which shows just the log messages, or if we want to see the log together |
| with the associated patches use the more complex (and much more |
| powerful) |
| |
| git-whatchanged -p --root |
| |
| and you will see exactly what has changed in the repository over its |
| short history. |
| |
| [ Side note: the "--root" flag is a flag to git-diff-tree to tell it to |
| show the initial aka "root" commit too. Normally you'd probably not |
| want to see the initial import diff, but since the tutorial project |
| was started from scratch and is so small, we use it to make the result |
| a bit more interesting ] |
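| |
| The combination of "git-rev-list" and "git-diff-tree" described above |
| can be tried by hand. This is a sketch of the idea, not the text of |
| the git-whatchanged script, and it assumes a current git release |
| (commands spelled "git rev-list" and "git diff-tree"); the identity, |
| commit messages and scratch directory are made up: |
| |
```shell
export GIT_AUTHOR_NAME="Tutorial" GIT_AUTHOR_EMAIL="tutorial@example.com"
export GIT_COMMITTER_NAME="Tutorial" GIT_COMMITTER_EMAIL="tutorial@example.com"

repo=$(mktemp -d)
cd "$repo"
git init -q

# Two small commits to give the history something to show.
echo "Hello World" >hello
git update-index --add hello
git commit -q -m "Initial commit"
echo "It's a new day for git" >>hello
git update-index hello
git commit -q -m "Add a second line"

# List the revisions, then show message (-v) and patch (-p) for each,
# including the root commit (--root).
history=$(git rev-list HEAD | git diff-tree --stdin -v -p --root)
echo "$history"
```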
| |
| With that, you should now have some inkling of what git does, and |
| can explore on your own. |
| |
| |
| [ Side note: most likely, you are not directly using the core |
| git Plumbing commands, but using Porcelain like Cogito on top |
| of it. Cogito works a bit differently and you usually do not |
| have to run "git-update-cache" yourself for changed files (you |
| do tell underlying git about additions and removals via |
| "cg-add" and "cg-rm" commands). Just before you make a commit |
| with "cg-commit", Cogito figures out which files you modified, |
| and runs "git-update-cache" on them for you. ] |
| |
| |
| Tagging a version |
| ----------------- |
| |
| In git, there are two kinds of tags: a "light" one, and a "signed tag". |
| |
| A "light" tag is technically nothing more than a branch, except we put |
| it in the ".git/refs/tags/" subdirectory instead of calling it a "head". |
| So the simplest form of tag involves nothing more than |
| |
| git tag my-first-tag |
| |
| which just writes the current HEAD into the .git/refs/tags/my-first-tag |
| file, after which point you can then use this symbolic name for that |
| particular state. You can, for example, do |
| |
| git diff my-first-tag |
| |
| to diff your current state against that tag (which at this point will |
| obviously be an empty diff, but if you continue to develop and commit |
| stuff, you can use your tag as an "anchor-point" to see what has changed |
| since you tagged it). |
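| |
| A small sketch of the "anchor-point" idea, assuming a current git |
| release; the identity, the extra line added to "hello" and the scratch |
| directory are placeholders: |
| |
```shell
export GIT_AUTHOR_NAME="Tutorial" GIT_AUTHOR_EMAIL="tutorial@example.com"
export GIT_COMMITTER_NAME="Tutorial" GIT_COMMITTER_EMAIL="tutorial@example.com"

repo=$(mktemp -d)
cd "$repo"
git init -q
echo "Hello World" >hello
git update-index --add hello
git commit -q -m "Initial commit"

# A light tag is just a name for the commit HEAD points at right now.
git tag my-first-tag
tag_commit=$(git rev-parse my-first-tag)
head_commit=$(git rev-parse HEAD)

# Keep developing, then ask what has changed since the tag.
echo "More work" >>hello
git commit -q -m "More work" hello
changed=$(git diff --name-only my-first-tag)

echo "tag points at $tag_commit"
echo "changed since tag: $changed"
```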
| |
| A "signed tag" is actually a real git object, and contains not only a |
| pointer to the state you want to tag, but also a small tag name and |
| message, along with a PGP signature that says that yes, you really did |
| that tag. You create these signed tags with the "-s" flag to "git tag": |
| |
| git tag -s <tagname> |
| |
| which will sign the current HEAD (but you can also give it another |
| argument that specifies the thing to tag, ie you could have tagged the |
| current "mybranch" point by using "git tag <tagname> mybranch"). |
| |
| You normally only do signed tags for major releases or things |
| like that, while the light-weight tags are useful for any marking you |
| want to do - any time you decide that you want to remember a certain |
| point, just create a private tag for it, and you have a nice symbolic |
| name for the state at that point. |
| |
| |
| Copying archives |
| ---------------- |
| |
| Git archives are normally totally self-sufficient, and it's worth noting |
| that unlike CVS, for example, there is no separate notion of |
| "repository" and "working tree". A git repository normally _is_ the |
| working tree, with the local git information hidden in the ".git" |
| subdirectory. There is nothing else. What you see is what you got. |
| |
| [ Side note: you can tell git to split the git internal information from |
| the directory that it tracks, but we'll ignore that for now: it's not |
| how normal projects work, and it's really only meant for special uses. |
| So the mental model of "the git information is always tied directly to |
| the working directory that it describes" may not be technically 100% |
| accurate, but it's a good model for all normal use ] |
| |
| This has two implications: |
| |
| - if you grow bored with the tutorial archive you created (or you've |
| made a mistake and want to start all over), you can just do a simple |
| |
| rm -rf git-tutorial |
| |
| and it will be gone. There's no external repository, and there's no |
| history outside of the project you created. |
| |
| - if you want to move or duplicate a git archive, you can do so. There |
| is a "git clone" command, but if all you want to do is just to |
| create a copy of your archive (with all the full history that |
| went along with it), you can do so with a regular |
| "cp -a git-tutorial new-git-tutorial". |
| |
| Note that when you've moved or copied a git archive, your git index |
| file (which caches various information, notably some of the "stat" |
| information for the files involved) will likely need to be refreshed. |
| So after you do a "cp -a" to create a new copy, you'll want to do |
| |
| git-update-cache --refresh |
| |
| to make sure that the index file is up-to-date in the new one. |
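| |
| As a sketch of the copy-then-refresh sequence, assuming a current git |
| release (where the refresh is spelled "git update-index --refresh") |
| and a "cp" that understands "-a"; the identity and the scratch parent |
| directory are placeholders: |
| |
```shell
export GIT_AUTHOR_NAME="Tutorial" GIT_AUTHOR_EMAIL="tutorial@example.com"
export GIT_COMMITTER_NAME="Tutorial" GIT_COMMITTER_EMAIL="tutorial@example.com"

work=$(mktemp -d)
mkdir "$work/git-tutorial"
cd "$work/git-tutorial"
git init -q
echo "Hello World" >hello
git update-index --add hello
git commit -q -m "Initial commit"
orig_head=$(git rev-parse HEAD)

# Duplicate the whole archive, history included, with a plain copy.
cd "$work"
cp -a git-tutorial new-git-tutorial

# Refresh the cached stat information in the copy's index.
cd new-git-tutorial
git update-index --refresh
copy_head=$(git rev-parse HEAD)

echo "original $orig_head"
echo "copy     $copy_head"
```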
| |
| Note that the second point is true even across machines. You can |
| duplicate a remote git archive with _any_ regular copy mechanism, be it |
| "scp", "rsync" or "wget". |
| |
| When copying a remote repository, you'll want to at a minimum update the |
| index cache when you do this, and especially with other people's |
| repositories you often want to make sure that the index cache is in some |
| known state (you don't know _what_ they've done and not yet checked in), |
| so usually you'll precede the "git-update-cache" with a |
| |
| git-read-tree --reset HEAD |
| git-update-cache --refresh |
| |
| which will force a total index re-build from the tree pointed to by HEAD |
| (it resets the index contents to HEAD, and then the git-update-cache |
| makes sure to match up all index entries with the checked-out files). |
| |
| The above can also be written as simply |
| |
| git reset |
| |
| and in fact a lot of the common git command combinations can be scripted |
| with the "git xyz" interfaces, and you can learn things by just looking |
| at what the git-*-script scripts do ("git reset" is the above two lines |
| implemented in "git-reset-script", but some things like "git status" and |
| "git commit" are slightly more complex scripts around the basic git |
| commands). |
| |
| NOTE! Many (most?) public remote repositories will not contain any of |
| the checked out files or even an index file, and will _only_ contain the |
| actual core git files. Such a repository usually doesn't even have the |
| ".git" subdirectory, but has all the git files directly in the |
| repository. |
| |
| To create your own local live copy of such a "raw" git repository, you'd |
| first create your own subdirectory for the project, and then copy the |
| raw repository contents into the ".git" directory. For example, to |
| create your own copy of the git repository, you'd do the following |
| |
| mkdir my-git |
| cd my-git |
| rsync -rL rsync://rsync.kernel.org/pub/scm/git/git.git/ .git |
| |
| followed by |
| |
| git-read-tree HEAD |
| |
| to populate the index. However, now you have populated the index, and |
| you have all the git internal files, but you will notice that you don't |
| actually have any of the _working_directory_ files to work on. To get |
| those, you'd check them out with |
| |
| git-checkout-cache -u -a |
| |
| where the "-u" flag means that you want the checkout to keep the index |
| up-to-date (so that you don't have to refresh it afterward), and the |
| "-a" flag means "check out all files" (if you have a stale copy or an |
| older version of a checked out tree you may also need to add the "-f" |
| flag first, to tell git-checkout-cache to _force_ overwriting of any old |
| files). |
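| |
| The populate-then-check-out sequence can be rehearsed without a |
| network. In this sketch a local copy of a ".git" directory with its |
| index removed stands in for the "raw" remote repository, which is an |
| assumption made for illustration. It also assumes a current git |
| release, where "git-checkout-cache" is spelled "git checkout-index": |
| |
```shell
export GIT_AUTHOR_NAME="Tutorial" GIT_AUTHOR_EMAIL="tutorial@example.com"
export GIT_COMMITTER_NAME="Tutorial" GIT_COMMITTER_EMAIL="tutorial@example.com"

work=$(mktemp -d)
mkdir "$work/git-tutorial"
cd "$work/git-tutorial"
git init -q
echo "Hello World" >hello
git update-index --add hello
git commit -q -m "Initial commit"

# Stand-in for copying a raw repository: only the git files, no index.
cd "$work"
mkdir my-git
cp -R git-tutorial/.git my-git/.git
rm -f my-git/.git/index
cd my-git

# Populate the index from the tree that HEAD points to.
git read-tree HEAD
before=$(ls)

# Check out every file in the index, keeping the index stat info current.
git checkout-index -u -a
after=$(ls)

echo "before checkout: [$before]"
echo "after checkout:  [$after]"
```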
| |
| Again, this can all be simplified with |
| |
| git clone rsync://rsync.kernel.org/pub/scm/git/git.git/ my-git |
| cd my-git |
| git checkout |
| |
| which will end up doing all of the above for you. |
| |
| You have now successfully copied somebody else's (mine) remote |
| repository, and checked it out. |
| |
| |
| Creating a new branch |
| --------------------- |
| |
| Branches in git are really nothing more than pointers into the git |
| object space from within the ".git/refs/" subdirectory, and as we |
| already discussed, the HEAD branch is nothing but a symlink to one of |
| these object pointers. |
| |
| You can at any time create a new branch by just picking an arbitrary |
| point in the project history, and just writing the SHA1 name of that |
| object into a file under .git/refs/heads/. You can use any filename you |
| want (and indeed, subdirectories), but the convention is that the |
| "normal" branch is called "master". That's just a convention, though, |
| and nothing enforces it. |
| |
| To show that as an example, let's go back to the git-tutorial archive we |
| used earlier, and create a branch in it. You do that by simply |
| saying that you want to check out a new branch: |
| |
| git checkout -b mybranch |
| |
| will create a new branch based at the current HEAD position, and switch |
| to it. |
| |
| [ Side note: if you make the decision to start your new branch at some |
| other point in the history than the current HEAD, you can do so by |
| just telling "git checkout" what the base of the checkout would be. |
| In other words, if you have an earlier tag or branch, you'd just do |
| |
| git checkout -b mybranch earlier-branch |
| |
| and it would create the new branch "mybranch" at the earlier point, |
| and check out the state at that time. ] |
| |
| You can always just jump back to your original "master" branch by doing |
| |
| git checkout master |
| |
| (or any other branch-name, for that matter) and if you forget which |
| branch you happen to be on, a simple |
| |
| ls -l .git/HEAD |
| |
| will tell you where it's pointing. |
| |
| NOTE! Sometimes you may wish to create a new branch _without_ actually |
| checking it out and switching to it. If so, just use the command |
| |
| git branch <branchname> [startingpoint] |
| |
| which will simply _create_ the branch, but will not do anything further. |
| You can then later - once you decide that you want to actually develop |
| on that branch - switch to that branch with a regular "git checkout" |
| with the branchname as the argument. |
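| |
| A sketch of the branch commands together, assuming a current git |
| release. There HEAD is normally a symbolic ref kept in a small text |
| file rather than a symlink, so the sketch asks "git symbolic-ref" |
| which branch is current instead of using "ls -l". The identity, the |
| extra branch name "other" and the scratch directory are made up: |
| |
```shell
export GIT_AUTHOR_NAME="Tutorial" GIT_AUTHOR_EMAIL="tutorial@example.com"
export GIT_COMMITTER_NAME="Tutorial" GIT_COMMITTER_EMAIL="tutorial@example.com"

repo=$(mktemp -d)
cd "$repo"
git init -q
# Make sure the first branch is called "master", whatever the local default.
git symbolic-ref HEAD refs/heads/master
echo "Hello World" >hello
git update-index --add hello
git commit -q -m "Initial commit"

# Create a branch at the current HEAD and switch to it.
git checkout -q -b mybranch
on_branch=$(git symbolic-ref --short HEAD)

# Jump back, then create another branch without switching to it.
git checkout -q master
git branch other
still_on=$(git symbolic-ref --short HEAD)

branches=$(git for-each-ref --format='%(refname:short)' refs/heads)
echo "was on: $on_branch, now on: $still_on"
echo "$branches"
```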
| |
| |
| Merging two branches |
| -------------------- |
| |
| One of the ideas of having a branch is that you do some (possibly |
| experimental) work in it, and eventually merge it back to the main |
| branch. So assuming you created the above "mybranch" that started out |
| being the same as the original "master" branch, let's make sure we're in |
| that branch, and do some work there. |
| |
| git checkout mybranch |
| echo "Work, work, work" >>hello |
| git commit hello |
| |
| Here, we just added another line to "hello", and we used a shorthand for |
| both doing a "git-update-cache hello" and "git commit" by just giving the |
| filename directly to "git commit". |
| |
| Now, to make it a bit more interesting, let's assume that somebody else |
| does some work in the original branch, and simulate that by going back |
| to the master branch, and editing the same file differently there: |
| |
| git checkout master |
| |
| Here, take a moment to look at the contents of "hello", and notice how they |
| don't contain the work we just did in "mybranch" - because that work |
| hasn't happened in the "master" branch at all. Then do |
| |
| echo "Play, play, play" >>hello |
| echo "Lots of fun" >>example |
| git commit hello example |
| |
| since the master branch is obviously in a much better mood. |
| |
| Now, you've got two branches, and you decide that you want to merge the |
| work done. Before we do that, let's introduce a cool graphical tool that |
| helps you view what's going on: |
| |
| gitk --all |
| |
| will show you graphically both of your branches (that's what the "--all" |
| means: normally it will just show you your current HEAD) and their |
| histories. You can also see exactly how they came to be from a common |
| source. |
| |
| Anyway, let's exit gitk (^Q or the File menu), and decide that we want |
| to merge the work we did on the "mybranch" branch into the "master" |
| branch (which is currently our HEAD too). To do that, there's a nice |
| script called "git resolve", which wants to know which branches you want |
| to resolve and what the merge is all about: |
| |
| git resolve HEAD mybranch "Merge work in mybranch" |
| |
| where the third argument is going to be used as the commit message if |
| the merge can be resolved automatically. |
| |
| Now, in this case we've intentionally created a situation where the |
| merge will need to be fixed up by hand, though, so git will do as much |
| of it as it can automatically (which in this case is just merge the "example" |
| file, which had no differences in the "mybranch" branch), and say: |
| |
| Simple merge failed, trying Automatic merge |
| Auto-merging hello. |
| merge: warning: conflicts during merge |
| ERROR: Merge conflict in hello. |
| fatal: merge program failed |
| Automatic merge failed, fix up by hand |
| |
| which is way too verbose, but it basically tells you that it failed the |
| really trivial merge ("Simple merge") and did an "Automatic merge" |
| instead, but that too failed due to conflicts in "hello". |
| |
| Not to worry. It left the (trivial) conflict in "hello" in the same form you |
| should already be well used to if you've ever used CVS, so let's just |
| open "hello" in our editor (whatever that may be), and fix it up somehow. |
| I'd suggest just making it so that "hello" contains all four lines: |
| |
| Hello World |
| It's a new day for git |
| Play, play, play |
| Work, work, work |
| |
| and once you're happy with your manual merge, just do a |
| |
| git commit hello |
| |
| which will very loudly warn you that you're now committing a merge |
| (which is correct, so never mind), and you can write a small merge |
| message about your adventures in git-merge-land. |
| |
| After you're done, start up "gitk --all" to see graphically what the |
| history looks like. Notice that "mybranch" still exists, and you can |
| switch to it, and continue to work with it if you want to. The |
| "mybranch" branch will not contain the merge, but next time you merge it |
| from the "master" branch, git will know how you merged it, so you'll not |
| have to do _that_ merge again. |
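| |
| If you want to replay the whole conflict in a throwaway directory, here |
| is a rough sketch. It assumes a present-day git, where "git merge" plays |
| the role of "git resolve" and a conflicted merge is concluded with |
| "git add" followed by "git commit". The scratch directory, the commit |
| messages and the contents of "example" are placeholders. |
| |
```shell
#!/bin/sh
# Sketch only: replay the two-branch conflict in a scratch repository.
# Assumption: present-day git; "git merge" stands in for "git resolve".
set -e
GIT_AUTHOR_NAME="A U Thor" GIT_AUTHOR_EMAIL=author@example.com
GIT_COMMITTER_NAME="A U Thor" GIT_COMMITTER_EMAIL=author@example.com
export GIT_AUTHOR_NAME GIT_AUTHOR_EMAIL GIT_COMMITTER_NAME GIT_COMMITTER_EMAIL

cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the branch is "master"
printf '%s\n' "Hello World" "It's a new day for git" >hello
echo "Silly example" >example
git add hello example
git commit -q -m "Initial commit"
git branch mybranch

# Work on mybranch
git checkout -q mybranch
echo "Work, work, work" >>hello
git commit -q -m "Work on mybranch" hello

# Somebody else works on master
git checkout -q master
echo "Play, play, play" >>hello
echo "Lots of fun" >>example
git commit -q -m "Play on master" hello example

# The merge is expected to stop with a conflict in "hello"
if git merge -m "Merge work in mybranch" mybranch >/dev/null 2>&1; then
	merge_status=clean
else
	merge_status=conflict
fi
echo "merge status: $merge_status"

# Fix it up by hand: keep all four lines, then conclude the merge
printf '%s\n' "Hello World" "It's a new day for git" \
	"Play, play, play" "Work, work, work" >hello
git add hello
git commit -q -m "Merge work in mybranch"
```
| |
| Afterwards "git log" in that directory should show a commit with two |
| parents, which is what the merge looks like in the history. |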
| |
| |
| Merging external work |
| --------------------- |
| |
| It's usually much more common to merge with somebody else than to |
| merge with your own branches, so it's worth pointing out that git |
| makes that very easy too, and in fact, it's not that different from |
| doing a "git resolve". In fact, a remote merge ends up being nothing |
| more than "fetch the work from a remote repository into a temporary tag" |
| followed by a "git resolve". |
| |
| It's such a common thing to do that it's called "git pull", and you can |
| simply do |
| |
| git pull <remote-repository> |
| |
| and optionally give a branch-name for the remote end as a second |
| argument. |
| |
| The "remote" repository can even be on the same machine. One of |
| the following notations can be used to name the repository to |
| pull from: |
| |
| Rsync URL |
| rsync://remote.machine/path/to/repo.git/ |
| |
| HTTP(s) URL |
| http://remote.machine/path/to/repo.git/ |
| |
| GIT URL |
| git://remote.machine/path/to/repo.git/ |
| |
| SSH URL |
| remote.machine:/path/to/repo.git/ |
| |
| Local directory |
| /path/to/repo.git/ |
| |
| [ Digression: you could do without using any branches at all, by |
| keeping as many local repositories as you would like to have |
| branches, and merging between them with "git pull", just like |
| you merge between branches. The advantage of this approach is |
| that it lets you keep a set of files for each "branch" checked |
| out and you may find it easier to switch back and forth if you |
| juggle multiple lines of development simultaneously. Of |
| course, you will pay the price of more disk usage to hold |
| multiple working trees, but disk space is cheap these days. ] |
| |
| It is likely that you will be pulling from the same remote |
| repository from time to time. As a shorthand, you can store |
| the remote repository URL in a file under the .git/branches/ |
| directory, like this: |
| |
| mkdir -p .git/branches |
| echo rsync://kernel.org/pub/scm/git/git.git/ \ |
| >.git/branches/linus |
| |
| and give the filename to "git pull" instead of the full URL. |
| The contents of a file under .git/branches can even be a prefix |
| of a full URL, like this: |
| |
| 	echo rsync://kernel.org/pub/.../jgarzik/ \ |
| 		>.git/branches/jgarzik |
| |
| Examples. |
| |
| (1) git pull linus |
| (2) git pull linus tag v0.99.1 |
| (3) git pull jgarzik/netdev-2.6.git/ e100 |
| |
| The above are equivalent to: |
| |
| 	(1) git pull rsync://kernel.org/pub/scm/git/git.git/ HEAD |
| 	(2) git pull rsync://kernel.org/pub/scm/git/git.git/ tag v0.99.1 |
| 	(3) git pull rsync://kernel.org/pub/.../jgarzik/netdev-2.6.git/ e100 |
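| |
| The shorthand lookup in these examples can be pictured roughly like |
| this. The "expand_remote" function is a made-up helper for illustration, |
| not the code "git pull" actually runs, and it only expands the |
| repository part (it does not add the default "HEAD"): |
| |
```shell
#!/bin/sh
# Hypothetical helper: expand a shorthand using files under .git/branches.
# Illustration only; this is not git's own implementation.
expand_remote() {
	name=${1%%/*}                 # part before the first slash
	case $1 in
	*/*)	rest=${1#*/} ;;       # part after the first slash, if any
	*)	rest= ;;
	esac
	file=.git/branches/$name
	if [ -f "$file" ]; then
		# stored URL (or URL prefix) followed by the remainder
		printf '%s%s\n' "$(cat "$file")" "$rest"
	else
		# not a known shorthand: use the argument as given
		printf '%s\n' "$1"
	fi
}
```
| |
| With the two files created above, "expand_remote linus" would print the |
| stored URL unchanged, and "expand_remote jgarzik/netdev-2.6.git/" would |
| print the stored prefix with "netdev-2.6.git/" appended. |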
| |
| |
| Publishing your work |
| -------------------- |
| |
| So we can use somebody else's work from a remote repository; but |
| how can _you_ prepare a repository to let other people pull from |
| it? |
| |
| You do your real work in your working directory that has your |
| primary repository hanging under it as its ".git" subdirectory. |
| You _could_ make that repository accessible remotely and ask |
| people to pull from it, but in practice that is not the way |
| things are usually done. A recommended way is to have a public |
| repository, make it reachable by other people, and when the |
| changes you made in your primary working directory are in good |
| shape, update the public repository from it. This is often |
| called "pushing". |
| |
| [ Side note: this public repository could further be mirrored, |
| and that is how kernel.org git repositories are done. ] |
| |
| Publishing the changes from your local (private) repository to |
| your remote (public) repository requires a write privilege on |
| the remote machine. You need to have an SSH account there to |
| run a single command, "git-receive-pack". |
| |
| First, you need to create an empty repository on the remote |
| machine that will house your public repository. This empty |
| repository will be populated and be kept up-to-date by pushing |
| into it later. Obviously, this repository creation needs to be |
| done only once. |
| |
| [ Digression: "git push" uses a pair of programs, |
| "git-send-pack" on your local machine, and "git-receive-pack" |
| on the remote machine. The communication between the two over |
| the network internally uses an SSH connection. ] |
| |
| Your private repository's GIT directory is usually .git, but |
| your public repository is often named after the project name, |
| i.e. "<project>.git". Let's create such a public repository for |
| project "my-git". After logging into the remote machine, create |
| an empty directory: |
| |
| mkdir my-git.git |
| |
| Then, make that directory into a GIT repository by running |
| git-init-db, but this time, since its name is not the usual |
| ".git", we do things slightly differently: |
| |
| GIT_DIR=my-git.git git-init-db |
| |
| Make sure this directory is available, via the transport of your |
| choice, to the people you want to be able to pull your changes. |
| Also, you need to make sure that you have the "git-receive-pack" |
| program on the $PATH. |
| |
| [ Side note: many installations of sshd do not invoke your shell |
| as the login shell when you directly run programs; what this |
| means is that if your login shell is bash, only .bashrc is |
| read and not .bash_profile. As a workaround, make sure |
| .bashrc sets up $PATH so that you can run the 'git-receive-pack' |
| program. ] |
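| |
| One way to check the $PATH before you push is a small lookup like the |
| one below. The "on_path" function is a made-up helper; to test what a |
| non-login shell sees on the remote machine, you would run the same |
| "command -v" lookup through ssh, as the comment shows. |
| |
```shell
#!/bin/sh
# Hypothetical helper: report whether a program is found on $PATH.
on_path() {
	if command -v "$1" >/dev/null 2>&1; then
		echo "$1: found"
	else
		echo "$1: not found on PATH"
		return 1
	fi
}

# Check the machine you are logged into:
on_path git-receive-pack || echo "fix up \$PATH before pushing"

# To check what sshd gives you on the remote side, something like:
#   ssh <public-host> 'command -v git-receive-pack'
```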
| |
| Your "public repository" is now ready to accept your changes. |
| Come back to the machine that holds your private repository. From |
| there, run this command: |
| |
| git push <public-host>:/path/to/my-git.git master |
| |
| This synchronizes your public repository to match the named |
| branch head (i.e. "master" in this case) and the objects reachable |
| from it in your current repository. |
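| |
| Since the "remote" can be a local directory, you can try the whole |
| round trip on one machine. This sketch assumes a present-day git, where |
| "git init --bare my-git.git" is one way to create the empty public |
| repository; the directory names and file contents are placeholders. |
| |
```shell
#!/bin/sh
# Sketch only: push "master" from a private repository into an empty
# public one that lives in a local directory.
# Assumption: present-day git ("git init --bare" creates the public side).
set -e
GIT_AUTHOR_NAME="A U Thor" GIT_AUTHOR_EMAIL=author@example.com
GIT_COMMITTER_NAME="A U Thor" GIT_COMMITTER_EMAIL=author@example.com
export GIT_AUTHOR_NAME GIT_AUTHOR_EMAIL GIT_COMMITTER_NAME GIT_COMMITTER_EMAIL

cd "$(mktemp -d)"

# The public repository: created once, starts out empty
git init -q --bare my-git.git

# The private repository, with one commit on "master"
git init -q private
cd private
git symbolic-ref HEAD refs/heads/master
echo "Hello World" >hello
git add hello
git commit -q -m "Initial commit"

# Publish: the public "master" should now match the private one
git push -q ../my-git.git master

private_head=$(git rev-parse master)
public_head=$(git --git-dir=../my-git.git rev-parse master)
echo "private: $private_head"
echo "public:  $public_head"
```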
| |
| As a real example, this is how I update my public git |
| repository. The kernel.org mirror network takes care of the |
| propagation to other publicly visible machines: |
| |
| git push master.kernel.org:/pub/scm/git/git.git/ |
| |
| |
| [ Digression: your GIT "public" repository that people can pull from |
| is different from a public CVS repository that gives read-write |
| access to multiple developers. It is a copy of _your_ primary |
| repository published for others to use, and you should not |
| push into it from more than one repository (this means not |
| just disallowing other developers from pushing into it, but also |
| that you should push into it from a single repository of yours). |
| Sharing the result of work done by multiple people is always |
| done by pulling (i.e. fetching and merging) from the public |
| repositories of those people. Typically this is done by the |
| "project lead" person, and the resulting repository is |
| published as the public repository of the "project lead" for |
| everybody to base further changes on. ] |
| |
| |
| Packing your repository |
| ----------------------- |
| |
| Earlier, we saw that one file under a .git/objects/??/ directory |
| is stored for each git object you create. This representation |
| is convenient and efficient to create atomically and safely, but |
| not so convenient to transport over the network. Since git objects are |
| immutable once they are created, there is a way to optimize the |
| storage by "packing them together". The command |
| |
| git repack |
| |
| will do it for you. If you followed the tutorial examples, you |
| would have accumulated about 17 objects in .git/objects/??/ |
| directories by now. "git repack" tells you how many objects it |
| packed, and stores the packed file in the .git/objects/pack |
| directory. |
| |
| [ Side Note: you will see two files, pack-*.pack and pack-*.idx, |
| in the .git/objects/pack directory. They are closely related to |
| each other, and if you ever copy them by hand to a different |
| repository for whatever reason, you should make sure you copy |
| them together. The former holds all the data from the objects |
| in the pack, and the latter holds the index for random |
| access. ] |
| |
| If you are paranoid, running the "git-verify-pack" command would |
| detect if you have a corrupt pack, but do not worry too much. |
| Our programs are always perfect ;-). |
| |
| Once you have packed objects, you do not need to leave the |
| unpacked objects that are contained in the pack file anymore. |
| |
| git prune-packed |
| |
| would remove them for you. |
| |
| You can try running "find .git/objects -type f" before and after |
| you run "git prune-packed" if you are curious. |
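| |
| Here is a sketch of that before-and-after check in a scratch |
| repository. It assumes a present-day git; the "count_loose" function is |
| a made-up helper that counts files under .git/objects while skipping |
| the pack/ and info/ subdirectories. |
| |
```shell
#!/bin/sh
# Sketch only: count unpacked object files before and after packing.
# Assumption: present-day git; count_loose is a hypothetical helper.
set -e
GIT_AUTHOR_NAME="A U Thor" GIT_AUTHOR_EMAIL=author@example.com
GIT_COMMITTER_NAME="A U Thor" GIT_COMMITTER_EMAIL=author@example.com
export GIT_AUTHOR_NAME GIT_AUTHOR_EMAIL GIT_COMMITTER_NAME GIT_COMMITTER_EMAIL

cd "$(mktemp -d)"
git init -q
echo "Hello World" >hello
git add hello
git commit -q -m "Initial commit"

# Count object files that are not inside pack/ or info/
count_loose() {
	find .git/objects -type f ! -path '*/pack/*' ! -path '*/info/*' |
		wc -l | tr -d ' '
}

before=$(count_loose)
git repack -q          # pack the unpacked objects together
git prune-packed -q    # remove the unpacked copies that are now in a pack
after=$(count_loose)

echo "unpacked object files: $before before, $after after"
```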
| |
| [ Side Note: "git pull" is slightly cumbersome for HTTP transport, |
| as a packed repository may contain relatively few objects in a |
| relatively large pack. If you expect many HTTP pulls from your |
| public repository you might want to repack & prune often, or |
| never. ] |
| |
| If you run "git repack" again at this point, it will say |
| "Nothing to pack". Once you continue your development and |
| accumulate the changes, running "git repack" again will create a |
| new pack, that contains objects created since you packed your |
| archive the last time. We recommend that you pack your project |
| soon after the initial import (unless you are starting your |
| project from scratch), and then run "git repack" every once in a |
| while, depending on how active your project is. |
| |
| When a repository is synchronized via "git push" and "git pull", |
| objects packed in the source repository are usually stored |
| unpacked in the destination, unless rsync transport is used. |
| |
| |
| Working with Others |
| ------------------- |
| |
| Although git is a truly distributed system, it is often |
| convenient to organize your project with an informal hierarchy |
| of developers. Linux kernel development is run this way. There |
| is a nice illustration (page 17, "Merges to Mainline") in Randy |
| Dunlap's presentation (http://tinyurl.com/a2jdg). |
| |
| It should be stressed that this hierarchy is purely "informal". |
| There is nothing fundamental in git that enforces the "chain of |
| patch flow" this hierarchy implies. You do not have to pull |
| from only one remote repository. |
| |
| |
| A recommended workflow for a "project lead" goes like this: |
| |
| (1) Prepare your primary repository on your local machine. Your |
| work is done there. |
| |
| (2) Prepare a public repository accessible to others. |
| |
| (3) Push into the public repository from your primary |
| repository. |
| |
| (4) "git repack" the public repository. This establishes a big |
| pack that contains the initial set of objects as the |
| baseline, and possibly "git prune-packed" if the transport |
| used for pulling from your repository supports packed |
| repositories. |
| |
| (5) Keep working in your primary repository. Your changes |
| include modifications of your own, patches you receive via |
| e-mails, and merges resulting from pulling the "public" |
| repositories of your "subsystem maintainers". |
| |
| You can repack this private repository whenever you feel |
| like it. |
| |
| (6) Push your changes to the public repository, and announce it |
| to the public. |
| |
| (7) Every once in a while, "git repack" the public repository. |
| Go back to step (5) and continue working. |
| |
| |
| A recommended work cycle for a "subsystem maintainer" who works |
| on that project and has their own "public repository" goes like this: |
| |
| (1) Prepare your work repository, by running "git clone" on the public |
| repository of the "project lead". The URL used for the |
| initial cloning is stored in .git/branches/origin. |
| |
| (2) Prepare a public repository accessible to others. |
| |
| (3) Copy over the packed files from the "project lead" public |
| repository to your public repository by hand; preferably |
| use rsync for that task. |
| |
| (4) Push into the public repository from your primary |
| repository. Run "git repack", and possibly "git |
| prune-packed" if the transport used for pulling from your |
| repository supports packed repositories. |
| |
| (5) Keep working in your primary repository. Your changes |
| include modifications of your own, patches you receive via |
| e-mails, and merges resulting from pulling the "public" |
| repositories of your "project lead" and possibly your |
| "sub-subsystem maintainers". |
| |
| You can repack this private repository whenever you feel |
| like it. |
| |
| (6) Push your changes to your public repository, and ask your |
| "project lead" and possibly your "sub-subsystem |
| maintainers" to pull from it. |
| |
| (7) Every once in a while, "git repack" the public repository. |
| Go back to step (5) and continue working. |
| |
| |
| A recommended work cycle for an "individual developer" who does |
| not have a "public" repository is somewhat different. It goes |
| like this: |
| |
| (1) Prepare your work repository, by running "git clone" on the public |
| repository of the "project lead" (or a "subsystem |
| maintainer", if you work on a subsystem). The URL used for |
| the initial cloning is stored in .git/branches/origin. |
| |
| (2) Do your work there. Make commits. |
| |
| (3) Run "git fetch origin" from the public repository of your |
| upstream every once in a while. This does only the first |
| half of "git pull" but does not merge. The head of the |
| public repository is stored in .git/refs/heads/origin. |
| |
| (4) Use "git cherry origin" to see which ones of your patches |
| were accepted, and/or use "git rebase origin" to port your |
| unmerged changes forward to the updated upstream. |
| |
| (5) Use "git format-patch origin" to prepare patches for e-mail |
| submission to your upstream and send them out. Go back to |
| step (2) and continue. |
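| |
| The individual developer's cycle above can be tried out against a |
| throwaway upstream, roughly like this. The sketch assumes a present-day |
| git, which records the cloned URL as a remote named "origin" (under |
| refs/remotes/origin/) rather than in the files named in the steps. The |
| directory names "upstream" and "mine" and the commit messages are made |
| up. |
| |
```shell
#!/bin/sh
# Sketch only: clone, commit, fetch, rebase and prepare a patch.
# Assumption: present-day git; "upstream" stands in for the public
# repository of your project lead.
set -e
GIT_AUTHOR_NAME="A U Thor" GIT_AUTHOR_EMAIL=author@example.com
GIT_COMMITTER_NAME="A U Thor" GIT_COMMITTER_EMAIL=author@example.com
export GIT_AUTHOR_NAME GIT_AUTHOR_EMAIL GIT_COMMITTER_NAME GIT_COMMITTER_EMAIL

cd "$(mktemp -d)"
git init -q upstream
( cd upstream &&
  git symbolic-ref HEAD refs/heads/master &&
  echo "Hello World" >hello &&
  git add hello &&
  git commit -q -m "Initial commit" )

# (1) Prepare your work repository
git clone -q upstream mine
cd mine

# (2) Do your work there
echo "My change" >>hello
git commit -q -m "My change" hello

# Meanwhile, upstream moves on
( cd ../upstream &&
  echo "Silly example" >example &&
  git add example &&
  git commit -q -m "Add example" )

# (3) Fetch without merging
git fetch -q origin

# (4) A "+" marks a commit of yours that is not upstream yet
git cherry origin
git rebase -q origin

# (5) Write one patch file per unmerged commit
git format-patch -q origin
ls ./*.patch
```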
| |
| |
| [ to be continued.. cvsimports ] |