| = My First Object Walk |
| |
| == What's an Object Walk? |
| |
| The object walk is a key concept in Git - this is the process that underpins |
| operations like object transfer and fsck. Beginning from a given commit, the |
| list of objects is found by walking parent relationships between commits (commit |
| X based on commit W) and containment relationships between objects (tree Y is |
| contained within commit X, and blob Z is located within tree Y, giving our |
| working tree for commit X something like `y/z.txt`). |
| |
| A related concept is the revision walk, which is focused on commit objects and |
| their parent relationships and does not delve into other object types. The |
| revision walk is used for operations like `git log`. |
| |
| === Related Reading |
| |
| - `Documentation/user-manual.txt` under "Hacking Git" contains some coverage of |
| the revision walker in its various incarnations. |
| - `revision.h` |
| - https://eagain.net/articles/git-for-computer-scientists/[Git for Computer Scientists] |
| gives a good overview of the types of objects in Git and what your object |
| walk is really describing. |
| |
| == Setting Up |
| |
| Create a new branch from `master`. |
| |
| ---- |
| git checkout -b revwalk origin/master |
| ---- |
| |
| We'll put our fiddling into a new command. For fun, let's name it `git walken`. |
| Open up a new file `builtin/walken.c` and set up the command handler: |
| |
| ---- |
| /* |
| * "git walken" |
| * |
| * Part of the "My First Object Walk" tutorial. |
| */ |
| |
| #include "builtin.h" |
| |
| int cmd_walken(int argc, const char **argv, const char *prefix) |
| { |
| trace_printf(_("cmd_walken incoming...\n")); |
| return 0; |
| } |
| ---- |
| |
| NOTE: `trace_printf()` differs from `printf()` in that it can be turned on or |
| off at runtime. For the purposes of this tutorial, we will write `walken` as |
| though it is intended for use as a "plumbing" command: that is, a command which |
| is used primarily in scripts, rather than interactively by humans (a "porcelain" |
| command). So we will send our debug output to `trace_printf()` instead. When |
| running, enable trace output by setting the environment variable `GIT_TRACE`. |
| |
| Add usage text and `-h` handling, like all subcommands should consistently do |
| (our test suite will notice and complain if you fail to do so). |
| |
| ---- |
| int cmd_walken(int argc, const char **argv, const char *prefix) |
| { |
| const char * const walken_usage[] = { |
| N_("git walken"), |
| NULL, |
| } |
| struct option options[] = { |
| OPT_END() |
| }; |
| |
| argc = parse_options(argc, argv, prefix, options, walken_usage, 0); |
| |
| ... |
| } |
| ---- |
| |
| Also add the relevant line in `builtin.h` near `cmd_whatchanged()`: |
| |
| ---- |
| int cmd_walken(int argc, const char **argv, const char *prefix); |
| ---- |
| |
| Include the command in `git.c` in `commands[]` near the entry for `whatchanged`, |
| maintaining alphabetical ordering: |
| |
| ---- |
| { "walken", cmd_walken, RUN_SETUP }, |
| ---- |
| |
| Add it to the `Makefile` near the line for `builtin/worktree.o`: |
| |
| ---- |
| BUILTIN_OBJS += builtin/walken.o |
| ---- |
| |
| Build and test out your command, without forgetting to ensure the `DEVELOPER` |
| flag is set, and with `GIT_TRACE` enabled so the debug output can be seen: |
| |
| ---- |
| $ echo DEVELOPER=1 >>config.mak |
| $ make |
| $ GIT_TRACE=1 ./bin-wrappers/git walken |
| ---- |
| |
| NOTE: For a more exhaustive overview of the new command process, take a look at |
| `Documentation/MyFirstContribution.txt`. |
| |
| NOTE: A reference implementation can be found at |
| https://github.com/nasamuffin/git/tree/revwalk. |
| |
| === `struct rev_cmdline_info` |
| |
| The definition of `struct rev_cmdline_info` can be found in `revision.h`. |
| |
| This struct is contained within the `rev_info` struct and is used to reflect |
| parameters provided by the user over the CLI. |
| |
| `nr` represents the number of `rev_cmdline_entry` present in the array. |
| |
| `alloc` is used by the `ALLOC_GROW` macro. Check `cache.h` - this variable is |
| used to track the allocated size of the list. |
| |
| Per entry, we find: |
| |
| `item` is the object provided upon which to base the object walk. Items in Git |
| can be blobs, trees, commits, or tags. (See `Documentation/gittutorial-2.txt`.) |
| |
| `name` is the object ID (OID) of the object - a hex string you may be familiar |
| with from using Git to organize your source in the past. Check the tutorial |
| mentioned above towards the top for a discussion of where the OID can come |
| from. |
| |
| `whence` indicates some information about what to do with the parents of the |
| specified object. We'll explore this flag more later on; take a look at |
| `Documentation/revisions.txt` to get an idea of what could set the `whence` |
| value. |
| |
| `flags` are used to hint the beginning of the revision walk and are the first |
| block under the `#include`s in `revision.h`. The most likely ones to be set in |
| the `rev_cmdline_info` are `UNINTERESTING` and `BOTTOM`, but these same flags |
| can be used during the walk, as well. |
| |
| === `struct rev_info` |
| |
| This one is quite a bit longer, and many fields are only used during the walk |
| by `revision.c` - not configuration options. Most of the configurable flags in |
| `struct rev_info` have a mirror in `Documentation/rev-list-options.txt`. It's a |
| good idea to take some time and read through that document. |
| |
| == Basic Commit Walk |
| |
| First, let's see if we can replicate the output of `git log --oneline`. We'll |
| refer back to the implementation frequently to discover norms when performing |
| an object walk of our own. |
| |
| To do so, we'll first find all the commits, in order, which preceded the current |
| commit. We'll extract the name and subject of the commit from each. |
| |
| Ideally, we will also be able to find out which ones are currently at the tip of |
| various branches. |
| |
| === Setting Up |
| |
| Preparing for your object walk has some distinct stages. |
| |
| 1. Perform default setup for this mode, and others which may be invoked. |
| 2. Check configuration files for relevant settings. |
| 3. Set up the `rev_info` struct. |
| 4. Tweak the initialized `rev_info` to suit the current walk. |
| 5. Prepare the `rev_info` for the walk. |
| 6. Iterate over the objects, processing each one. |
| |
| ==== Default Setups |
| |
| Before examining configuration files which may modify command behavior, set up |
| default state for switches or options your command may have. If your command |
| utilizes other Git components, ask them to set up their default states as well. |
| For instance, `git log` takes advantage of `grep` and `diff` functionality, so |
| its `init_log_defaults()` sets its own state (`decoration_style`) and asks |
| `grep` and `diff` to initialize themselves by calling each of their |
| initialization functions. |
| |
| For our first example within `git walken`, we don't intend to use any other |
| components within Git, and we don't have any configuration to do. However, we |
| may want to add some later, so for now, we can add an empty placeholder. Create |
| a new function in `builtin/walken.c`: |
| |
| ---- |
| static void init_walken_defaults(void) |
| { |
| /* |
| * We don't actually need the same components `git log` does; leave this |
| * empty for now. |
| */ |
| } |
| ---- |
| |
| Make sure to add a line invoking it inside of `cmd_walken()`. |
| |
| ---- |
| int cmd_walken(int argc, const char **argv, const char *prefix) |
| { |
| init_walken_defaults(); |
| } |
| ---- |
| |
| ==== Configuring From `.gitconfig` |
| |
| Next, we should have a look at any relevant configuration settings (i.e., |
| settings readable and settable from `git config`). This is done by providing a |
| callback to `git_config()`; within that callback, you can also invoke methods |
| from other components you may need that need to intercept these options. Your |
| callback will be invoked once per each configuration value which Git knows about |
| (global, local, worktree, etc.). |
| |
| Similarly to the default values, we don't have anything to do here yet |
| ourselves; however, we should call `git_default_config()` if we aren't calling |
| any other existing config callbacks. |
| |
| Add a new function to `builtin/walken.c`: |
| |
| ---- |
| static int git_walken_config(const char *var, const char *value, void *cb) |
| { |
| /* |
| * For now, we don't have any custom configuration, so fall back to |
| * the default config. |
| */ |
| return git_default_config(var, value, cb); |
| } |
| ---- |
| |
| Make sure to invoke `git_config()` with it in your `cmd_walken()`: |
| |
| ---- |
| int cmd_walken(int argc, const char **argv, const char *prefix) |
| { |
| ... |
| |
| git_config(git_walken_config, NULL); |
| |
| ... |
| } |
| ---- |
| |
| ==== Setting Up `rev_info` |
| |
| Now that we've gathered external configuration and options, it's time to |
| initialize the `rev_info` object which we will use to perform the walk. This is |
| typically done by calling `repo_init_revisions()` with the repository you intend |
| to target, as well as the `prefix` argument of `cmd_walken` and your `rev_info` |
| struct. |
| |
| Add the `struct rev_info` and the `repo_init_revisions()` call: |
| ---- |
| int cmd_walken(int argc, const char **argv, const char *prefix) |
| { |
| /* This can go wherever you like in your declarations.*/ |
| struct rev_info rev; |
| ... |
| |
| /* This should go after the git_config() call. */ |
| repo_init_revisions(the_repository, &rev, prefix); |
| |
| ... |
| } |
| ---- |
| |
| ==== Tweaking `rev_info` For the Walk |
| |
| We're getting close, but we're still not quite ready to go. Now that `rev` is |
| initialized, we can modify it to fit our needs. This is usually done within a |
| helper for clarity, so let's add one: |
| |
| ---- |
| static void final_rev_info_setup(struct rev_info *rev) |
| { |
| /* |
| * We want to mimic the appearance of `git log --oneline`, so let's |
| * force oneline format. |
| */ |
| get_commit_format("oneline", rev); |
| |
| /* Start our object walk at HEAD. */ |
| add_head_to_pending(rev); |
| } |
| ---- |
| |
| [NOTE] |
| ==== |
| Instead of using the shorthand `add_head_to_pending()`, you could do |
| something like this: |
| ---- |
| struct setup_revision_opt opt; |
| |
| memset(&opt, 0, sizeof(opt)); |
| opt.def = "HEAD"; |
| opt.revarg_opt = REVARG_COMMITTISH; |
| setup_revisions(argc, argv, rev, &opt); |
| ---- |
| Using a `setup_revision_opt` gives you finer control over your walk's starting |
| point. |
| ==== |
| |
| Then let's invoke `final_rev_info_setup()` after the call to |
| `repo_init_revisions()`: |
| |
| ---- |
| int cmd_walken(int argc, const char **argv, const char *prefix) |
| { |
| ... |
| |
| final_rev_info_setup(&rev); |
| |
| ... |
| } |
| ---- |
| |
| Later, we may wish to add more arguments to `final_rev_info_setup()`. But for |
| now, this is all we need. |
| |
| ==== Preparing `rev_info` For the Walk |
| |
| Now that `rev` is all initialized and configured, we've got one more setup step |
| before we get rolling. We can do this in a helper, which will both prepare the |
| `rev_info` for the walk, and perform the walk itself. Let's start the helper |
| with the call to `prepare_revision_walk()`, which can return an error without |
| dying on its own: |
| |
| ---- |
| static void walken_commit_walk(struct rev_info *rev) |
| { |
| if (prepare_revision_walk(rev)) |
| die(_("revision walk setup failed")); |
| } |
| ---- |
| |
| NOTE: `die()` prints to `stderr` and exits the program. Since it will print to |
| `stderr` it's likely to be seen by a human, so we will localize it. |
| |
| ==== Performing the Walk! |
| |
| Finally! We are ready to begin the walk itself. Now we can see that `rev_info` |
| can also be used as an iterator; we move to the next item in the walk by using |
| `get_revision()` repeatedly. Add the listed variable declarations at the top and |
| the walk loop below the `prepare_revision_walk()` call within your |
| `walken_commit_walk()`: |
| |
| ---- |
| static void walken_commit_walk(struct rev_info *rev) |
| { |
| struct commit *commit; |
| struct strbuf prettybuf = STRBUF_INIT; |
| |
| ... |
| |
| while ((commit = get_revision(rev))) { |
| if (!commit) |
| continue; |
| |
| strbuf_reset(&prettybuf); |
| pp_commit_easy(CMIT_FMT_ONELINE, commit, &prettybuf); |
| puts(prettybuf.buf); |
| } |
| strbuf_release(&prettybuf); |
| } |
| ---- |
| |
| NOTE: `puts()` prints a `char*` to `stdout`. Since this is the part of the |
| command we expect to be machine-parsed, we're sending it directly to stdout. |
| |
| Give it a shot. |
| |
| ---- |
| $ make |
| $ ./bin-wrappers/git walken |
| ---- |
| |
| You should see all of the subject lines of all the commits in |
| your tree's history, in order, ending with the initial commit, "Initial revision |
| of "git", the information manager from hell". Congratulations! You've written |
| your first revision walk. You can play with printing some additional fields |
| from each commit if you're curious; have a look at the functions available in |
| `commit.h`. |
| |
| === Adding a Filter |
| |
| Next, let's try to filter the commits we see based on their author. This is |
| equivalent to running `git log --author=<pattern>`. We can add a filter by |
| modifying `rev_info.grep_filter`, which is a `struct grep_opt`. |
| |
| First some setup. Add `init_grep_defaults()` to `init_walken_defaults()` and add |
| `grep_config()` to `git_walken_config()`: |
| |
| ---- |
| static void init_walken_defaults(void) |
| { |
| init_grep_defaults(the_repository); |
| } |
| |
| ... |
| |
| static int git_walken_config(const char *var, const char *value, void *cb) |
| { |
| grep_config(var, value, cb); |
| return git_default_config(var, value, cb); |
| } |
| ---- |
| |
| Next, we can modify the `grep_filter`. This is done with convenience functions |
| found in `grep.h`. For fun, we're filtering to only commits from folks using a |
| `gmail.com` email address - a not-very-precise guess at who may be working on |
| Git as a hobby. Since we're checking the author, which is a specific line in the |
| header, we'll use the `append_header_grep_pattern()` helper. We can use |
| the `enum grep_header_field` to indicate which part of the commit header we want |
| to search. |
| |
| In `final_rev_info_setup()`, add your filter line: |
| |
| ---- |
| static void final_rev_info_setup(int argc, const char **argv, |
| const char *prefix, struct rev_info *rev) |
| { |
| ... |
| |
| append_header_grep_pattern(&rev->grep_filter, GREP_HEADER_AUTHOR, |
| "gmail"); |
| compile_grep_patterns(&rev->grep_filter); |
| |
| ... |
| } |
| ---- |
| |
| `append_header_grep_pattern()` adds your new "gmail" pattern to `rev_info`, but |
| it won't work unless we compile it with `compile_grep_patterns()`. |
| |
| NOTE: If you are using `setup_revisions()` (for example, if you are passing a |
| `setup_revision_opt` instead of using `add_head_to_pending()`), you don't need |
| to call `compile_grep_patterns()` because `setup_revisions()` calls it for you. |
| |
| NOTE: We could add the same filter via the `append_grep_pattern()` helper if we |
| wanted to, but `append_header_grep_pattern()` adds the `enum grep_context` and |
| `enum grep_pat_token` for us. |
| |
| === Changing the Order |
| |
| There are a few ways that we can change the order of the commits during a |
| revision walk. Firstly, we can use the `enum rev_sort_order` to choose from some |
| typical orderings. |
| |
| `topo_order` is the same as `git log --topo-order`: we avoid showing a parent |
| before all of its children have been shown, and we avoid mixing commits which |
| are in different lines of history. (`git help log`'s section on `--topo-order` |
| has a very nice diagram to illustrate this.) |
| |
| Let's see what happens when we run with `REV_SORT_BY_COMMIT_DATE` as opposed to |
| `REV_SORT_BY_AUTHOR_DATE`. Add the following: |
| |
| ---- |
| static void final_rev_info_setup(int argc, const char **argv, |
| const char *prefix, struct rev_info *rev) |
| { |
| ... |
| |
| rev->topo_order = 1; |
| rev->sort_order = REV_SORT_BY_COMMIT_DATE; |
| |
| ... |
| } |
| ---- |
| |
| Let's output this into a file so we can easily diff it with the walk sorted by |
| author date. |
| |
| ---- |
| $ make |
| $ ./bin-wrappers/git walken > commit-date.txt |
| ---- |
| |
| Then, let's sort by author date and run it again. |
| |
| ---- |
| static void final_rev_info_setup(int argc, const char **argv, |
| const char *prefix, struct rev_info *rev) |
| { |
| ... |
| |
| rev->topo_order = 1; |
| rev->sort_order = REV_SORT_BY_AUTHOR_DATE; |
| |
| ... |
| } |
| ---- |
| |
| ---- |
| $ make |
| $ ./bin-wrappers/git walken > author-date.txt |
| ---- |
| |
| Finally, compare the two. This is a little less helpful without object names or |
| dates, but hopefully we get the idea. |
| |
| ---- |
| $ diff -u commit-date.txt author-date.txt |
| ---- |
| |
| This display indicates that commits can be reordered after they're written, for |
| example with `git rebase`. |
| |
| Let's try one more reordering of commits. `rev_info` exposes a `reverse` flag. |
| Set that flag somewhere inside of `final_rev_info_setup()`: |
| |
| ---- |
| static void final_rev_info_setup(int argc, const char **argv, const char *prefix, |
| struct rev_info *rev) |
| { |
| ... |
| |
| rev->reverse = 1; |
| |
| ... |
| } |
| ---- |
| |
| Run your walk again and note the difference in order. (If you remove the grep |
| pattern, you should see the last commit this call gives you as your current |
| HEAD.) |
| |
| == Basic Object Walk |
| |
| So far we've been walking only commits. But Git has more types of objects than |
| that! Let's see if we can walk _all_ objects, and find out some information |
| about each one. |
| |
| We can base our work on an example. `git pack-objects` prepares all kinds of |
| objects for packing into a bitmap or packfile. The work we are interested in |
| resides in `builtins/pack-objects.c:get_object_list()`; examination of that |
| function shows that the all-object walk is being performed by |
| `traverse_commit_list()` or `traverse_commit_list_filtered()`. Those two |
| functions reside in `list-objects.c`; examining the source shows that, despite |
| the name, these functions traverse all kinds of objects. Let's have a look at |
| the arguments to `traverse_commit_list_filtered()`, which are a superset of the |
| arguments to the unfiltered version. |
| |
| - `struct list_objects_filter_options *filter_options`: This is a struct which |
| stores a filter-spec as outlined in `Documentation/rev-list-options.txt`. |
| - `struct rev_info *revs`: This is the `rev_info` used for the walk. |
| - `show_commit_fn show_commit`: A callback which will be used to handle each |
| individual commit object. |
| - `show_object_fn show_object`: A callback which will be used to handle each |
| non-commit object (so each blob, tree, or tag). |
| - `void *show_data`: A context buffer which is passed in turn to `show_commit` |
| and `show_object`. |
| - `struct oidset *omitted`: A linked-list of object IDs which the provided |
| filter caused to be omitted. |
| |
| It looks like this `traverse_commit_list_filtered()` uses callbacks we provide |
| instead of needing us to call it repeatedly ourselves. Cool! Let's add the |
| callbacks first. |
| |
| For the sake of this tutorial, we'll simply keep track of how many of each kind |
| of object we find. At file scope in `builtin/walken.c` add the following |
| tracking variables: |
| |
| ---- |
| static int commit_count; |
| static int tag_count; |
| static int blob_count; |
| static int tree_count; |
| ---- |
| |
| Commits are handled by a different callback than other objects; let's do that |
| one first: |
| |
| ---- |
| static void walken_show_commit(struct commit *cmt, void *buf) |
| { |
| commit_count++; |
| } |
| ---- |
| |
| The `cmt` argument is fairly self-explanatory. But it's worth mentioning that |
| the `buf` argument is actually the context buffer that we can provide to the |
| traversal calls - `show_data`, which we mentioned a moment ago. |
| |
| Since we have the `struct commit` object, we can look at all the same parts that |
| we looked at in our earlier commit-only walk. For the sake of this tutorial, |
| though, we'll just increment the commit counter and move on. |
| |
| The callback for non-commits is a little different, as we'll need to check |
| which kind of object we're dealing with: |
| |
| ---- |
| static void walken_show_object(struct object *obj, const char *str, void *buf) |
| { |
| switch (obj->type) { |
| case OBJ_TREE: |
| tree_count++; |
| break; |
| case OBJ_BLOB: |
| blob_count++; |
| break; |
| case OBJ_TAG: |
| tag_count++; |
| break; |
| case OBJ_COMMIT: |
| BUG("unexpected commit object in walken_show_object\n"); |
| default: |
| BUG("unexpected object type %s in walken_show_object\n", |
| type_name(obj->type)); |
| } |
| } |
| ---- |
| |
| Again, `obj` is fairly self-explanatory, and we can guess that `buf` is the same |
| context pointer that `walken_show_commit()` receives: the `show_data` argument |
| to `traverse_commit_list()` and `traverse_commit_list_filtered()`. Finally, |
| `str` contains the name of the object, which ends up being something like |
| `foo.txt` (blob), `bar/baz` (tree), or `v1.2.3` (tag). |
| |
| To help assure us that we aren't double-counting commits, we'll include some |
| complaining if a commit object is routed through our non-commit callback; we'll |
| also complain if we see an invalid object type. Since those two cases should be |
| unreachable, and would only change in the event of a semantic change to the Git |
| codebase, we complain by using `BUG()` - which is a signal to a developer that |
| the change they made caused unintended consequences, and the rest of the |
| codebase needs to be updated to understand that change. `BUG()` is not intended |
| to be seen by the public, so it is not localized. |
| |
| Our main object walk implementation is substantially different from our commit |
| walk implementation, so let's make a new function to perform the object walk. We |
| can perform setup which is applicable to all objects here, too, to keep separate |
| from setup which is applicable to commit-only walks. |
| |
| We'll start by enabling all types of objects in the `struct rev_info`. We'll |
| also turn on `tree_blobs_in_commit_order`, which means that we will walk a |
| commit's tree and everything it points to immediately after we find each commit, |
| as opposed to waiting for the end and walking through all trees after the commit |
| history has been discovered. With the appropriate settings configured, we are |
| ready to call `prepare_revision_walk()`. |
| |
| ---- |
| static void walken_object_walk(struct rev_info *rev) |
| { |
| rev->tree_objects = 1; |
| rev->blob_objects = 1; |
| rev->tag_objects = 1; |
| rev->tree_blobs_in_commit_order = 1; |
| |
| if (prepare_revision_walk(rev)) |
| die(_("revision walk setup failed")); |
| |
| commit_count = 0; |
| tag_count = 0; |
| blob_count = 0; |
| tree_count = 0; |
| ---- |
| |
| Let's start by calling just the unfiltered walk and reporting our counts. |
| Complete your implementation of `walken_object_walk()`: |
| |
| ---- |
| traverse_commit_list(rev, walken_show_commit, walken_show_object, NULL); |
| |
| printf("commits %d\nblobs %d\ntags %d\ntrees %d\n", commit_count, |
| blob_count, tag_count, tree_count); |
| } |
| ---- |
| |
| NOTE: This output is intended to be machine-parsed. Therefore, we are not |
| sending it to `trace_printf()`, and we are not localizing it - we need scripts |
| to be able to count on the formatting to be exactly the way it is shown here. |
| If we were intending this output to be read by humans, we would need to localize |
| it with `_()`. |
| |
| Finally, we'll ask `cmd_walken()` to use the object walk instead. Discussing |
| command line options is out of scope for this tutorial, so we'll just hardcode |
| a branch we can change at compile time. Where you call `final_rev_info_setup()` |
| and `walken_commit_walk()`, instead branch like so: |
| |
| ---- |
| if (1) { |
| add_head_to_pending(&rev); |
| walken_object_walk(&rev); |
| } else { |
| final_rev_info_setup(argc, argv, prefix, &rev); |
| walken_commit_walk(&rev); |
| } |
| ---- |
| |
| NOTE: For simplicity, we've avoided all the filters and sorts we applied in |
| `final_rev_info_setup()` and simply added `HEAD` to our pending queue. If you |
| want, you can certainly use the filters we added before by moving |
| `final_rev_info_setup()` out of the conditional and removing the call to |
| `add_head_to_pending()`. |
| |
| Now we can try to run our command! It should take noticeably longer than the |
| commit walk, but an examination of the output will give you an idea why. Your |
| output should look similar to this example, but with different counts: |
| |
| ---- |
| Object walk completed. Found 55733 commits, 100274 blobs, 0 tags, and 104210 trees. |
| ---- |
| |
| This makes sense. We have more trees than commits because the Git project has |
| lots of subdirectories which can change, plus at least one tree per commit. We |
| have no tags because we started on a commit (`HEAD`) and while tags can point to |
| commits, commits can't point to tags. |
| |
| NOTE: You will have different counts when you run this yourself! The number of |
| objects grows along with the Git project. |
| |
| === Adding a Filter |
| |
| There are a handful of filters that we can apply to the object walk laid out in |
| `Documentation/rev-list-options.txt`. These filters are typically useful for |
| operations such as creating packfiles or performing a partial clone. They are |
| defined in `list-objects-filter-options.h`. For the purposes of this tutorial we |
| will use the "tree:1" filter, which causes the walk to omit all trees and blobs |
| which are not directly referenced by commits reachable from the commit in |
| `pending` when the walk begins. (`pending` is the list of objects which need to |
| be traversed during a walk; you can imagine a breadth-first tree traversal to |
| help understand. In our case, that means we omit trees and blobs not directly |
| referenced by `HEAD` or `HEAD`'s history, because we begin the walk with only |
| `HEAD` in the `pending` list.) |
| |
| First, we'll need to `#include "list-objects-filter-options.h`" and set up the |
| `struct list_objects_filter_options` at the top of the function. |
| |
| ---- |
| static void walken_object_walk(struct rev_info *rev) |
| { |
| struct list_objects_filter_options filter_options = {}; |
| |
| ... |
| ---- |
| |
| For now, we are not going to track the omitted objects, so we'll replace those |
| parameters with `NULL`. For the sake of simplicity, we'll add a simple |
| build-time branch to use our filter or not. Replace the line calling |
| `traverse_commit_list()` with the following, which will remind us which kind of |
| walk we've just performed: |
| |
| ---- |
| if (0) { |
| /* Unfiltered: */ |
| trace_printf(_("Unfiltered object walk.\n")); |
| traverse_commit_list(rev, walken_show_commit, |
| walken_show_object, NULL); |
| } else { |
| trace_printf( |
| _("Filtered object walk with filterspec 'tree:1'.\n")); |
| parse_list_objects_filter(&filter_options, "tree:1"); |
| |
| traverse_commit_list_filtered(&filter_options, rev, |
| walken_show_commit, walken_show_object, NULL, NULL); |
| } |
| ---- |
| |
| `struct list_objects_filter_options` is usually built directly from a command |
| line argument, so the module provides an easy way to build one from a string. |
| Even though we aren't taking user input right now, we can still build one with |
| a hardcoded string using `parse_list_objects_filter()`. |
| |
| With the filter spec "tree:1", we are expecting to see _only_ the root tree for |
| each commit; therefore, the tree object count should be less than or equal to |
| the number of commits. (For an example of why that's true: `git commit --revert` |
| points to the same tree object as its grandparent.) |
| |
| === Counting Omitted Objects |
| |
| We also have the capability to enumerate all objects which were omitted by a |
| filter, like with `git log --filter=<spec> --filter-print-omitted`. Asking |
| `traverse_commit_list_filtered()` to populate the `omitted` list means that our |
| object walk does not perform any better than an unfiltered object walk; all |
| reachable objects are walked in order to populate the list. |
| |
| First, add the `struct oidset` and related items we will use to iterate it: |
| |
| ---- |
| static void walken_object_walk( |
| ... |
| |
| struct oidset omitted; |
| struct oidset_iter oit; |
| struct object_id *oid = NULL; |
| int omitted_count = 0; |
| oidset_init(&omitted, 0); |
| |
| ... |
| ---- |
| |
| Modify the call to `traverse_commit_list_filtered()` to include your `omitted` |
| object: |
| |
| ---- |
| ... |
| |
| traverse_commit_list_filtered(&filter_options, rev, |
| walken_show_commit, walken_show_object, NULL, &omitted); |
| |
| ... |
| ---- |
| |
| Then, after your traversal, the `oidset` traversal is pretty straightforward. |
| Count all the objects within and modify the print statement: |
| |
| ---- |
| /* Count the omitted objects. */ |
| oidset_iter_init(&omitted, &oit); |
| |
| while ((oid = oidset_iter_next(&oit))) |
| omitted_count++; |
| |
| printf("commits %d\nblobs %d\ntags %d\ntrees%d\nomitted %d\n", |
| commit_count, blob_count, tag_count, tree_count, omitted_count); |
| ---- |
| |
| By running your walk with and without the filter, you should find that the total |
| object count in each case is identical. You can also time each invocation of |
| the `walken` subcommand, with and without `omitted` being passed in, to confirm |
| to yourself the runtime impact of tracking all omitted objects. |
| |
| === Changing the Order |
| |
| Finally, let's demonstrate that you can also reorder walks of all objects, not |
| just walks of commits. First, we'll make our handlers chattier - modify |
| `walken_show_commit()` and `walken_show_object()` to print the object as they |
| go: |
| |
| ---- |
| static void walken_show_commit(struct commit *cmt, void *buf) |
| { |
| trace_printf("commit: %s\n", oid_to_hex(&cmt->object.oid)); |
| commit_count++; |
| } |
| |
| static void walken_show_object(struct object *obj, const char *str, void *buf) |
| { |
| trace_printf("%s: %s\n", type_name(obj->type), oid_to_hex(&obj->oid)); |
| |
| ... |
| } |
| ---- |
| |
| NOTE: Since we will be examining this output directly as humans, we'll use |
| `trace_printf()` here. Additionally, since this change introduces a significant |
| number of printed lines, using `trace_printf()` will allow us to easily silence |
| those lines without having to recompile. |
| |
| (Leave the counter increment logic in place.) |
| |
| With only that change, run again (but save yourself some scrollback): |
| |
| ---- |
| $ GIT_TRACE=1 ./bin-wrappers/git walken | head -n 10 |
| ---- |
| |
| Take a look at the top commit with `git show` and the object ID you printed; it |
| should be the same as the output of `git show HEAD`. |
| |
| Next, let's change a setting on our `struct rev_info` within |
| `walken_object_walk()`. Find where you're changing the other settings on `rev`, |
| such as `rev->tree_objects` and `rev->tree_blobs_in_commit_order`, and add the |
| `reverse` setting at the bottom: |
| |
| ---- |
| ... |
| |
| rev->tree_objects = 1; |
| rev->blob_objects = 1; |
| rev->tag_objects = 1; |
| rev->tree_blobs_in_commit_order = 1; |
| rev->reverse = 1; |
| |
| ... |
| ---- |
| |
| Now, run again, but this time, let's grab the last handful of objects instead |
| of the first handful: |
| |
| ---- |
| $ make |
| $ GIT_TRACE=1 ./bin-wrappers git walken | tail -n 10 |
| ---- |
| |
| The last commit object given should have the same OID as the one we saw at the |
| top before, and running `git show <oid>` with that OID should give you again |
| the same results as `git show HEAD`. Furthermore, if you run and examine the |
| first ten lines again (with `head` instead of `tail` like we did before applying |
| the `reverse` setting), you should see that now the first commit printed is the |
| initial commit, `e83c5163`. |
| |
| == Wrapping Up |
| |
| Let's review. In this tutorial, we: |
| |
| - Built a commit walk from the ground up |
| - Enabled a grep filter for that commit walk |
| - Changed the sort order of that filtered commit walk |
| - Built an object walk (tags, commits, trees, and blobs) from the ground up |
| - Learned how to add a filter-spec to an object walk |
| - Changed the display order of the filtered object walk |