sparse-checkout: refactor skip worktree retry logic

The clear_skip_worktree_from_present_files() method was introduced in
af6a51875a (repo_read_index: clear SKIP_WORKTREE bit from files present
in worktree, 2022-01-14) to help cases where sparse-checkout is enabled
but some paths outside of the sparse-checkout also exist on disk. This
operation can be slow because it must check whether each path exists on
disk, which the index does not record, so caching was introduced in
d79d299352 (Accelerate clear_skip_worktree_from_present_files() by
caching, 2022-01-14).
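
As a rough illustration of why caching helps, here is a minimal sketch
of one possible scheme: remember the most recent directory known to be
missing, and skip the existence probe for any later path underneath it.
This is not Git's implementation. toy_exists(), cached_path_found() and
struct missing_dir_cache are hypothetical names, and the probe is a
stub standing in for a filesystem call so the saving can be counted.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical existence probe; counts calls so the saving is visible. */
static int probe_calls;

static int toy_exists(const char *path)
{
	probe_calls++;
	return !strcmp(path, "src") || !strcmp(path, "src/main.c");
}

/* Assumption: only the most recent missing directory is remembered. */
struct missing_dir_cache {
	char dir[256]; /* missing directory, stored with a trailing '/' */
	size_t len;    /* length of dir, or 0 when nothing is cached */
};

static int cached_path_found(struct missing_dir_cache *c, const char *path)
{
	const char *slash;
	size_t dlen;

	/* A path under a known-missing directory cannot exist. */
	if (c->len && !strncmp(path, c->dir, c->len))
		return 0;

	if (toy_exists(path))
		return 1;

	/* The path is missing; find out whether its parent is missing too. */
	slash = strrchr(path, '/');
	if (!slash)
		return 0;
	dlen = (size_t)(slash - path);
	if (dlen + 2 > sizeof(c->dir))
		return 0; /* too long to cache; answer is still correct */

	memcpy(c->dir, path, dlen);
	c->dir[dlen] = '\0';
	if (toy_exists(c->dir)) {
		c->len = 0; /* parent exists, so there is nothing to cache */
	} else {
		c->dir[dlen] = '/';
		c->dir[dlen + 1] = '\0';
		c->len = dlen + 1;
	}
	return 0;
}
```

With this stub, checking several paths under a missing "docs/"
directory costs two probes for the first path and none for the rest.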
This check is particularly confusing in the presence of a sparse index,
as a sparse tree entry corresponding to an existing directory must first
be expanded to a full index before examining the paths within. This is
currently implemented using a 'goto' and a boolean variable to ensure we
restart only once.
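
The restart pattern can be sketched in isolation as follows. This is a
simplified model rather than the code in sparse-index.c: struct
toy_entry, toy_expand() and toy_clear_with_restart() are hypothetical
stand-ins, and toy_expand() merely pretends that each sparse directory
expands to a single file with the same on-disk state.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a cache entry; not Git's struct cache_entry. */
struct toy_entry {
	int skip_worktree; /* models the SKIP_WORKTREE bit */
	int is_sparse_dir; /* models S_ISSPARSEDIR(ce->ce_mode) */
	int on_disk;       /* models the result of path_found() */
};

/* Models ensure_full_index(): afterwards no sparse directories remain. */
static void toy_expand(struct toy_entry *entries, size_t nr)
{
	for (size_t i = 0; i < nr; i++)
		entries[i].is_sparse_dir = 0;
}

/* path_count[0] counts the first run, path_count[1] the restarted run. */
static void toy_clear_with_restart(struct toy_entry *entries, size_t nr,
				   int path_count[2])
{
	int restarted = 0;

restart:
	for (size_t i = 0; i < nr; i++) {
		struct toy_entry *e = &entries[i];

		if (!e->skip_worktree)
			continue;
		path_count[restarted]++;
		if (!e->on_disk)
			continue;
		if (e->is_sparse_dir) {
			/* A second expansion would indicate a bug. */
			assert(!restarted);
			toy_expand(entries, nr);
			restarted = 1;
			goto restart;
		}
		e->skip_worktree = 0;
	}
}
```

Note that both runs of the loop share one function body, so anything
wrapped around that body (such as a trace region) covers both runs.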
Even with that caching, it was noticed that this could take a long time
to execute. 89aaab11a3 (index: add trace2 region for clear skip
worktree, 2022-11-03) introduced trace2 regions to measure this time.
Further, the way the loop repeats itself was slightly confusing and
prone to breakage, so a BUG() statement was added in 8c7abdc596 (index:
raise a bug if the index is materialised more than once, 2022-11-03) to
be sure that the second run of the loop does not hit any sparse trees.
One thing that can be confusing about the current setup is that the
trace2 regions nest and it is not clear that a second loop is running
after a sparse index is expanded. Here is an example of what the regions
look like in a typical case:
| region_enter | ... | label:clear_skip_worktree_from_present_files
| region_enter | ... | ..label:update
| region_leave | ... | ..label:update
| region_enter | ... | ..label:ensure_full_index
| region_enter | ... | ....label:update
| region_leave | ... | ....label:update
| region_leave | ... | ..label:ensure_full_index
| data         | ... | ..sparse_path_count:1
| data         | ... | ..sparse_path_count_full:269538
| region_leave | ... | label:clear_skip_worktree_from_present_files
What is particularly difficult to understand about these regions is
that most of the time is spent between the close of the
ensure_full_index region and the reporting of the end data. This is
because the restarted loop runs within the same region as the first
iteration of the loop.
This change refactors the method into two separate methods that are
traced separately. This will be more important later when we change
other features of the methods, but for now the only functional change is
the difference in the structure of the trace regions.
After this change, the same telemetry section is split into three
distinct chunks:
| region_enter | ... | label:clear_skip_worktree_from_present_files_sparse
| data         | ... | ..sparse_path_count:1
| region_leave | ... | label:clear_skip_worktree_from_present_files_sparse
| region_enter | ... | label:update
| region_leave | ... | label:update
| region_enter | ... | label:ensure_full_index
| region_enter | ... | ..label:update
| region_leave | ... | ..label:update
| region_leave | ... | label:ensure_full_index
| region_enter | ... | label:clear_skip_worktree_from_present_files_full
| data         | ... | ..full_path_count:269538
| region_leave | ... | label:clear_skip_worktree_from_present_files_full
Here, we see the sparse loop terminating early with its first sparse
path being a sparse directory containing a file. Then, that loop's
region terminates before ensure_full_index begins (in this case, the
cache-tree must also be computed). Then, _after_ the index is expanded,
the full loop begins with its own region.
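
The resulting control flow can be modeled with a small standalone
sketch. It mirrors the shape of the refactored functions but is not the
real code: the toy_* names and struct fields are hypothetical,
toy_expand() stands in for ensure_full_index() by pretending each
sparse directory becomes one file, and the two counters play the role
of the sparse_path_count and full_path_count trace data.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for Git's index structures; not the real API. */
struct toy_entry {
	int skip_worktree; /* models the SKIP_WORKTREE bit */
	int is_sparse_dir; /* models S_ISSPARSEDIR(ce->ce_mode) */
	int on_disk;       /* models the result of path_found() */
};

struct toy_index {
	struct toy_entry *entries;
	size_t nr;
	int expanded;      /* set once toy_expand() has run */
};

/* First pass: stop at the first sparse directory that exists on disk. */
static int toy_clear_sparse(struct toy_index *idx, int *path_count)
{
	for (size_t i = 0; i < idx->nr; i++) {
		struct toy_entry *e = &idx->entries[i];

		if (!e->skip_worktree)
			continue;
		(*path_count)++;
		if (!e->on_disk)
			continue;
		if (e->is_sparse_dir)
			return 1; /* caller must expand, then run the full pass */
		e->skip_worktree = 0;
	}
	return 0;
}

/* Models ensure_full_index(): afterwards no sparse directories remain. */
static void toy_expand(struct toy_index *idx)
{
	for (size_t i = 0; i < idx->nr; i++)
		idx->entries[i].is_sparse_dir = 0;
	idx->expanded = 1;
}

/* Second pass: runs only on an expanded index. */
static void toy_clear_full(struct toy_index *idx, int *path_count)
{
	for (size_t i = 0; i < idx->nr; i++) {
		struct toy_entry *e = &idx->entries[i];

		/* Plays the role of the BUG() check in the real code. */
		assert(!e->is_sparse_dir);
		if (e->skip_worktree) {
			(*path_count)++;
			if (e->on_disk)
				e->skip_worktree = 0;
		}
	}
}

static void toy_clear(struct toy_index *idx, int *sparse_count,
		      int *full_count)
{
	if (toy_clear_sparse(idx, sparse_count)) {
		toy_expand(idx);
		toy_clear_full(idx, full_count);
	}
}
```

Because each pass is its own function, each can open and close its own
trace region, and the expansion happens between the two regions rather
than inside one of them.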
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
diff --git a/sparse-index.c b/sparse-index.c
index e48e40c..e0457c8 100644
--- a/sparse-index.c
+++ b/sparse-index.c
@@ -486,49 +486,78 @@
return 0;
}
-void clear_skip_worktree_from_present_files(struct index_state *istate)
+static int clear_skip_worktree_from_present_files_sparse(struct index_state *istate)
{
const char *last_dirname = NULL;
size_t dir_len = 0;
int dir_found = 1;
- int i;
- int path_count[2] = {0, 0};
- int restarted = 0;
+ int path_count = 0;
+ int to_restart = 0;
- if (!core_apply_sparse_checkout ||
- sparse_expect_files_outside_of_patterns)
- return;
-
- trace2_region_enter("index", "clear_skip_worktree_from_present_files",
+ trace2_region_enter("index", "clear_skip_worktree_from_present_files_sparse",
istate->repo);
-restart:
- for (i = 0; i < istate->cache_nr; i++) {
+ for (int i = 0; i < istate->cache_nr; i++) {
struct cache_entry *ce = istate->cache[i];
if (ce_skip_worktree(ce)) {
- path_count[restarted]++;
+ path_count++;
if (path_found(ce->name, &last_dirname, &dir_len, &dir_found)) {
if (S_ISSPARSEDIR(ce->ce_mode)) {
- if (restarted)
- BUG("ensure-full-index did not fully flatten?");
- ensure_full_index(istate);
- restarted = 1;
- goto restart;
+ to_restart = 1;
+ break;
}
ce->ce_flags &= ~CE_SKIP_WORKTREE;
}
}
}
- if (path_count[0])
- trace2_data_intmax("index", istate->repo,
- "sparse_path_count", path_count[0]);
- if (restarted)
- trace2_data_intmax("index", istate->repo,
- "sparse_path_count_full", path_count[1]);
- trace2_region_leave("index", "clear_skip_worktree_from_present_files",
+ trace2_data_intmax("index", istate->repo,
+ "sparse_path_count", path_count);
+ trace2_region_leave("index", "clear_skip_worktree_from_present_files_sparse",
istate->repo);
+ return to_restart;
+}
+
+static void clear_skip_worktree_from_present_files_full(struct index_state *istate)
+{
+ const char *last_dirname = NULL;
+ size_t dir_len = 0;
+ int dir_found = 1;
+
+ int path_count = 0;
+
+ trace2_region_enter("index", "clear_skip_worktree_from_present_files_full",
+ istate->repo);
+ for (int i = 0; i < istate->cache_nr; i++) {
+ struct cache_entry *ce = istate->cache[i];
+
+ if (S_ISSPARSEDIR(ce->ce_mode))
+ BUG("ensure-full-index did not fully flatten?");
+
+ if (ce_skip_worktree(ce)) {
+ path_count++;
+ if (path_found(ce->name, &last_dirname, &dir_len, &dir_found))
+ ce->ce_flags &= ~CE_SKIP_WORKTREE;
+ }
+ }
+
+ trace2_data_intmax("index", istate->repo,
+ "full_path_count", path_count);
+ trace2_region_leave("index", "clear_skip_worktree_from_present_files_full",
+ istate->repo);
+}
+
+void clear_skip_worktree_from_present_files(struct index_state *istate)
+{
+ if (!core_apply_sparse_checkout ||
+ sparse_expect_files_outside_of_patterns)
+ return;
+
+ if (clear_skip_worktree_from_present_files_sparse(istate)) {
+ ensure_full_index(istate);
+ clear_skip_worktree_from_present_files_full(istate);
+ }
}
/*