| 1 | Table of contents: |
| 2 | |
| 3 | * Terminology |
| 4 | * Purpose of sparse-checkouts |
| 5 | * Usecases of primary concern |
| 6 | * Oversimplified mental models ("Cliff Notes" for this document!) |
| 7 | * Desired behavior |
| 8 | * Behavior classes |
| 9 | * Subcommand-dependent defaults |
| 10 | * Sparse specification vs. sparsity patterns |
| 11 | * Implementation Questions |
| 12 | * Implementation Goals/Plans |
| 13 | * Known bugs |
| 14 | * Reference Emails |
| 15 | |
| 16 | |
| 17 | === Terminology === |
| 18 | |
| 19 | cone mode: one of two modes for specifying the desired subset of files |
| 20 | in a sparse-checkout. In cone-mode, the user specifies |
| 21 | directories (getting both everything under that directory as |
| 22 | well as everything in leading directories), while in non-cone |
| 23 | mode, the user specifies gitignore-style patterns. Controlled |
| 24 | by the --[no-]cone option to sparse-checkout init|set. |
| 25 | |
| 26 | SKIP_WORKTREE: When tracked files do not match the sparse specification and |
| 27 | are removed from the working tree, the file in the index is marked |
| 28 | with a SKIP_WORKTREE bit. Note that if a tracked file has the |
| 29 | SKIP_WORKTREE bit set but the file is later written by the user to |
| 30 | the working tree anyway, the SKIP_WORKTREE bit will be cleared at |
| 31 | the beginning of any subsequent Git operation. |
| 32 | |
| 33 | Most sparse checkout users are unaware of this implementation |
| 34 | detail, and the term should generally be avoided in user-facing |
| 35 | descriptions and command flags. Unfortunately, prior to the |
| 36 | `sparse-checkout` subcommand this low-level detail was exposed, |
| 37 | and as of time of writing, is still exposed in various places. |
| 38 | |
| 39 | sparse-checkout: a subcommand in git used to reduce the files present in |
| 40 | the working tree to a subset of all tracked files. Also, the |
| 41 | name of the file in the $GIT_DIR/info directory used to track |
| 42 | the sparsity patterns corresponding to the user's desired |
| 43 | subset. |
| 44 | |
| 45 | sparse cone: see cone mode |
| 46 | |
| 47 | sparse directory: An entry in the index corresponding to a directory, which |
| 48 | appears in the index instead of all the files under that directory |
| 49 | that would normally appear. See also sparse index. Something that |
| 50 | can cause confusion is that the "sparse directory" does NOT match |
| 51 | the sparse specification, i.e. the directory is NOT present in the |
| 52 | working tree. May be renamed in the future (e.g. to "skipped |
| 53 | directory"). |
| 54 | |
| 55 | sparse index: A special mode for sparse-checkout that also makes the |
| 56 | index sparse by recording a directory entry in lieu of all the |
| 57 | files underneath that directory (thus making that a "skipped |
| 58 | directory" which unfortunately has also been called a "sparse |
| 59 | directory"), and does this for potentially multiple |
| 60 | directories. Controlled by the --[no-]sparse-index option to |
| 61 | init|set|reapply. |
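The "cone mode" and "sparse index" entries above can be illustrated with a
small Python model. This is only a sketch, not how Git implements either
feature: the names `in_cone` and `sparse_index_entries` are hypothetical,
paths are plain strings, and the collapsing rule (each outermost directory
lying wholly outside the cone becomes a single directory entry) is an
assumption drawn from the descriptions above. The model also assumes that
top-level files are always part of the cone.

```python
from pathlib import PurePosixPath

ROOT = PurePosixPath(".")


def in_cone(path, cone_dirs):
    """Return True if the file `path` is selected by the cone directories."""
    parent = PurePosixPath(path).parent
    if parent == ROOT:
        return True  # assumption: top-level files are always in the cone
    for d in map(PurePosixPath, cone_dirs):
        # Everything under a specified directory, at any depth.
        if parent == d or d in parent.parents:
            return True
        # Files sitting directly in a leading directory of `d`.
        if parent in d.parents:
            return True
    return False


def sparse_index_entries(tracked, cone_dirs):
    """Model a sparse index: files in the cone stay as file entries, and
    each outermost directory wholly outside the cone becomes one entry."""
    leading = {ROOT}
    for d in map(PurePosixPath, cone_dirs):
        leading.add(d)
        leading.update(d.parents)
    entries = set()
    for path in tracked:
        if in_cone(path, cone_dirs):
            entries.add(path)
            continue
        # Walk down from the top; the first directory that is neither a
        # cone directory nor one of its leading directories is collapsed.
        for ancestor in reversed(PurePosixPath(path).parents):
            if ancestor not in leading:
                entries.add(str(ancestor) + "/")
                break
    return sorted(entries)


tracked = ["README", "a/x.txt", "a/b/f.c", "a/c/g.c", "a/c/d/h.c", "z/1", "z/2"]
print(sparse_index_entries(tracked, ["a/b"]))
# ['README', 'a/b/f.c', 'a/c/', 'a/x.txt', 'z/']
```

In this toy example, `a/c/` and `z/` play the role of "sparse directory"
(or "skipped directory") entries: they stand in for files that are absent
from the working tree.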
| 62 | |
| 63 | sparsity patterns: patterns from $GIT_DIR/info/sparse-checkout used to |
| 64 | define the set of files of interest. A warning: It is easy to |
| 65 | over-use this term (or the shortened "patterns" term), for two |
| 66 | reasons: (1) users in cone mode specify directories rather than |
| 67 | patterns (their directories are transformed into patterns, but |
| 68 | users may think you are talking about non-cone mode if you use the |
| 69 | word "patterns"), and (2) the sparse specification might |
| 70 | transiently differ in the working tree or index from the sparsity |
| 71 | patterns (see "Sparse specification vs. sparsity patterns"). |
| 72 | |
| 73 | sparse specification: The set of paths in the user's area of focus. This |
| 74 | is typically just the tracked files that match the sparsity |
| 75 | patterns, but the sparse specification can temporarily differ and |
| 76 | include additional files. (See also "Sparse specification |
| 77 | vs. sparsity patterns") |
| 78 | |
| 79 | * When working with history, the sparse specification is exactly |
| 80 | the set of files matching the sparsity patterns. |
| 81 | * When interacting with the working tree, the sparse specification |
| 82 | is the set of tracked files with a clear SKIP_WORKTREE bit or |
| 83 | tracked files present in the working copy. |
| 84 | * When modifying or showing results from the index, the sparse |
| 85 | specification is the set of files with a clear SKIP_WORKTREE bit |
| 86 | or that differ in the index from HEAD. |
| 87 | * If working with the index and the working copy, the sparse |
| 88 | specification is the union of the paths from above. |
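The four context-dependent definitions above amount to simple set rules.
The following Python sketch models them; the `Tracked` record and the
`sparse_specification` function are hypothetical names invented for this
illustration, and the per-file booleans stand in for state that Git would
read from the index, HEAD, and the working tree.

```python
from dataclasses import dataclass


@dataclass
class Tracked:
    matches_patterns: bool       # path matches the sparsity patterns
    skip_worktree: bool          # SKIP_WORKTREE bit is set in the index
    in_worktree: bool = False    # file is present in the working tree
    index_differs: bool = False  # index entry differs from HEAD


def sparse_specification(files, context):
    """Return the paths in the sparse specification for `context`, one of
    "history", "worktree", "index", or "index+worktree"."""
    def worktree(f):
        return not f.skip_worktree or f.in_worktree

    def index(f):
        return not f.skip_worktree or f.index_differs

    rules = {
        "history": lambda f: f.matches_patterns,
        "worktree": worktree,
        "index": index,
        "index+worktree": lambda f: worktree(f) or index(f),
    }
    keep = rules[context]
    return {path for path, f in files.items() if keep(f)}


files = {
    "docs/a.md": Tracked(True, False, in_worktree=True),
    "src/b.c": Tracked(False, True, index_differs=True),  # staged change
    "src/c.c": Tracked(False, True, in_worktree=True),    # written by user
    "src/d.c": Tracked(False, True),
}
for context in ("history", "worktree", "index", "index+worktree"):
    print(context, sorted(sparse_specification(files, context)))
```

Only `docs/a.md` matches the sparsity patterns here, yet `src/b.c` and
`src/c.c` temporarily join the sparse specification in the index and
working tree contexts respectively.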
| 89 | |
| 90 | vivifying: When a command restores a tracked file to the working tree (and |
| 91 | hopefully also clears the SKIP_WORKTREE bit in the index for that |
| 92 | file), this is referred to as "vivifying" the file. |
| 93 | |
| 94 | |
| 95 | === Purpose of sparse-checkouts === |
| 96 | |
| 97 | sparse-checkouts exist to allow users to work with a subset of their |
| 98 | files. |
| 99 | |
| 100 | You can think of sparse-checkouts as subdividing "tracked" files into two |
| 101 | categories -- a sparse subset, and all the rest. Implementationally, we |
| 102 | mark "all the rest" in the index with a SKIP_WORKTREE bit and leave them |
| 103 | out of the working tree. The SKIP_WORKTREE files are still tracked, just |
| 104 | not present in the working tree. |
| 105 | |
| 106 | In the past, sparse-checkouts were defined by "SKIP_WORKTREE means the file |
| 107 | is missing from the working tree but pretend the file contents match HEAD". |
| 108 | That was not only bogus (it actually meant the file missing from the |
| 109 | working tree matched the index rather than HEAD), but it was also a |
| 110 | low-level detail which only provided decent behavior for a few commands. |
| 111 | There were a surprising number of ways in which that guiding principle gave |
| 112 | command results that violated user expectations, and as such was a bad |
| 113 | mental model. However, it persisted for many years and may still be found |
| 114 | in some corners of the code base. |
| 115 | |
| 116 | Anyway, the idea of "working with a subset of files" is simple enough, but |
| 117 | there are multiple different high-level usecases which affect how some Git |
| 118 | subcommands should behave. Further, even if we only considered one of |
| 119 | those usecases, sparse-checkouts can modify different subcommands in over a |
| 120 | half dozen different ways. Let's start by considering the high level |
| 121 | usecases: |
| 122 | |
| 123 | A) Users are _only_ interested in the sparse portion of the repo |
| 124 | |
| 125 | A*) Users are _only_ interested in the sparse portion of the repo |
| 126 | that they have downloaded so far |
| 127 | |
| 128 | B) Users want a sparse working tree, but are working in a larger whole |
| 129 | |
| 130 | C) sparse-checkout is a behind-the-scenes implementation detail allowing |
| 131 | Git to work with a specially crafted in-house virtual file system; |
| 132 | users are actually working with a "full" working tree that is |
| 133 | lazily populated, and sparse-checkout helps with the lazy population |
| 134 | piece. |
| 135 | |
| 136 | It may be worth explaining each of these in a bit more detail: |
| 137 | |
| 138 | |
| 139 | (Behavior A) Users are _only_ interested in the sparse portion of the repo |
| 140 | |
| 141 | These folks might know there are other things in the repository, but |
| 142 | don't care. They are uninterested in other parts of the repository, and |
| 143 | only want to know about changes within their area of interest. Showing |
| 144 | them other files from history (e.g. from diff/log/grep/etc.) is a |
| 145 | usability annoyance, potentially a huge one since other changes in |
| 146 | history may dwarf the changes they are interested in. |
| 147 | |
| 148 | Some of these users also arrive at this usecase from wanting to use partial |
| 149 | clones together with sparse checkouts (in a way where they have downloaded |
| 150 | blobs within the sparse specification) and do disconnected development. |
| 151 | Not only do these users generally not care about other parts of the |
| 152 | repository, but they consider it a blocker for Git commands to try to operate |
| 153 | on those. If commands attempt to access paths in history outside the sparse |
| 154 | specification, then the partial clone will attempt to download additional |
| 155 | blobs on demand, fail, and then fail the user's command. (This may be |
| 156 | unavoidable in some cases, e.g. when `git merge` has non-trivial changes to |
| 157 | reconcile outside the sparse specification, but we should limit how often |
| 158 | users are forced to connect to the network.) |
| 159 | |
| 160 | Also, even for users using partial clones that do not mind being |
| 161 | always connected to the network, the need to download blobs as |
| 162 | side-effects of various other commands (such as the printed diffstat |
| 163 | after a merge or pull) can lead to worries about local repository size |
| 164 | growing unnecessarily[10]. |
| 165 | |
| 166 | (Behavior A*) Users are _only_ interested in the sparse portion of the repo |
| 167 | that they have downloaded so far (a variant on the first usecase) |
| 168 | |
| 169 | This variant is driven by folks who use partial clones together with |
| 170 | sparse checkouts and do disconnected development (so far sounding like a |
| 171 | subset of behavior A users) and do so on very large repositories. The |
| 172 | reason for yet another variant is that downloading even just the blobs |
| 173 | through history within their sparse specification may be too much, so they |
| 174 | only download some. They would still like operations to succeed without |
| 175 | network connectivity, though, so things like `git log -S${SEARCH_TERM} -p` |
| 176 | or `git grep ${SEARCH_TERM} OLDREV` would need to be prepared to provide |
| 177 | partial results that depend on what happens to have been downloaded. |
| 178 | |
| 179 | This variant could be viewed as Behavior A with the sparse specification |
| 180 | for history querying operations modified from "sparsity patterns" to |
| 181 | "sparsity patterns limited to the blobs we have already downloaded". |
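As a rough illustration of "partial results that depend on what happens to
have been downloaded", here is a hypothetical Python model of a history
search under Behavior A*. Nothing like this exists in Git today; the
function `grep_history`, the `(rev, path)` keys, and the dictionary
standing in for the local object store are all assumptions made for the
sketch.

```python
def grep_history(term, candidates, local_blobs):
    """Search already-downloaded blobs only.

    candidates:  (rev, path) pairs that match the sparsity patterns
    local_blobs: dict mapping (rev, path) to the text of downloaded blobs
    Returns (hits, skipped), where skipped lists blobs not available locally.
    """
    hits, skipped = [], []
    for key in candidates:
        text = local_blobs.get(key)
        if text is None:
            skipped.append(key)  # not downloaded: skip rather than fetch
        elif term in text:
            hits.append(key)
    return hits, skipped


candidates = [("v1", "src/a.c"), ("v2", "src/a.c"), ("v3", "src/a.c")]
local_blobs = {("v1", "src/a.c"): "int foo;", ("v3", "src/a.c"): "int bar;"}
hits, skipped = grep_history("foo", candidates, local_blobs)
print(hits, skipped)
```

Reporting the skipped blobs separately is one possible design choice; it
would let a command tell the user that its results are incomplete.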
| 182 | |
| 183 | (Behavior B) Users want a sparse working tree, but are working in a |
| 184 | larger whole |
| 185 | |
| 186 | Stolee described this usecase this way[11]: |
| 187 | |
| 188 | "I'm also focused on users that know that they are a part of a larger |
| 189 | whole. They know they are operating on a large repository but focus on |
| 190 | what they need to contribute their part. I expect multiple "roles" to |
| 191 | use very different, almost disjoint parts of the codebase. Some other |
| 192 | "architect" users operate across the entire tree or hop between different |
| 193 | sections of the codebase as necessary. In this situation, I'm wary of |
| 194 | scoping too many features to the sparse-checkout definition, especially |
| 195 | "git log," as it can be too confusing to have their view of the codebase |
| 196 | depend on your "point of view."" |
| 197 | |
| 198 | People might also end up wanting behavior B due to complex inter-project |
| 199 | dependencies. The initial attempts to use sparse-checkouts usually involve |
| 200 | the directories you are directly interested in plus what those directories |
| 201 | depend upon within your repository. But there's a monkey wrench here: if |
| 202 | you have integration tests, they invert the hierarchy: to run integration |
| 203 | tests, you need not only what you are interested in and its in-tree |
| 204 | dependencies, you also need everything that depends upon what you are |
| 205 | interested in or that depends upon one of your dependencies...AND you need |
| 206 | all the in-tree dependencies of that expanded group. That can easily |
| 207 | change your sparse-checkout into a nearly dense one. |
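The way integration tests inflate a sparse-checkout can be sketched as
three graph traversals. The Python below is illustrative only: the
directory names and the `deps` mapping are made up, and treating "depends
upon" as transitive is an assumption on my part.

```python
def closure(start, edges):
    """Return every node reachable from `start` by following `edges`."""
    seen = set(start)
    stack = list(start)
    while stack:
        node = stack.pop()
        for nxt in edges.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def integration_test_checkout(interest, deps):
    """deps maps each directory to the directories it depends upon."""
    # Invert the graph so we can ask "who depends on X?".
    rdeps = {}
    for src, targets in deps.items():
        for target in targets:
            rdeps.setdefault(target, set()).add(src)
    build_set = closure(interest, deps)      # interest + its dependencies
    dependents = closure(build_set, rdeps)   # + everything depending on those
    return closure(dependents, deps)         # + dependencies of that group


deps = {
    "app": {"lib"},
    "lib": {"core"},
    "tool": {"core", "ui"},
    "ui": set(),
    "docs": set(),
}
print(sorted(integration_test_checkout({"lib"}, deps)))
# ['app', 'core', 'lib', 'tool', 'ui']
```

Starting from a single directory of interest (`lib`), five of the six
directories in this toy repository end up in the checkout.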
| 208 | |
| 209 | Naturally, that tends to kill the benefits of sparse-checkouts. There are |
| 210 | a couple solutions to this conundrum: either avoid grabbing in-repo |
| 211 | dependencies (maybe have built versions of your in-repo dependencies pulled |
| 212 | from a CI cache somewhere), or say that users shouldn't run integration |
| 213 | tests directly and instead do it on the CI server when they submit a code |
| 214 | review. Or do both. Regardless of whether you stub out your in-repo |
| 215 | dependencies or stub out the things that depend upon you, there is |
| 216 | certainly a reason to want to query and be aware of those other stubbed-out |
| 217 | parts of the repository, particularly when the dependencies are complex or |
| 218 | change relatively frequently. Thus, for such uses, sparse-checkouts can be |
| 219 | used to limit what you directly build and modify, but these users do not |
| 220 | necessarily want their sparse checkout paths to limit their queries of |
| 221 | versions in history. |
| 222 | |
| 223 | Some people may also be interested in behavior B over behavior A simply as |
| 224 | a performance workaround: if they are using non-cone mode, then they have |
| 225 | to deal with its inherent quadratic performance problems. In that mode, |
| 226 | every operation that checks whether paths match the sparse specification |
| 227 | can be expensive. As such, these users may only be willing to pay for |
| 228 | those expensive checks when interacting with the working copy, and may |
| 229 | prefer getting "unrelated" results from their history queries over having |
| 230 | slow commands. |
| 231 | |
| 232 | (Behavior C) sparse-checkout is an implementational detail supporting a |
| 233 | special VFS. |
| 234 | |
| 235 | This usecase goes slightly against the traditional definition of |
| 236 | sparse-checkout in that it actually tries to present a full or dense |
| 237 | checkout to the user. However, this usecase uses the same technical |
| 238 | underpinnings in a new way which does provide some performance |
| 239 | advantages to users. The basic idea is that a company can have an in-house |
| 240 | Git-aware Virtual File System which pretends all files are present in the |
| 241 | working tree, by intercepting all file system accesses and using those to |
| 242 | fetch and write accessed files on demand via partial clones. The VFS uses |
| 243 | sparse-checkout to prevent Git from writing or paying attention to many |
| 244 | files, and manually updates the sparse checkout patterns itself based on |
| 245 | user access and modification of files in the working tree. See commit |
| 246 | ecc7c8841d ("repo_read_index: add config to expect files outside sparse |
| 247 | patterns", 2022-02-25) and the link at [17] for a more detailed description |
| 248 | of such a VFS. |
| 249 | |
| 250 | The biggest difference here is that users are completely unaware that the |
| 251 | sparse-checkout machinery is even in use. The sparse patterns are not |
| 252 | specified by the user but rather are under the complete control of the VFS |
| 253 | (and the patterns are updated frequently and dynamically by it). The user |
| 254 | will perceive the checkout as dense, and commands should thus behave as if |
| 255 | all files are present. |
| 256 | |
| 257 | |
| 258 | === Usecases of primary concern === |
| 259 | |
| 260 | Most of the rest of this document will focus on Behavior A and Behavior |
| 261 | B. Some notes about the other two cases and why we are not focusing on |
| 262 | them: |
| 263 | |
| 264 | (Behavior A*) |
| 265 | |
| 266 | Supporting this usecase is estimated to be difficult and a lot of work. |
| 267 | There are no plans to implement it currently, but it may be a potential |
| 268 | future alternative. Knowing about the existence of additional alternatives |
| 269 | may affect our choice of command line flags (e.g. if we need tri-state or |
| 270 | quad-state flags rather than just binary flags), so it was still important |
| 271 | to at least note. |
| 272 | |
| 273 | Further, I believe the descriptions below for Behavior A are probably still |
| 274 | valid for this usecase, with the only exception being that it redefines the |
| 275 | sparse specification to restrict it to already-downloaded blobs. The hard |
| 276 | part is in making commands capable of respecting that modified definition. |
| 277 | |
| 278 | (Behavior C) |
| 279 | |
| 280 | This usecase violates some of the early sparse-checkout documented |
| 281 | assumptions (since files marked as SKIP_WORKTREE will be displayed to users |
| 282 | as present in the working tree). That violation may mean various |
| 283 | sparse-checkout related behaviors are not well suited to this usecase and |
| 284 | we may need tweaks -- to both documentation and code -- to handle it. |
| 285 | However, this usecase is also perhaps the simplest model to support in that |
| 286 | everything behaves like a dense checkout with a few exceptions (e.g. branch |
| 287 | checkouts and switches write fewer things, knowing the VFS will lazily |
| 288 | write the rest on an as-needed basis). |
| 289 | |
| 290 | Since there is no publicly available VFS-related code for folks to try, |
| 291 | the number of folks who can test such a usecase is limited. |
| 292 | |
| 293 | The primary reason to note the Behavior C usecase is that as we fix things |
| 294 | to better support Behaviors A and B, there may be additional places where |
| 295 | we need to make tweaks allowing folks in this usecase to get the original |
| 296 | non-sparse treatment. For an example, see ecc7c8841d ("repo_read_index: |
| 297 | add config to expect files outside sparse patterns", 2022-02-25). The |
| 298 | secondary reason to note Behavior C is so that folks taking advantage of |
| 299 | Behavior C do not assume they are part of the Behavior B camp and propose |
| 300 | patches that break things for the real Behavior B folks. |
| 301 | |
| 302 | |
| 303 | === Oversimplified mental models === |
| 304 | |
| 305 | An oversimplification of the differences in the above behaviors is: |
| 306 | |
| 307 | Behavior A: Restrict worktree and history operations to sparse specification |
| 308 | Behavior B: Restrict worktree operations to sparse specification; have any |
| 309 | history operations work across all files |
| 310 | Behavior C: Do not restrict either worktree or history operations to the |
| 311 | sparse specification...with the exception of branch checkouts or |
| 312 | switches which avoid writing files that will match the index so |
| 313 | they can later lazily be populated instead. |
| 314 | |
| 315 | |
| 316 | === Desired behavior === |
| 317 | |
| 318 | As noted previously, despite the simple idea of just working with a subset |
| 319 | of files, there are a range of different behavioral changes that need to be |
| 320 | made to different subcommands to work well with such a feature. See |
| 321 | [1,2,3,4,5,6,7,8,9,10] for various examples. In particular, at [2], we saw |
| 322 | that mere composition of other commands that individually worked correctly |
| 323 | in a sparse-checkout context did not imply that the higher level command |
| 324 | would work correctly; it sometimes requires further tweaks. So, |
| 325 | understanding these differences can be beneficial. |
| 326 | |
| 327 | * Commands behaving the same regardless of high-level use-case |
| 328 | |
| 329 | * commands that only look at files within the sparse specification |
| 330 | |
| 331 | * diff (without --cached or REVISION arguments) |
| 332 | * grep (without --cached or REVISION arguments) |
| 333 | * diff-files |
| 334 | |
| 335 | * commands that restore files to the working tree that match sparsity |
| 336 | patterns, and remove unmodified files that don't match those |
| 337 | patterns: |
| 338 | |
| 339 | * switch |
| 340 | * checkout (the switch-like half) |
| 341 | * read-tree |
| 342 | * reset --hard |
| 343 | |
| 344 | * commands that write conflicted files to the working tree, but otherwise |
| 345 | will omit writing files to the working tree that do not match the |
| 346 | sparsity patterns: |
| 347 | |
| 348 | * merge |
| 349 | * rebase |
| 350 | * cherry-pick |
| 351 | * revert |
| 352 | |
| 353 | * `am` and `apply --cached` should probably be in this section but |
| 354 | are buggy (see the "Known bugs" section below) |
| 355 | |
| 356 | The behavior for these commands somewhat depends upon the merge |
| 357 | strategy being used: |
| 358 | * `ort` behaves as described above |
| 359 | * `octopus` and `resolve` will always vivify any file changed in the merge |
| 360 | relative to the first parent, which is rather suboptimal. |
| 361 | |
| 362 | It is also important to note that these commands WILL update the index |
| 363 | outside the sparse specification relative to when the operation began, |
| 364 | BUT these commands often make a commit just before or after such that |
| 365 | by the end of the operation there is no change to the index outside the |
| 366 | sparse specification. Of course, if the operation hits conflicts or |
| 367 | does not make a commit, then these operations clearly can modify the |
| 368 | index outside the sparse specification. |
| 369 | |
| 370 | Finally, it is important to note that at least the first four of these |
| 371 | commands also try to remove differences between the sparse |
| 372 | specification and the sparsity patterns (much like the commands in the |
| 373 | previous section). |
| 374 | |
| 375 | * commands that always ignore sparsity since commits must be full-tree |
| 376 | |
| 377 | * archive |
| 378 | * bundle |
| 379 | * commit |
| 380 | * format-patch |
| 381 | * fast-export |
| 382 | * fast-import |
| 383 | * commit-tree |
| 384 | |
| 385 | * commands that write any modified file to the working tree (conflicted |
| 386 | or not, and whether those paths match sparsity patterns or not): |
| 387 | |
| 388 | * stash |
| 389 | * apply (without `--index` or `--cached`) |
| 390 | |
| 391 | * Commands that may slightly differ for behavior A vs. behavior B: |
| 392 | |
| 393 | Commands in this category behave mostly the same between the two |
| 394 | behaviors, but may differ in verbosity and types of warning and error |
| 395 | messages. |
| 396 | |
| 397 | * commands that make modifications to which files are tracked: |
| 398 | * add |
| 399 | * rm |
| 400 | * mv |
| 401 | * update-index |
| 402 | |
| 403 | The fact that files can move between the 'tracked' and 'untracked' |
| 404 | categories means some commands will have to treat untracked files |
| 405 | differently. But if we have to treat untracked files differently, |
| 406 | then additional commands may also need changes: |
| 407 | |
| 408 | * status |
| 409 | * clean |
| 410 | |
| 411 | In particular, `status` may need to report any untracked files outside |
| 412 | the sparse specification as an erroneous condition (especially to |
| 413 | avoid the user trying to `git add` them, forcing `git add` to display |
| 414 | an error). |
| 415 | |
| 416 | It's not clear to me exactly how (or even if) `clean` would change, |
| 417 | but it's the other command that also affects untracked files. |
| 418 | |
| 419 | `update-index` may be slightly special. Its --[no-]skip-worktree flag |
| 420 | may need to ignore the sparse specification by its nature. Also, its |
| 421 | current --[no-]ignore-skip-worktree-entries default is totally bogus. |
| 422 | |
| 423 | * commands for manually tweaking paths in both the index and the working tree |
| 424 | * `restore` |
| 425 | * the restore-like half of `checkout` |
| 426 | |
| 427 | These commands should be similar to add/rm/mv in that they should |
| 428 | only operate on the sparse specification by default, and require a |
| 429 | special flag to operate on all files. |
| 430 | |
| 431 | Also, note that these commands currently have a number of issues (see |
| 432 | the "Known bugs" section below) |
| 433 | |
| 434 | * Commands that significantly differ for behavior A vs. behavior B: |
| 435 | |
| 436 | * commands that query history |
| 437 | * diff (with --cached or REVISION arguments) |
| 438 | * grep (with --cached or REVISION arguments) |
| 439 | * show (when given commit arguments) |
| 440 | * blame (only matters when one or more -C flags are passed) |
| 441 | * and annotate |
| 442 | * log |
| 443 | * whatchanged |
| 444 | * ls-files |
| 445 | * diff-index |
| 446 | * diff-tree |
| 447 | * ls-tree |
| 448 | |
| 449 | Note: for log and whatchanged, revision walking logic is unaffected |
| 450 | but displaying of patches is affected by scoping the command to the |
| 451 | sparse-checkout. (The fact that revision walking is unaffected is |
| 452 | why rev-list, shortlog, show-branch, and bisect are not in this |
| 453 | list.) |
| 454 | |
| 455 | ls-files may be slightly special in that e.g. `git ls-files -t` is |
| 456 | often used to see what is sparse and what is not. Perhaps -t should |
| 457 | always work on the full tree? |
| 458 | |
| 459 | * Commands I don't know how to classify |
| 460 | |
| 461 | * range-diff |
| 462 | |
| 463 | Is this like `log` or `format-patch`? |
| 464 | |
| 465 | * cherry |
| 466 | |
| 467 | See range-diff |
| 468 | |
| 469 | * Commands unaffected by sparse-checkouts |
| 470 | |
| 471 | * shortlog |
| 472 | * show-branch |
| 473 | * rev-list |
| 474 | * bisect |
| 475 | |
| 476 | * branch |
| 477 | * describe |
| 478 | * fetch |
| 479 | * gc |
| 480 | * init |
| 481 | * maintenance |
| 482 | * notes |
| 483 | * pull (merge & rebase have the necessary changes) |
| 484 | * push |
| 485 | * submodule |
| 486 | * tag |
| 487 | |
| 488 | * config |
| 489 | * filter-branch (works in separate checkout without sparse-checkout setup) |
| 490 | * pack-refs |
| 491 | * prune |
| 492 | * remote |
| 493 | * repack |
| 494 | * replace |
| 495 | |
| 496 | * bugreport |
| 497 | * count-objects |
| 498 | * fsck |
| 499 | * gitweb |
| 500 | * help |
| 501 | * instaweb |
| 502 | * merge-tree (doesn't touch worktree or index, and merges always compute full-tree) |
| 503 | * rerere |
| 504 | * verify-commit |
| 505 | * verify-tag |
| 506 | |
| 507 | * commit-graph |
| 508 | * hash-object |
| 509 | * index-pack |
| 510 | * mktag |
| 511 | * mktree |
| 512 | * multi-pack-index |
| 513 | * pack-objects |
| 514 | * prune-packed |
| 515 | * symbolic-ref |
| 516 | * unpack-objects |
| 517 | * update-ref |
| 518 | * write-tree (operates on index, possibly optimized to use sparse dir entries) |
| 519 | |
| 520 | * for-each-ref |
| 521 | * get-tar-commit-id |
| 522 | * ls-remote |
| 523 | * merge-base (merges are computed full tree, so merge base should be too) |
| 524 | * name-rev |
| 525 | * pack-redundant |
| 526 | * rev-parse |
| 527 | * show-index |
| 528 | * show-ref |
| 529 | * unpack-file |
| 530 | * var |
| 531 | * verify-pack |
| 532 | |
| 533 | * <Everything under 'Interacting with Others' in 'git help --all'> |
| 534 | * <Everything under 'Low-level...Syncing' in 'git help --all'> |
| 535 | * <Everything under 'Low-level...Internal Helpers' in 'git help --all'> |
| 536 | * <Everything under 'External commands' in 'git help --all'> |
| 537 | |
| 538 | * Commands that might be affected, but who cares? |
| 539 | |
| 540 | * merge-file |
| 541 | * merge-index |
| 542 | * gitk? |
| 543 | |
| 544 | |
| 545 | === Behavior classes === |
| 546 | |
| 547 | From the above there are a few classes of behavior: |
| 548 | |
| 549 | * "restrict" |
| 550 | |
| 551 | Commands in this class only read or write files in the working tree |
| 552 | within the sparse specification. |
| 553 | |
| 554 | When moving to a new commit (e.g. switch, reset --hard), these commands |
| 555 | may update index files outside the sparse specification as of the start |
| 556 | of the operation, but by the end of the operation those index files |
| 557 | will match HEAD again and thus those files will again be outside the |
| 558 | sparse specification. |
| 559 | |
| 560 | When paths are explicitly specified, these paths are intersected with |
| 561 | the sparse specification and will only operate on such paths. |
| 562 | (e.g. `git restore [--staged] -- '*.png'`, `git reset -p -- '*.md'`) |
| 563 | |
| 564 | Some of these commands may also attempt, at the end of their operation, |
| 565 | to cull transient differences between the sparse specification and the |
| 566 | sparsity patterns (see "Sparse specification vs. sparsity patterns" for |
| 567 | details, but this basically means either removing unmodified files not |
| 568 | matching the sparsity patterns and marking those files as |
| 569 | SKIP_WORKTREE, or vivifying files that match the sparsity patterns and |
| 570 | marking those files as !SKIP_WORKTREE). |
| 571 | |
| 572 | * "restrict modulo conflicts" |
| 573 | |
| 574 | Commands in this class generally behave like the "restrict" class, |
| 575 | except that: |
| 576 | (1) they will ignore the sparse specification and write files with |
| 577 | conflicts to the working tree (thus temporarily expanding the |
| 578 | sparse specification to include such files.) |
| 579 | (2) they are grouped with commands which move to a new commit, since |
| 580 | they often create a commit and then move to it, even though we |
| 581 | know there are many exceptions to moving to the new commit. (For |
| 582 | example, the user may rebase a commit that becomes empty, or have |
| 583 | a cherry-pick which conflicts, or a user could run `merge |
| 584 | --no-commit`, and we also view `apply --index` kind of like `am |
| 585 | --no-commit`.) As such, these commands can make changes to index |
| 586 | files outside the sparse specification, though they'll mark such |
| 587 | files with SKIP_WORKTREE. |
| 588 | |
| 589 | * "restrict also specially applied to untracked files" |
| 590 | |
| 591 | Commands in this class generally behave like the "restrict" class, |
| 592 | except that they have to handle untracked files differently too, often |
| 593 | because these commands are dealing with files changing state between |
| 594 | 'tracked' and 'untracked'. Often, this may mean printing an error |
| 595 | message if the command had nothing to do, but the arguments may have |
| 596 | referred to files whose tracked-ness state could have changed were it |
| 597 | not for the sparsity patterns excluding them. |
| 598 | |
| 599 | * "no restrict" |
| 600 | |
| 601 | Commands in this class ignore the sparse specification entirely. |
| 602 | |
| 603 | * "restrict or no restrict dependent upon behavior A vs. behavior B" |
| 604 | |
| 605 | Commands in this class behave like "no restrict" for folks in the |
| 606 | behavior B camp, and like "restrict" for folks in the behavior A camp. |
| 607 | However, when behaving like "restrict" a warning of some sort might be |
| 608 | provided that history queries have been limited by the sparse-checkout |
| 609 | specification. |
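The classes above can be summarized as rules for which requested paths a
command is willing to touch in the working tree. The Python sketch below
is a simplified model under stated assumptions: `match_pathspec` and
`paths_touched` are hypothetical names, `fnmatchcase` only approximates
Git pathspec matching, index updates are not modeled, and the "restrict
also specially applied to untracked files" class is left out because it
mostly differs in error reporting.

```python
from fnmatch import fnmatchcase


def match_pathspec(pattern, tracked):
    """Expand a glob such as '*.png' against tracked paths. With
    fnmatchcase, '*' also matches '/', which suffices for this sketch."""
    return {p for p in tracked if fnmatchcase(p, pattern)}


def paths_touched(behavior_class, requested, sparse_spec,
                  conflicted=frozenset(), behavior="A"):
    """Return the subset of `requested` paths the command operates on."""
    requested = set(requested)
    sparse_spec = set(sparse_spec)
    if behavior_class == "no restrict":
        return requested
    if behavior_class == "restrict":
        # Explicit paths are intersected with the sparse specification.
        return requested & sparse_spec
    if behavior_class == "restrict modulo conflicts":
        # Conflicted files are written even outside the specification.
        return requested & (sparse_spec | set(conflicted))
    if behavior_class == "restrict or no restrict":
        # Behavior A restricts history queries; behavior B does not.
        return requested & sparse_spec if behavior == "A" else requested
    raise ValueError(f"unknown behavior class: {behavior_class}")


tracked = {"docs/logo.png", "src/icon.png", "src/main.c"}
sparse_spec = {"docs/logo.png"}
pngs = match_pathspec("*.png", tracked)
print(sorted(paths_touched("restrict", pngs, sparse_spec)))
# ['docs/logo.png']
```

For instance, something like `git restore -- '*.png'` corresponds to the
"restrict" call above: `src/icon.png` matches the pathspec but lies
outside the sparse specification, so it is left alone.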
| 610 | |
| 611 | |
| 612 | === Subcommand-dependent defaults === |
| 613 | |
| 614 | Note that we have different defaults depending on the command for the |
| 615 | desired behavior: |
| 616 | |
| 617 | * Commands defaulting to "restrict": |
| 618 | * diff-files |
| 619 | * diff (without --cached or REVISION arguments) |
| 620 | * grep (without --cached or REVISION arguments) |
| 621 | * switch |
| 622 | * checkout (the switch-like half) |
| 623 | * reset (<commit>) |
| 624 | |
| 625 | * restore |
| 626 | * checkout (the restore-like half) |
| 627 | * checkout-index |
| 628 | * reset (with pathspec) |
| 629 | |
| 630 | This behavior makes sense; these interact with the working tree. |
| 631 | |
| 632 | * Commands defaulting to "restrict modulo conflicts": |
| 633 | * merge |
| 634 | * rebase |
| 635 | * cherry-pick |
| 636 | * revert |
| 637 | |
| 638 | * am |
| 639 | * apply --index (which is kind of like an `am --no-commit`) |
| 640 | |
| 641 | * read-tree (especially with -m or -u; is kind of like a --no-commit merge) |
| 642 | * reset (<tree-ish>, due to similarity to read-tree) |
| 643 | |
| 644 | These also interact with the working tree, but require slightly |
| 645 | different behavior either (a) so that conflicts can be resolved or (b) |
| 646 | because they are kind of like a merge-without-commit operation. |
| 647 | |
| 648 | (See also the "Known bugs" section below regarding `am` and `apply`.) |
| 649 | |
| 650 | * Commands defaulting to "no restrict": |
| 651 | * archive |
| 652 | * bundle |
| 653 | * commit |
| 654 | * format-patch |
| 655 | * fast-export |
| 656 | * fast-import |
| 657 | * commit-tree |
| 658 | |
| 659 | * stash |
| 660 | * apply (without `--index`) |
| 661 | |
| 662 | These have completely different defaults and perhaps deserve the most |
| 663 | detailed explanation: |
| 664 | |
| 665 | In the case of commands in the first group (format-patch, |
| 666 | fast-export, bundle, archive, etc.), these are commands for |
| 667 | communicating history, which will be broken if they restrict to a |
| 668 | subset of the repository. As such, they operate on full paths and |
| 669 | have no `--restrict` option for overriding. Some of these commands may |
| 670 | take paths for manually restricting what is exported, but it needs to |
| 671 | be very explicit. |
| 672 | |
| 673 | In the case of stash, it needs to vivify files to avoid losing the |
| 674 | user's changes. |
| 675 | |
| 676 | In the case of apply without `--index`, that command needs to update |
| 677 | the working tree without the index (or the index without the working |
| 678 | tree if `--cached` is passed), and if we restrict those updates to the |
| 679 | sparse specification then we'll lose changes from the user. |
| 680 | |
| 681 | * Commands defaulting to "restrict also specially applied to untracked files": |
| 682 | * add |
| 683 | * rm |
| 684 | * mv |
| 685 | * update-index |
| 686 | * status |
| 687 | * clean (?) |
| 688 | |
| 689 | Our original implementation for the first three of these commands was |
| 690 | "no restrict", but it had some severe usability issues: |
| 691 | * `git add <somefile>`, if honored and outside the sparse |
| 692 | specification, can result in the file randomly disappearing later |
| 693 | when some subsequent command is run (since various commands |
| 694 | automatically clean up unmodified files outside the sparse |
| 695 | specification). |
| 696 | * `git rm '*.jpg'` could very negatively surprise users if it deletes |
| 697 | files outside the range of the user's interest. |
| 698 | * `git mv` has similar surprises when moving into or out of the cone, |
| 699 | so it is best to restrict by default. |
| 700 | |
| 701 | So, we switched `add` and `rm` to default to "restrict", which made |
| 702 | usability problems much less severe and less frequent, but we still got |
| 703 | complaints because commands like: |
| 704 | git add <file-outside-sparse-specification> |
| 705 | git rm <file-outside-sparse-specification> |
| 706 | would silently do nothing. We should instead print an error in those |
| 707 | cases to get usability right. |
| 708 | |
| 709 | update-index needs to be updated to match, and status and maybe clean |
| 710 | also need to be updated to specially handle untracked paths. |
| 711 | |
| 712 | There may be a difference in here between behavior A and behavior B in |
| 713 | terms of verboseness of errors or additional warnings. |
| 714 | |
| 715 | * Commands falling under "restrict or no restrict dependent upon behavior |
| 716 | A vs. behavior B" |
| 717 | |
| 718 | * diff (with --cached or REVISION arguments) |
| 719 | * grep (with --cached or REVISION arguments) |
| 720 | * show (when given commit arguments) |
| 721 | * blame (only matters when one or more -C flags passed) |
| 722 | * and annotate |
| 723 | * log |
| 724 | * and variants: shortlog, gitk, show-branch, whatchanged, rev-list |
| 725 | * ls-files |
| 726 | * diff-index |
| 727 | * diff-tree |
| 728 | * ls-tree |
| 729 | |
| 730 | For now, we default to behavior B for these, which want a default of |
| 731 | "no restrict". |
| 732 | |
| 733 | Note that two of these commands -- diff and grep -- also appeared in a |
| 734 | different list with a default of "restrict", but only when limited to |
| 735 | searching the working tree. The working tree vs. history distinction |
| 736 | is fundamental in how behavior B operates, so this is expected. Note, |
| 737 | though, that for diff and grep with --cached, when doing "restrict" |
| 738 | behavior, the difference between sparse specification and sparsity |
| 739 | patterns is important to handle. |
| 740 | |
| 741 | "restrict" may make more sense as the long term default for these[12]. |
| 742 | Also, supporting "restrict" for these commands might be a fair amount |
| 743 | of work to implement, meaning it might be implemented over multiple |
| 744 | releases. If that behavior were the default in the commands that |
| 745 | supported it, that would force behavior B users to need to learn to |
| 746 | slowly add additional flags to their commands, depending on git |
| 747 | version, to get the behavior they want. That gradual switchover would |
| 748 | be painful, so we should avoid it at least until it's fully |
| 749 | implemented. |
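
The defaults in this section can be summarized as a small lookup. The
following is only a toy sketch in Python, not how Git implements anything:
the constant and function names are assumptions made for illustration, the
`history_query` parameter stands for "--cached or REVISION arguments were
given", and commands whose class depends on other arguments (reset, apply,
show, blame) as well as the uncertain `clean (?)` entry are left out.

```python
# Toy lookup of the defaults listed in this section. The constant and
# function names are illustrative assumptions, not Git internals.
RESTRICT = "restrict"
MODULO_CONFLICTS = "restrict modulo conflicts"
NO_RESTRICT = "no restrict"
UNTRACKED = "restrict also specially applied to untracked files"
A_VS_B = "restrict or no restrict dependent upon behavior A vs. behavior B"

# reset, apply, show, blame and clean are omitted on purpose; diff and
# grep are handled in default_class() because their class depends on
# whether they query the working tree or history.
DEFAULTS = {
    "diff-files": RESTRICT, "switch": RESTRICT, "restore": RESTRICT,
    "checkout": RESTRICT, "checkout-index": RESTRICT,
    "merge": MODULO_CONFLICTS, "rebase": MODULO_CONFLICTS,
    "cherry-pick": MODULO_CONFLICTS, "revert": MODULO_CONFLICTS,
    "am": MODULO_CONFLICTS, "read-tree": MODULO_CONFLICTS,
    "archive": NO_RESTRICT, "bundle": NO_RESTRICT, "commit": NO_RESTRICT,
    "format-patch": NO_RESTRICT, "fast-export": NO_RESTRICT,
    "fast-import": NO_RESTRICT, "commit-tree": NO_RESTRICT,
    "stash": NO_RESTRICT,
    "add": UNTRACKED, "rm": UNTRACKED, "mv": UNTRACKED,
    "update-index": UNTRACKED, "status": UNTRACKED,
    "log": A_VS_B, "ls-files": A_VS_B, "diff-index": A_VS_B,
    "diff-tree": A_VS_B, "ls-tree": A_VS_B,
}


def default_class(command, history_query=False):
    """Return the behavior class named in this section for a command."""
    if command in ("diff", "grep"):
        # Working-tree searches restrict; --cached/REVISION forms depend
        # on the behavior A vs. behavior B choice.
        return A_VS_B if history_query else RESTRICT
    return DEFAULTS[command]


def effective_default(command, history_query=False, behavior="B"):
    """Resolve the A-vs-B class; behavior B is the current default."""
    cls = default_class(command, history_query)
    if cls == A_VS_B:
        return RESTRICT if behavior == "A" else NO_RESTRICT
    return cls
```

With these assumptions, `effective_default("grep", history_query=True)`
gives "no restrict" (behavior B), while passing `behavior="A"` gives
"restrict".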
| 750 | |
| 751 | |
| 752 | === Sparse specification vs. sparsity patterns === |
| 753 | |
| 754 | In a well-behaved situation, the sparse specification is given directly |
| 755 | by the $GIT_DIR/info/sparse-checkout file. However, it can transiently |
| 756 | diverge for a few reasons: |
| 757 | |
| 758 | * needing to resolve conflicts (merging will vivify conflicted files) |
| 759 | * running Git commands that implicitly vivify files (e.g. "git stash apply") |
| 760 | * running Git commands that explicitly vivify files (e.g. "git checkout |
| 761 | --ignore-skip-worktree-bits FILENAME") |
| 762 | * other commands that write to these files (perhaps a user copies it |
| 763 | from elsewhere) |
| 764 | |
| 765 | For the last item, note that we do automatically clear the SKIP_WORKTREE |
| 766 | bit for files that are present in the working tree. This has been true |
| 767 | since 82386b4496 ("Merge branch 'en/present-despite-skipped'", |
| 768 | 2022-03-09). |
| 769 | |
| 770 | However, such a situation is transient because: |
| 771 | |
| 772 | * Such transient differences can and will be automatically removed as |
| 773 | a side-effect of commands which call unpack_trees() (checkout, |
| 774 | merge, reset, etc.). |
| 775 | * Users can also request such transient differences be corrected via |
| 776 | running `git sparse-checkout reapply`. Various places recommend |
| 777 | running that command. |
| 778 | * Additional commands are also welcome to implicitly fix these |
| 779 | differences; we may add more in the future. |
| 780 | |
| 781 | While we avoid dropping unstaged changes or files which have conflicts, |
| 782 | we otherwise aggressively try to fix these transient differences. If |
| 783 | users want these differences to persist, they should run the `set` or |
| 784 | `add` subcommands of `git sparse-checkout` to reflect their intended |
| 785 | sparse specification. |
| 786 | |
| 787 | However, when we need to do a query on history restricted to the |
| 788 | "relevant subset of files" such a transiently expanded sparse |
| 789 | specification is ignored. There are a couple reasons for this: |
| 790 | |
| 791 | * The behavior wanted when doing something like |
| 792 | git grep expression REVISION |
| 793 | is roughly what the users would expect from |
| 794 | git checkout REVISION && git grep expression |
| 795 | (modulo a "REVISION:" prefix), which has a couple ramifications: |
| 796 | |
| 797 | * REVISION may have paths not in the current index, so there is no |
| 798 | path we can consult for a SKIP_WORKTREE setting for those paths. |
| 799 | |
| 800 | * Since `checkout` is one of those commands that tries to remove |
| 801 | transient differences in the sparse specification, it makes sense |
| 802 | to use the corrected sparse specification |
| 803 | (i.e. $GIT_DIR/info/sparse-checkout) rather than attempting to |
| 804 | consult SKIP_WORKTREE anyway. |
| 805 | |
| 806 | So, a transiently expanded (or restricted) sparse specification applies to |
| 807 | the working tree, but not to history queries where we always use the |
| 808 | sparsity patterns. (See [16] for an early discussion of this.) |
| 809 | |
| 810 | Similar to a transiently expanded sparse specification of the working tree |
| 811 | based on additional files being present in the working tree, we also need |
| 812 | to consider additional files being modified in the index. In particular, |
| 813 | if the user has staged changes to files (relative to HEAD) that do not |
| 814 | match the sparsity patterns, and the file is not present in the working |
| 815 | tree, we still want to consider the file part of the sparse specification |
| 816 | if we are specifically performing a query related to the index (e.g. git |
| 817 | diff --cached [REVISION], git diff-index [REVISION], git restore --staged |
| 818 | --source=REVISION -- PATHS, etc.) Note that a transiently expanded sparse |
| 819 | specification for the index usually only matters under behavior A, since |
| 820 | under behavior B index operations are lumped with history and tend to |
| 821 | operate full-tree. |
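
The three contexts described in this section (working tree, index, and
history) can be sketched as a toy model. The Python below is only an
illustration under stated assumptions: `Entry`, `matches_cone` and
`in_sparse_spec` are hypothetical names, the index is modeled as a plain
dict, and the matcher is a simplified cone-mode matcher rather than Git's
actual pattern code.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    skip_worktree: bool             # SKIP_WORKTREE bit in the index
    in_worktree: bool               # file is present in the working tree
    staged_differs_from_head: bool  # index version differs from HEAD


def matches_cone(path, cone_dirs):
    """Simplified cone-mode match: files under a listed directory, plus
    files directly inside any leading directory (including toplevel)."""
    parent = path.rsplit("/", 1)[0] if "/" in path else ""
    if parent == "":
        return True  # toplevel files are always included
    for d in cone_dirs:
        if path.startswith(d + "/"):    # under a listed directory
            return True
        if d.startswith(parent + "/"):  # directly in a leading directory
            return True
    return False


def in_sparse_spec(path, context, cone_dirs, index):
    """Is `path` in the sparse specification for the given context?"""
    if context == "history":
        # History queries use only the sparsity patterns; the path may
        # not even exist in the current index.
        return matches_cone(path, cone_dirs)
    entry = index.get(path)
    if entry is None:
        return False  # not tracked in the current index
    if context == "worktree":
        # Transiently expanded by files present despite SKIP_WORKTREE.
        return (not entry.skip_worktree) or entry.in_worktree
    if context == "index":
        # Transiently expanded by staged changes relative to HEAD.
        return (not entry.skip_worktree) or entry.staged_differs_from_head
    raise ValueError("unknown context: " + context)
```

In this model a vivified file outside the cone counts for working tree
operations but not for history queries, and a file with staged changes
outside the cone counts only for index-related queries.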
| 822 | |
| 823 | |
| 824 | === Implementation Questions === |
| 825 | |
| 826 | * Do the options --scope={sparse,all} sound good to others? Are there better |
| 827 | options? |
| 828 | * Names in use, or appearing in patches, or previously suggested: |
| 829 | * --sparse/--dense |
| 830 | * --ignore-skip-worktree-bits |
| 831 | * --ignore-skip-worktree-entries |
| 832 | * --ignore-sparsity |
| 833 | * --[no-]restrict-to-sparse-paths |
| 834 | * --full-tree/--sparse-tree |
| 835 | * --[no-]restrict |
| 836 | * --scope={sparse,all} |
| 837 | * --focus/--unfocus |
| 838 | * --limit/--unlimited |
| 839 | * Rationale making me lean slightly towards --scope={sparse,all}: |
| 840 | * We want a name that works for many commands, so we need a name that |
| 841 | does not conflict |
| 842 | * We know that we have more than two possible usecases, so it is best |
| 843 | to avoid a flag that appears to be binary. |
| 844 | * --scope={sparse,all} isn't overly long and seems relatively |
| 845 | explanatory |
| 846 | * `--sparse`, as used in add/rm/mv, is totally backwards for |
| 847 | grep/log/etc. Changing the meaning of `--sparse` for these |
| 848 | commands would fix the backwardness, but possibly break existing |
| 849 | scripts. Using a new name pairing would allow us to treat |
| 850 | `--sparse` in these commands as a deprecated alias. |
| 851 | * There is a different `--sparse`/`--dense` pair for commands using |
| 852 | revision machinery, so using that naming might cause confusion |
| 853 | * There is also a `--sparse` in both pack-objects and show-branch, which |
| 854 | don't conflict but do suggest that `--sparse` is overloaded |
| 855 | * The name --ignore-skip-worktree-bits is a double negative, is |
| 856 | quite a mouthful, refers to an implementation detail that many |
| 857 | users may not be familiar with, and we'd need a negation for it |
| 858 | which would probably be even more ridiculously long. (But we |
| 859 | can make --ignore-skip-worktree-bits a deprecated alias for |
| 860 | --no-restrict.) |
| 861 | |
| 862 | * If a config option is added (sparse.scope?) what should the values and |
| 863 | description be? "sparse" (behavior A), "worktree-sparse-history-dense" |
| 864 | (behavior B), "dense" (behavior C)? There's a risk of confusion, |
| 865 | because even for Behaviors A and B we want some commands to be |
| 866 | full-tree and others to operate sparsely, so the wording may need to be |
| 867 | more tied to the usecases and somehow explain that. Also, right now, |
| 868 | the primary difference we are focusing on is just the history-querying |
| 869 | commands (log/diff/grep). Previous config suggestion here: [13] |
| 870 | |
| 871 | * Is `--no-expand` a good alias for ls-files's `--sparse` option? |
| 872 | (`--sparse` does not map to either `--scope=sparse` or `--scope=all`, |
| 873 | because in non-cone mode it does nothing and in cone-mode it shows the |
| 874 | sparse directory entries which are technically outside the sparse |
| 875 | specification) |
| 876 | |
| 877 | * Under Behavior A: |
| 878 | * Does ls-files' `--no-expand` override the default `--scope=all`, or |
| 879 | does it need an extra flag? |
| 880 | * Does ls-files' `-t` option imply `--scope=all`? |
| 881 | * Does update-index's `--[no-]skip-worktree` option imply `--scope=all`? |
| 882 | |
| 883 | * sparse-checkout: once behavior A is fully implemented, should we take |
| 884 | an interim measure to ease people into switching the default? Namely, |
| 885 | if folks are not already in a sparse checkout, then require |
| 886 | `sparse-checkout init/set` to take a |
| 887 | `--set-scope=(sparse|worktree-sparse-history-dense|dense)` flag (which |
| 888 | would set sparse.scope according to the setting given), and throw an |
| 889 | error if the flag is not provided? That error would be a great place |
| 890 | to warn folks that the default may change in the future, and get them |
| 891 | used to specifying what they want so that the eventual default switch |
| 892 | is seamless for them. |
| 893 | |
| 894 | |
| 895 | === Implementation Goals/Plans === |
| 896 | |
| 897 | * Get buy-in on this document in general. |
| 898 | |
| 899 | * Figure out answers to the 'Implementation Questions' section (above) |
| 900 | |
| 901 | * Fix bugs in the 'Known bugs' section (below) |
| 902 | |
| 903 | * Provide some kind of method for backfilling the blobs within the sparse |
| 904 | specification in a partial clone |
| 905 | |
| 906 | [Below here is kind of spitballing since the first two haven't been resolved] |
| 907 | |
| 908 | * update-index: flip the default to --no-ignore-skip-worktree-entries, |
| 909 | nuke this stupid "Oh, there's a bug? Let me add a flag to let users |
| 910 | request that they not trigger this bug." flag |
| 911 | |
| 912 | * Flags & Config |
| 913 | * Make `--sparse` in add/rm/mv a deprecated alias for `--scope=all` |
| 914 | * Make `--ignore-skip-worktree-bits` in checkout-index/checkout/restore |
| 915 | a deprecated alias for `--scope=all` |
| 916 | * Create config option (sparse.scope?), tie it to the "Cliff notes" |
| 917 | overview |
| 918 | |
| 919 | * Add --scope=sparse (and --scope=all) flag to each of the history querying |
| 920 | commands. IMPORTANT: make sure diff machinery changes don't mess with |
| 921 | format-patch, fast-export, etc. |
| 922 | |
| 923 | === Known bugs === |
| 924 | |
| 925 | This list used to be a lot longer (see e.g. [1,2,3,4,5,6,7,8,9]), but we've |
| 926 | been working on it. |
| 927 | |
| 928 | 0. Behavior A is not well supported in Git. (Behavior B didn't use to |
| 929 | be either, but was the easier of the two to implement.) |
| 930 | |
| 931 | 1. am and apply: |
| 932 | |
| 933 | apply, without `--index` or `--cached`, relies on files being present |
| 934 | in the working copy, and also writes to them unconditionally. As |
| 935 | such, it should first check for the files' presence, and if found to |
| 936 | be SKIP_WORKTREE, then clear the bit and vivify the paths, then do |
| 937 | its work. Currently, it just throws an error. |
| 938 | |
| 939 | apply, with either `--cached` or `--index`, will not preserve the |
| 940 | SKIP_WORKTREE bit. This is fine if the file has conflicts, but |
| 941 | otherwise SKIP_WORKTREE bits should be preserved for --cached and |
| 942 | probably also for --index. |
| 943 | |
| 944 | am, if there are no conflicts, will vivify files and fail to preserve |
| 945 | the SKIP_WORKTREE bit. If there are conflicts and `-3` is not |
| 946 | specified, it will vivify files and then complain the patch doesn't |
| 947 | apply. If there are conflicts and `-3` is specified, it will vivify |
| 948 | files and then complain that those vivified files would be |
| 949 | overwritten by merge. |
| 950 | |
| 951 | 2. reset --hard: |
| 952 | |
| 953 | reset --hard provides a confusing error message (works correctly, but |
| 954 | misleads the user into believing it didn't): |
| 955 | |
| 956 | $ touch addme |
| 957 | $ git add addme |
| 958 | $ git ls-files -t |
| 959 | H addme |
| 960 | H tracked |
| 961 | S tracked-but-maybe-skipped |
| 962 | $ git reset --hard # usually works great |
| 963 | error: Path 'addme' not uptodate; will not remove from working tree. |
| 964 | HEAD is now at bdbbb6f third |
| 965 | $ git ls-files -t |
| 966 | H tracked |
| 967 | S tracked-but-maybe-skipped |
| 968 | $ ls -1 |
| 969 | tracked |
| 970 | |
| 971 | `git reset --hard` DID remove addme from the index and the working tree, contrary |
| 972 | to the error message, but in line with how reset --hard should behave. |
| 973 | |
| 974 | 3. read-tree |
| 975 | |
| 976 | `read-tree` doesn't apply the 'SKIP_WORKTREE' bit to *any* of the |
| 977 | entries it reads into the index, resulting in all your files suddenly |
| 978 | appearing to be "deleted". |
| 979 | |
| 980 | 4. checkout, restore: |
| 981 | |
| 982 | These commands do not handle path & revision arguments appropriately: |
| 983 | |
| 984 | $ ls |
| 985 | tracked |
| 986 | $ git ls-files -t |
| 987 | H tracked |
| 988 | S tracked-but-maybe-skipped |
| 989 | $ git status --porcelain |
| 990 | $ git checkout -- '*skipped' |
| 991 | error: pathspec '*skipped' did not match any file(s) known to git |
| 992 | $ git ls-files -- '*skipped' |
| 993 | tracked-but-maybe-skipped |
| 994 | $ git checkout HEAD -- '*skipped' |
| 995 | error: pathspec '*skipped' did not match any file(s) known to git |
| 996 | $ git ls-tree HEAD | grep skipped |
| 997 | 100644 blob 276f5a64354b791b13840f02047738c77ad0584f tracked-but-maybe-skipped |
| 998 | $ git status --porcelain |
| 999 | $ git checkout HEAD~1 -- '*skipped' |
| 1000 | $ git ls-files -t |
| 1001 | H tracked |
| 1002 | H tracked-but-maybe-skipped |
| 1003 | $ git status --porcelain |
| 1004 | M tracked-but-maybe-skipped |
| 1005 | $ git checkout HEAD -- '*skipped' |
| 1006 | $ git status --porcelain |
| 1007 | $ |
| 1008 | |
| 1009 | Note that checkout without a revision (or restore --staged) fails to |
| 1010 | find a file to restore from the index, even though ls-files shows |
| 1011 | such a file certainly exists. |
| 1012 | |
| 1013 | Similar issues occur with HEAD (--source=HEAD in restore's case), |
| 1014 | but it suddenly works when HEAD~1 is specified. And then after that it |
| 1015 | will work with HEAD specified, even though it didn't before. |
| 1016 | |
| 1017 | Directories are also an issue: |
| 1018 | |
| 1019 | $ git sparse-checkout set nomatches |
| 1020 | $ git status |
| 1021 | On branch main |
| 1022 | You are in a sparse checkout with 0% of tracked files present. |
| 1023 | |
| 1024 | nothing to commit, working tree clean |
| 1025 | $ git checkout . |
| 1026 | error: pathspec '.' did not match any file(s) known to git |
| 1027 | $ git checkout HEAD~1 . |
| 1028 | Updated 1 path from 58916d9 |
| 1029 | $ git ls-files -t |
| 1030 | S tracked |
| 1031 | H tracked-but-maybe-skipped |
| 1032 | |
| 1033 | 5. checkout and restore --staged, continued: |
| 1034 | |
| 1035 | These commands do not correctly scope operations to the sparse |
| 1036 | specification, and make it worse by not setting important SKIP_WORKTREE |
| 1037 | bits: |
| 1038 | |
| 1039 | $ git restore --source OLDREV --staged outside-sparse-cone/ |
| 1040 | $ git status --porcelain |
| 1041 | MD outside-sparse-cone/file1 |
| 1042 | MD outside-sparse-cone/file2 |
| 1043 | MD outside-sparse-cone/file3 |
| 1044 | |
| 1045 | We can add a --scope=all mode to `git restore` to let it operate outside |
| 1046 | the sparse specification, but then it will be important to set the |
| 1047 | SKIP_WORKTREE bits appropriately. |
| 1048 | |
| 1049 | 6. Performance issues; see: |
| 1050 | https://lore.kernel.org/git/CABPp-BEkJQoKZsQGCYioyga_uoDQ6iBeW+FKr8JhyuuTMK1RDw@mail.gmail.com/ |
| 1051 | |
| 1052 | |
| 1053 | === Reference Emails === |
| 1054 | |
| 1055 | Emails that detail various bugs we've had in sparse-checkout: |
| 1056 | |
| 1057 | [1] (Original descriptions of behavior A & behavior B) |
| 1058 | https://lore.kernel.org/git/CABPp-BGJ_Nvi5TmgriD9Bh6eNXE2EDq2f8e8QKXAeYG3BxZafA@mail.gmail.com/ |
| 1059 | [2] (Fix stash applications in sparse checkouts; bugs from behavioral differences) |
| 1060 | https://lore.kernel.org/git/ccfedc7140dbf63ba26a15f93bd3885180b26517.1606861519.git.gitgitgadget@gmail.com/ |
| 1061 | [3] (Present-despite-skipped entries) |
| 1062 | https://lore.kernel.org/git/11d46a399d26c913787b704d2b7169cafc28d639.1642175983.git.gitgitgadget@gmail.com/ |
| 1063 | [4] (Clone --no-checkout interaction) |
| 1064 | https://lore.kernel.org/git/pull.801.v2.git.git.1591324899170.gitgitgadget@gmail.com/ |
| 1065 | [5] (The need for update_sparsity() and avoiding `read-tree -mu HEAD`) |
| 1066 | https://lore.kernel.org/git/3a1f084641eb47515b5a41ed4409a36128913309.1585270142.git.gitgitgadget@gmail.com/ |
| 1067 | [6] (SKIP_WORKTREE is advisory, not mandatory) |
| 1068 | https://lore.kernel.org/git/844306c3e86ef67591cc086decb2b760e7d710a3.1585270142.git.gitgitgadget@gmail.com/ |
| 1069 | [7] (`worktree add` should copy sparsity settings from current worktree) |
| 1070 | https://lore.kernel.org/git/c51cb3714e7b1d2f8c9370fe87eca9984ff4859f.1644269584.git.gitgitgadget@gmail.com/ |
| 1071 | [8] (Avoid negative surprises in add, rm, and mv) |
| 1072 | https://lore.kernel.org/git/cover.1617914011.git.matheus.bernardino@usp.br/ |
| 1073 | https://lore.kernel.org/git/pull.1018.v4.git.1632497954.gitgitgadget@gmail.com/ |
| 1074 | [9] (Move from out-of-cone to in-cone) |
| 1075 | https://lore.kernel.org/git/20220630023737.473690-6-shaoxuan.yuan02@gmail.com/ |
| 1076 | https://lore.kernel.org/git/20220630023737.473690-4-shaoxuan.yuan02@gmail.com/ |
| 1077 | [10] (Unnecessarily downloading objects outside sparse specification) |
| 1078 | https://lore.kernel.org/git/CAOLTT8QfwOi9yx_qZZgyGa8iL8kHWutEED7ok_jxwTcYT_hf9Q@mail.gmail.com/ |
| 1079 | |
| 1080 | [11] (Stolee's comments on high-level usecases) |
| 1081 | https://lore.kernel.org/git/1a1e33f6-3514-9afc-0a28-5a6b85bd8014@gmail.com/ |
| 1082 | |
| 1083 | [12] Others commenting on eventually switching default to behavior A: |
| 1084 | * https://lore.kernel.org/git/xmqqh719pcoo.fsf@gitster.g/ |
| 1085 | * https://lore.kernel.org/git/xmqqzgeqw0sy.fsf@gitster.g/ |
| 1086 | * https://lore.kernel.org/git/a86af661-cf58-a4e5-0214-a67d3a794d7e@github.com/ |
| 1087 | |
| 1088 | [13] Previous config name suggestion and description |
| 1089 | * https://lore.kernel.org/git/CABPp-BE6zW0nJSStcVU=_DoDBnPgLqOR8pkTXK3dW11=T01OhA@mail.gmail.com/ |
| 1090 | |
| 1091 | [14] Tangential issue: switch to cone mode as default sparse specification mechanism: |
| 1092 | https://lore.kernel.org/git/a1b68fd6126eb341ef3637bb93fedad4309b36d0.1650594746.git.gitgitgadget@gmail.com/ |
| 1093 | |
| 1094 | [15] Lengthy email on grep behavior, covering what should be searched: |
| 1095 | * https://lore.kernel.org/git/CABPp-BGVO3QdbfE84uF_3QDF0-y2iHHh6G5FAFzNRfeRitkuHw@mail.gmail.com/ |
| 1096 | |
| 1097 | [16] Email explaining sparsity patterns vs. SKIP_WORKTREE and history operations, |
| 1098 | search for the parenthetical comment starting "We do not check". |
| 1099 | https://lore.kernel.org/git/CABPp-BFsCPPNOZ92JQRJeGyNd0e-TCW-LcLyr0i_+VSQJP+GCg@mail.gmail.com/ |
| 1100 | |
| 1101 | [17] https://lore.kernel.org/git/20220207190320.2960362-1-jonathantanmy@google.com/ |