| Rebases and cherry-picks involve a sequence of merges whose results are |
| recorded as new single-parent commits. The first parent side of those |
| merges represent the "upstream" side, and often include a far larger set of |
| changes than the second parent side. Traditionally, the renames on the |
| first-parent side of that sequence of merges were repeatedly re-detected |
| for every merge. This file explains why it is safe and effective during |
| rebases and cherry-picks to remember renames on the upstream side of |
| history as an optimization, assuming all merges are automatic and clean |
| (i.e. no conflicts and not interrupted for user input or editing). |
| |
| Outline: |
| |
| 0. Assumptions |
| |
| 1. How rebasing and cherry-picking work |
| |
| 2. Why the renames on MERGE_SIDE1 in any given pick are *always* a |
| superset of the renames on MERGE_SIDE1 for the next pick. |
| |
| 3. Why any rename on MERGE_SIDE1 in any given pick is _almost_ always also |
| a rename on MERGE_SIDE1 for the next pick |
| |
| 4. A detailed description of the counter-examples to #3. |
| |
| 5. Why the special cases in #4 are still fully reasonable to use to pair |
| up files for three-way content merging in the merge machinery, and why |
| they do not affect the correctness of the merge. |
| |
| 6. Interaction with skipping of "irrelevant" renames |
| |
| 7. Additional items that need to be cached |
| |
| 8. How directory rename detection interacts with the above and why this |
| optimization is still safe even if merge.directoryRenames is set to |
| "true". |
| |
| |
| === 0. Assumptions === |
| |
| There are two assumptions that will hold throughout this document: |
| |
| * The upstream side where commits are transplanted to is treated as the |
| first parent side when rebase/cherry-pick call the merge machinery |
| |
| * All merges are fully automatic |
| |
| and a third that will hold in sections 2-5 for simplicity, that I'll later |
| address in section 8: |
| |
| * No directory renames occur |
| |
| |
| Let me explain more about each assumption and why I include it: |
| |
| |
| The first assumption is merely for the purposes of making this document |
| clearer; the optimization implementation does not actually depend upon it. |
| However, the assumption does hold in all cases because it reflects the way |
| that both rebase and cherry-pick were implemented; and the implementation |
| of cherry-pick and rebase are not readily changeable for backwards |
| compatibility reasons (see for example the discussion of the --ours and |
| --theirs flag in the documentation of `git checkout`, particularly the |
| comments about how they behave with rebase). The optimization avoids |
| checking first-parent-ness, though. It checks the conditions that make the |
| optimization valid instead, so it would still continue working if someone |
| changed the parent ordering that cherry-pick and rebase use. But making |
| this assumption does make this document much clearer and prevents me from |
| having to repeat every example twice. |
| |
| If the second assumption is violated, then the optimization simply is |
| turned off and thus isn't relevant to consider. The second assumption can |
| also be stated as "there is no interruption for a user to resolve conflicts |
| or to just further edit or tweak files". While real rebases and |
| cherry-picks are often interrupted (either because it's an interactive |
| rebase where the user requested to stop and edit, or because there were |
| conflicts that the user needs to resolve), the cache of renames is not |
| stored on disk, and thus is thrown away as soon as the rebase or cherry |
| pick stops for the user to resolve the operation. |
| |
| The third assumption makes sections 2-5 simpler, and allows people to |
| understand the basics of why this optimization is safe and effective, and |
| then I can go back and address the specifics in section 8. It is probably |
| also worth noting that if directory renames do occur, then the default of |
| merge.directoryRenames being set to "conflict" means that the operation |
| will stop for users to resolve the conflicts and the cache will be thrown |
| away, and thus that there won't be an optimization to apply. So, the only |
| reason we need to address directory renames specifically, is that some |
| users will have set merge.directoryRenames to "true" to allow the merges to |
| continue to proceed automatically. The optimization is still safe with |
| this config setting, but we have to discuss a few more cases to show why; |
| this discussion is deferred until section 8. |
| |
| |
| === 1. How rebasing and cherry-picking work === |
| |
| Consider the following setup (from the git-rebase manpage): |
| |
| A---B---C topic |
| / |
| D---E---F---G main |
| |
| After rebasing or cherry-picking topic onto main, this will appear as: |
| |
| A'--B'--C' topic |
| / |
| D---E---F---G main |
| |
| The way the commits A', B', and C' are created is through a series of |
| merges, where rebase or cherry-pick sequentially uses each of the three |
| A-B-C commits in a special merge operation. Let's label the three commits |
| in the merge operation as MERGE_BASE, MERGE_SIDE1, and MERGE_SIDE2. For |
| this picture, the three commits for each of the three merges would be: |
| |
| To create A': |
| MERGE_BASE: E |
| MERGE_SIDE1: G |
| MERGE_SIDE2: A |
| |
| To create B': |
| MERGE_BASE: A |
| MERGE_SIDE1: A' |
| MERGE_SIDE2: B |
| |
| To create C': |
| MERGE_BASE: B |
| MERGE_SIDE1: B' |
| MERGE_SIDE2: C |
| |
| Sometimes, folks are surprised that these three-way merges are done. It |
| can be useful in understanding these three-way merges to view them in a |
| slightly different light. For example, in creating C', you can view it as |
| either: |
| |
| * Apply the changes between B & C to B' |
| * Apply the changes between B & B' to C |
| |
| Conceptually the two statements above are the same as a three-way merge of |
| B, B', and C, at least the parts before you decide to record a commit. |
| |
| |
| === 2. Why the renames on MERGE_SIDE1 in any given pick are always a === |
| === superset of the renames on MERGE_SIDE1 for the next pick. === |
| |
| The merge machinery uses the filenames it is fed from MERGE_BASE, |
| MERGE_SIDE1, and MERGE_SIDE2. It will only move content to a different |
| filename under one of three conditions: |
| |
| * To make both pieces of a conflict available to a user during conflict |
| resolution (examples: directory/file conflict, add/add type conflict |
| such as symlink vs. regular file) |
| |
| * When MERGE_SIDE1 renames the file. |
| |
| * When MERGE_SIDE2 renames the file. |
| |
| First, let's remember what commits are involved in the first and second |
| picks of the cherry-pick or rebase sequence: |
| |
| To create A': |
| MERGE_BASE: E |
| MERGE_SIDE1: G |
| MERGE_SIDE2: A |
| |
| To create B': |
| MERGE_BASE: A |
| MERGE_SIDE1: A' |
| MERGE_SIDE2: B |
| |
| So, in particular, we need to show that the renames between E and G are a |
| superset of those between A and A'. |
| |
| A' is created by the first merge. A' will only have renames for one of the |
| three reasons listed above. The first case, a conflict, results in a |
| situation where the cache is dropped and thus this optimization doesn't |
| take effect, so we need not consider that case. The third case, a rename |
| on MERGE_SIDE2 (i.e. from G to A), will show up in A' but it also shows up |
| in A -- therefore when diffing A and A' that path does not show up as a |
| rename. The only remaining way for renames to show up in A' is for the |
| rename to come from MERGE_SIDE1. Therefore, all renames between A and A' |
| are a subset of those between E and G. Equivalently, all renames between E |
| and G are a superset of those between A and A'. |
| |
| |
| === 3. Why any rename on MERGE_SIDE1 in any given pick is _almost_ === |
| === always also a rename on MERGE_SIDE1 for the next pick. === |
| |
| Let's again look at the first two picks: |
| |
| To create A': |
| MERGE_BASE: E |
| MERGE_SIDE1: G |
| MERGE_SIDE2: A |
| |
| To create B': |
| MERGE_BASE: A |
| MERGE_SIDE1: A' |
| MERGE_SIDE2: B |
| |
| Now let's look at any given rename from MERGE_SIDE1 of the first pick, i.e. |
| any given rename from E to G. Let's use the filenames 'oldfile' and |
| 'newfile' for demonstration purposes. That first pick will function as |
| follows; when the rename is detected, the merge machinery will do a |
| three-way content merge of the following: |
| E:oldfile |
| G:newfile |
| A:oldfile |
| and produce a new result: |
| A':newfile |
| |
| Note above that I've assumed that E->A did not rename oldfile. If that |
| side did rename, then we most likely have a rename/rename(1to2) conflict |
| that will cause the rebase or cherry-pick operation to halt and drop the |
| in-memory cache of renames and thus doesn't need to be considered further. |
| In the special case that E->A does rename the file but also renames it to |
| newfile, then there is no conflict from the renaming and the merge can |
| succeed. In this special case, the rename is not valid to cache because |
| the second merge will find A:newfile in the MERGE_BASE (see also the new |
| testcases in t6429 with "rename same file identically" in their |
| description). So a rename/rename(1to1) needs to be specially handled by |
| pruning renames from the cache and decrementing the dir_rename_counts in |
| the current and leading directories associated with those renames. Or, |
| since these are really rare, one could just take the easy way out and |
| disable the remembering renames optimization when a rename/rename(1to1) |
| happens. |
| |
| The previous paragraph handled the cases for E->A renaming oldfile, let's |
| continue assuming that oldfile is not renamed in A. |
| |
| As per the diagram for creating B', MERGE_SIDE1 involves the changes from A |
| to A'. So, we are curious whether A:oldfile and A':newfile will be viewed |
| as renames. Note that: |
| |
| * There will be no A':oldfile (because there could not have been a |
| G:oldfile as we do not do break detection in the merge machinery and |
| G:newfile was detected as a rename, and by the construction of the |
| rename above that merged cleanly, the merge machinery will ensure there |
| is no 'oldfile' in the result). |
| |
| * There will be no A:newfile (if there had been, we would have had a |
| rename/add conflict). |
| |
| * Clearly A:oldfile and A':newfile are "related" (A':newfile came from a |
| clean three-way content merge involving A:oldfile). |
| |
| We can also expound on the third point above, by noting that three-way |
| content merges can also be viewed as applying the differences between the |
| base and one side to the other side. Thus we can view A':newfile as |
| having been created by taking the changes between E:oldfile and G:newfile |
| (which were detected as being related, i.e. <50% changed) to A:oldfile. |
| |
| Thus A:oldfile and A':newfile are just as related as E:oldfile and |
| G:newfile are -- they have exactly identical differences. Since the latter |
| were detected as renames, A:oldfile and A':newfile should also be |
| detectable as renames almost always. |
| |
| |
| === 4. A detailed description of the counter-examples to #3. === |
| |
| We already noted in section 3 that rename/rename(1to1) (i.e. both sides |
| renaming a file the same way) was one counter-example. The more |
| interesting bit, though, is why did we need to use the "almost" qualifier |
| when stating that A:oldfile and A':newfile are "almost" always detectable |
| as renames? |
| |
| Let's repeat an earlier point that section 3 made: |
| |
| A':newfile was created by applying the changes between E:oldfile and |
| G:newfile to A:oldfile. The changes between E:oldfile and G:newfile were |
| <50% of the size of E:oldfile. |
| |
| If those changes that were <50% of the size of E:oldfile are also <50% of |
| the size of A:oldfile, then A:oldfile and A':newfile will be detectable as |
| renames. However, if there is a dramatic size reduction between E:oldfile |
| and A:oldfile (but the changes between E:oldfile, G:newfile, and A:oldfile |
| still somehow merge cleanly), then traditional rename detection would not |
| detect A:oldfile and A':newfile as renames. |
| |
| Here's an example where that can happen: |
| * E:oldfile had 20 lines |
| * G:newfile added 10 new lines at the beginning of the file |
| * A:oldfile kept the first 3 lines of the file, and deleted all the rest |
| then |
| => A':newfile would have 13 lines, 3 of which matches those in A:oldfile. |
| E:oldfile -> G:newfile would be detected as a rename, but A:oldfile and |
| A':newfile would not be. |
| |
| |
| === 5. Why the special cases in #4 are still fully reasonable to use to === |
| === pair up files for three-way content merging in the merge machinery, === |
| === and why they do not affect the correctness of the merge. === |
| |
| In the rename/rename(1to1) case, A:newfile and A':newfile are not renames |
| since they use the *same* filename. However, files with the same filename |
| are obviously fine to pair up for three-way content merging (the merge |
| machinery has never employed break detection). The interesting |
| counter-example case is thus not the rename/rename(1to1) case, but the case |
| where A did not rename oldfile. That was the case that we spent most of |
| the time discussing in sections 3 and 4. The remainder of this section |
| will be devoted to that case as well. |
| |
| So, even if A:oldfile and A':newfile aren't detectable as renames, why is |
| it still reasonable to pair them up for three-way content merging in the |
| merge machinery? There are multiple reasons: |
| |
| * As noted in sections 3 and 4, the diff between A:oldfile and A':newfile |
| is *exactly* the same as the diff between E:oldfile and G:newfile. The |
| latter pair were detected as renames, so it seems unlikely to surprise |
| users for us to treat A:oldfile and A':newfile as renames. |
| |
| * In fact, "oldfile" and "newfile" were at one point detected as renames |
| due to how they were constructed in the E..G chain. And we used that |
| information once already in this rebase/cherry-pick. I think users |
| would be unlikely to be surprised at us continuing to treat the files |
| as renames and would quickly understand why we had done so. |
| |
| * Marking or declaring files as renames is *not* the end goal for merges. |
| Merges use renames to determine which files make sense to be paired up |
| for three-way content merges. |
| |
| * A:oldfile and A':newfile were _already_ paired up in a three-way |
| content merge; that is how A':newfile was created. In fact, that |
| three-way content merge was clean. So using them again in a later |
| three-way content merge seems very reasonable. |
| |
| However, the above is focusing on the common scenarios. Let's try to look |
| at all possible unusual scenarios and compare without the optimization to |
| with the optimization. Consider the following theoretical cases; we will |
| then dive into each to determine which of them are possible, |
| and if so, what they mean: |
| |
| 1. Without the optimization, the second merge results in a conflict. |
| With the optimization, the second merge also results in a conflict. |
| Questions: Are the conflicts confusingly different? Better in one case? |
| |
| 2. Without the optimization, the second merge results in NO conflict. |
| With the optimization, the second merge also results in NO conflict. |
| Questions: Are the merges the same? |
| |
| 3. Without the optimization, the second merge results in a conflict. |
| With the optimization, the second merge results in NO conflict. |
| Questions: Possible? Bug, bugfix, or something else? |
| |
| 4. Without the optimization, the second merge results in NO conflict. |
| With the optimization, the second merge results in a conflict. |
| Questions: Possible? Bug, bugfix, or something else? |
| |
| I'll consider all four cases, but out of order. |
| |
| The fourth case is impossible. For the code without the remembering |
| renames optimization to not get a conflict, B:oldfile would need to exactly |
| match A:oldfile -- if it doesn't, there would be a modify/delete conflict. |
| If A:oldfile matches B:oldfile exactly, then a three-way content merge |
| between A:oldfile, A':newfile, and B:oldfile would have no conflict and |
| just give us the version of newfile from A' as the result. |
| |
| From the same logic as the above paragraph, the second case would indeed |
| result in identical merges. When A:oldfile exactly matches B:oldfile, an |
| undetected rename would say, "Oh, I see one side didn't modify 'oldfile' |
| and the other side deleted it. I'll delete it. And I see you have this |
| brand new file named 'newfile' in A', so I'll keep it." That gives the |
| same results as three-way content merging A:oldfile, A':newfile, and |
| B:oldfile -- a removal of oldfile with the version of newfile from A' |
| showing up in the result. |
| |
| The third case is interesting. It means that A:oldfile and A':newfile were |
| not just similar enough, but that the changes between them did not conflict |
| with the changes between A:oldfile and B:oldfile. This would validate our |
| hunch that the files were similar enough to be used in a three-way content |
| merge, and thus seems entirely correct for us to have used them that way. |
| (Sidenote: One particular example here may be enlightening. Let's say that |
| B was an immediate revert of A. B clearly would have been a clean revert |
| of A, since A was B's immediate parent. One would assume that if you can |
| pick a commit, you should also be able to cherry-pick its immediate revert. |
| However, this is one of those funny corner cases; without this |
| optimization, we just successfully picked a commit cleanly, but we are |
| unable to cherry-pick its immediate revert due to the size differences |
| between E:oldfile and A:oldfile.) |
| |
| That leaves only the first case to consider -- when we get conflicts both |
| with or without the optimization. Without the optimization, we'll have a |
| modify/delete conflict, where both A':newfile and B:oldfile are left in the |
| tree for the user to deal with and no hints about the potential similarity |
| between the two. With the optimization, we'll have a three-way content |
| merged A:oldfile, A':newfile, and B:oldfile with conflict markers |
| suggesting we thought the files were related but giving the user the chance |
| to resolve. As noted above, I don't think users will find us treating |
| 'oldfile' and 'newfile' as related as a surprise since they were between E |
| and G. In any event, though, this case shouldn't be concerning since we |
| hit a conflict in both cases, told the user what we know, and asked them to |
| resolve it. |
| |
| So, in summary, case 4 is impossible, case 2 yields the same behavior, and |
| cases 1 and 3 seem to provide as good or better behavior with the |
| optimization than without. |
| |
| |
| === 6. Interaction with skipping of "irrelevant" renames === |
| |
| Previous optimizations involved skipping rename detection for paths |
| considered to be "irrelevant". See for example the following commits: |
| |
| * 32a56dfb99 ("merge-ort: precompute subset of sources for which we |
| need rename detection", 2021-03-11) |
| * 2fd9eda462 ("merge-ort: precompute whether directory rename |
| detection is needed", 2021-03-11) |
| * 9bd342137e ("diffcore-rename: determine which relevant_sources are |
| no longer relevant", 2021-03-13) |
| |
| Relevance is always determined by what the _other_ side of history has |
| done, in terms of modifying a file that our side renamed, or adding a |
| file to a directory which our side renamed. This means that a path |
| that is "irrelevant" when picking the first commit of a series in a |
| rebase or cherry-pick, may suddenly become "relevant" when picking the |
| next commit. |
| |
| The upshot of this is that we can only cache rename detection results |
| for relevant paths, and need to re-check relevance in subsequent |
| commits. If those subsequent commits have additional paths that are |
| relevant for rename detection, then we will need to redo rename |
| detection -- though we can limit it to the paths for which we have not |
| already detected renames. |
| |
| |
| === 7. Additional items that need to be cached === |
| |
| It turns out we have to cache more than just renames; we also cache: |
| |
| A) non-renames (i.e. unpaired deletes) |
| B) counts of renames within directories |
| C) sources that were marked as RELEVANT_LOCATION, but which were |
| downgraded to RELEVANT_NO_MORE |
| D) the toplevel trees involved in the merge |
| |
| These are all stored in struct rename_info, and respectively appear in |
| * cached_pairs (along side actual renames, just with a value of NULL) |
| * dir_rename_counts |
| * cached_irrelevant |
| * merge_trees |
| |
| The reason for (A) comes from the irrelevant renames skipping |
| optimization discussed in section 6. The fact that irrelevant renames |
| are skipped means we only get a subset of the potential renames |
| detected and subsequent commits may need to run rename detection on |
| the upstream side on a subset of the remaining renames (to get the |
| renames that are relevant for that later commit). Since unpaired |
| deletes are involved in rename detection too, we don't want to |
| repeatedly check that those paths remain unpaired on the upstream side |
| with every commit we are transplanting. |
| |
| The reason for (B) is that diffcore_rename_extended() is what |
| generates the counts of renames by directory which is needed in |
| directory rename detection, and if we don't run |
| diffcore_rename_extended() again then we need to have the output from |
| it, including dir_rename_counts, from the previous run. |
| |
| The reason for (C) is that merge-ort's tree traversal will again think |
| those paths are relevant (marking them as RELEVANT_LOCATION), but the |
| fact that they were downgraded to RELEVANT_NO_MORE means that |
| dir_rename_counts already has the information we need for directory |
| rename detection. (A path which becomes RELEVANT_CONTENT in a |
| subsequent commit will be removed from cached_irrelevant.) |
| |
| The reason for (D) is that is how we determine whether the remember |
| renames optimization can be used. In particular, remembering that our |
| sequence of merges looks like: |
| |
| Merge 1: |
| MERGE_BASE: E |
| MERGE_SIDE1: G |
| MERGE_SIDE2: A |
| => Creates A' |
| |
| Merge 2: |
| MERGE_BASE: A |
| MERGE_SIDE1: A' |
| MERGE_SIDE2: B |
| => Creates B' |
| |
| It is the fact that the trees A and A' appear both in Merge 1 and in |
| Merge 2, with A as a parent of A' that allows this optimization. So |
| we store the trees to compare with what we are asked to merge next |
| time. |
| |
| |
| === 8. How directory rename detection interacts with the above and === |
| === why this optimization is still safe even if === |
| === merge.directoryRenames is set to "true". === |
| |
| As noted in the assumptions section: |
| |
| """ |
| ...if directory renames do occur, then the default of |
| merge.directoryRenames being set to "conflict" means that the operation |
| will stop for users to resolve the conflicts and the cache will be |
| thrown away, and thus that there won't be an optimization to apply. |
| So, the only reason we need to address directory renames specifically, |
| is that some users will have set merge.directoryRenames to "true" to |
| allow the merges to continue to proceed automatically. |
| """ |
| |
| Let's remember that we need to look at how any given pick affects the next |
| one. So let's again use the first two picks from the diagram in section |
| one: |
| |
| First pick does this three-way merge: |
| MERGE_BASE: E |
| MERGE_SIDE1: G |
| MERGE_SIDE2: A |
| => creates A' |
| |
| Second pick does this three-way merge: |
| MERGE_BASE: A |
| MERGE_SIDE1: A' |
| MERGE_SIDE2: B |
| => creates B' |
| |
| Now, directory rename detection exists so that if one side of history |
| renames a directory, and the other side adds a new file to the old |
| directory, then the merge (with merge.directoryRenames=true) can move the |
| file into the new directory. There are two qualitatively different ways to |
| add a new file to an old directory: create a new file, or rename a file |
| into that directory. Also, directory renames can be done on either side of |
| history, so there are four cases to consider: |
| |
| * MERGE_SIDE1 renames old dir, MERGE_SIDE2 adds new file to old dir |
| * MERGE_SIDE1 renames old dir, MERGE_SIDE2 renames file into old dir |
| * MERGE_SIDE1 adds new file to old dir, MERGE_SIDE2 renames old dir |
| * MERGE_SIDE1 renames file into old dir, MERGE_SIDE2 renames old dir |
| |
| One last note before we consider these four cases: There are some |
| important properties about how we implement this optimization with |
| respect to directory rename detection that we need to bear in mind |
| while considering all of these cases: |
| |
| * rename caching occurs *after* applying directory renames |
| |
| * a rename created by directory rename detection is recorded for the side |
| of history that did the directory rename. |
| |
| * dir_rename_counts, the nested map of |
| {oldname => {newname => count}}, |
| is cached between runs as well. This basically means that directory |
| rename detection is also cached, though only on the side of history |
| that we cache renames for (MERGE_SIDE1 as far as this document is |
| concerned; see the assumptions section). Two interesting sub-notes |
| about these counts: |
| |
| * If we need to perform rename-detection again on the given side (e.g. |
| some paths are relevant for rename detection that weren't before), |
| then we clear dir_rename_counts and recompute it, making use of |
| cached_pairs. The reason it is important to do this is optimizations |
| around RELEVANT_LOCATION exist to prevent us from computing |
| unnecessary renames for directory rename detection and from computing |
| dir_rename_counts for irrelevant directories; but those same renames |
| or directories may become necessary for subsequent merges. The |
| easiest way to "fix up" dir_rename_counts in such cases is to just |
| recompute it. |
| |
| * If we prune rename/rename(1to1) entries from the cache, then we also |
| need to update dir_rename_counts to decrement the counts for the |
| involved directory and any relevant parent directories (to undo what |
| update_dir_rename_counts() in diffcore-rename.c incremented when the |
| rename was initially found). If we instead just disable the |
| remembering renames optimization when the exceedingly rare |
| rename/rename(1to1) cases occur, then dir_rename_counts will get |
| re-computed the next time rename detection occurs, as noted above. |
| |
| * the side with multiple commits to pick, is the side of history that we |
| do NOT cache renames for. Thus, there are no additional commits to |
| change the number of renames in a directory, except for those done by |
| directory rename detection (which always pad the majority). |
| |
| * the "renames" we cache are modified slightly by any directory rename, |
| as noted below. |
| |
| Now, with those notes out of the way, let's go through the four cases |
| in order: |
| |
| Case 1: MERGE_SIDE1 renames old dir, MERGE_SIDE2 adds new file to old dir |
| |
| This case looks like this: |
| |
| MERGE_BASE: E, Has olddir/ |
| MERGE_SIDE1: G, Renames olddir/ -> newdir/ |
| MERGE_SIDE2: A, Adds olddir/newfile |
| => creates A', With newdir/newfile |
| |
| MERGE_BASE: A, Has olddir/newfile |
| MERGE_SIDE1: A', Has newdir/newfile |
| MERGE_SIDE2: B, Modifies olddir/newfile |
| => expected B', with threeway-merged newdir/newfile from above |
| |
| In this case, with the optimization, note that after the first commit: |
| * MERGE_SIDE1 remembers olddir/ -> newdir/ |
| * MERGE_SIDE1 has cached olddir/newfile -> newdir/newfile |
| Given the cached rename noted above, the second merge can proceed as |
| expected without needing to perform rename detection from A -> A'. |
| |
| Case 2: MERGE_SIDE1 renames old dir, MERGE_SIDE2 renames file into old dir |
| |
| This case looks like this: |
| MERGE_BASE: E oldfile, olddir/ |
| MERGE_SIDE1: G oldfile, olddir/ -> newdir/ |
| MERGE_SIDE2: A oldfile -> olddir/newfile |
| => creates A', With newdir/newfile representing original oldfile |
| |
| MERGE_BASE: A olddir/newfile |
| MERGE_SIDE1: A' newdir/newfile |
| MERGE_SIDE2: B modify olddir/newfile |
| => expected B', with threeway-merged newdir/newfile from above |
| |
| In this case, with the optimization, note that after the first commit: |
| * MERGE_SIDE1 remembers olddir/ -> newdir/ |
| * MERGE_SIDE1 has cached olddir/newfile -> newdir/newfile |
| (NOT oldfile -> newdir/newfile; compare to case with |
| (p->status == 'R' && new_path) in possibly_cache_new_pair()) |
| |
| Given the cached rename noted above, the second merge can proceed as |
| expected without needing to perform rename detection from A -> A'. |
| |
| Case 3: MERGE_SIDE1 adds new file to old dir, MERGE_SIDE2 renames old dir |
| |
| This case looks like this: |
| |
| MERGE_BASE: E, Has olddir/ |
| MERGE_SIDE1: G, Adds olddir/newfile |
| MERGE_SIDE2: A, Renames olddir/ -> newdir/ |
| => creates A', With newdir/newfile |
| |
| MERGE_BASE: A, Has newdir/, but no notion of newdir/newfile |
| MERGE_SIDE1: A', Has newdir/newfile |
| MERGE_SIDE2: B, Has newdir/, but no notion of newdir/newfile |
| => expected B', with newdir/newfile from A' |
| |
| In this case, with the optimization, note that after the first commit there |
| were no renames on MERGE_SIDE1, and any renames on MERGE_SIDE2 are tossed. |
| But the second merge didn't need any renames so this is fine. |
| |
| Case 4: MERGE_SIDE1 renames file into old dir, MERGE_SIDE2 renames old dir |
| |
| This case looks like this: |
| |
| MERGE_BASE: E, Has olddir/ |
| MERGE_SIDE1: G, Renames oldfile -> olddir/newfile |
| MERGE_SIDE2: A, Renames olddir/ -> newdir/ |
| => creates A', With newdir/newfile representing original oldfile |
| |
| MERGE_BASE: A, Has oldfile |
| MERGE_SIDE1: A', Has newdir/newfile |
| MERGE_SIDE2: B, Modifies oldfile |
| => expected B', with threeway-merged newdir/newfile from above |
| |
| In this case, with the optimization, note that after the first commit: |
| * MERGE_SIDE1 remembers oldfile -> newdir/newfile |
| (NOT oldfile -> olddir/newfile; compare to case of second |
| block under p->status == 'R' in possibly_cache_new_pair()) |
| * MERGE_SIDE2 renames are tossed because only MERGE_SIDE1 is remembered |
| |
| Given the cached rename noted above, the second merge can proceed as |
| expected without needing to perform rename detection from A -> A'. |
| |
| Finally, I'll just note here that interactions with the |
| skip-irrelevant-renames optimization means we sometimes don't detect |
| renames for any files within a directory that was renamed, in which |
| case we will not have been able to detect any rename for the directory |
| itself. In such a case, we do not know whether the directory was |
| renamed; we want to be careful to avoid caching some kind of "this |
| directory was not renamed" statement. If we did, then a subsequent |
| commit being rebased could add a file to the old directory, and the |
| user would expect it to end up in the correct directory -- something |
| our erroneous "this directory was not renamed" cache would preclude. |