Tweaking diff output
====================
June 2005


Introduction
------------

The diff commands git-diff-index, git-diff-files, git-diff-tree, and
git-diff-stages can be told to manipulate the differences they find in
unconventional ways before showing diff(1) output.  These manipulations
are collectively called "diffcore transformations".  This short note
describes what they are and how to use them to produce diff output
that is easier to understand than the conventional kind.


The chain of operation
----------------------

The git-diff-* family works by first comparing two sets of
files:

 - git-diff-index compares contents of a "tree" object and the
   working directory (when '\--cached' flag is not used) or a
   "tree" object and the index file (when '\--cached' flag is
   used);

 - git-diff-files compares contents of the index file and the
   working directory;

 - git-diff-tree compares contents of two "tree" objects;

 - git-diff-stages compares contents of blobs at two stages in an
   unmerged index file.

In all of these cases, the commands themselves compare
corresponding paths in the two sets of files.  The result of the
comparison is passed from these commands to what is internally
called "diffcore", in a format similar to what is output when
the -p option is not used.  For example:

------------------------------------------------
in-place edit  :100644 100644 bcd1234... 0123456... M file0
create         :000000 100644 0000000... 1234567... A file4
delete         :100644 000000 1234567... 0000000... D file5
unmerged       :000000 000000 0000000... 0000000... U file6
------------------------------------------------

The diffcore mechanism is fed a list of such comparison results
(each of which is called a "filepair", although at this point each
of them talks about a single file), and transforms such a list
into another list.  There are currently 6 such transformations:

- diffcore-pathspec
- diffcore-break
- diffcore-rename
- diffcore-merge-broken
- diffcore-pickaxe
- diffcore-order

These are applied in sequence.  The set of filepairs that the
git-diff-\* commands find is used as the input to diffcore-pathspec,
and the output from diffcore-pathspec is used as the input to the
next transformation.  The final result is then passed to the
output routine, which generates either diff-raw format (see the Output
format sections of the manual for git-diff-\* commands) or
diff-patch format.

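The chain described above can be modelled in a few lines of Python.
This is only an illustrative sketch, not how git implements it: the
names `FilePair`, `parse_raw_line` and `run_chain` are hypothetical,
and the parser handles only the single-path raw lines shown in the
example (without the leading labels such as "in-place edit").

```python
from typing import Callable, List, NamedTuple


class FilePair(NamedTuple):
    """One comparison result, mirroring the columns of a raw line."""
    old_mode: str
    new_mode: str
    old_sha: str
    new_sha: str
    status: str
    path: str


def parse_raw_line(line: str) -> FilePair:
    """Parse a line such as ':100644 100644 bcd1234... 0123456... M file0'."""
    if not line.startswith(":"):
        raise ValueError("raw lines start with a colon")
    # Two modes, two object IDs, a status letter, then the path.
    fields = line[1:].split(None, 5)
    if len(fields) != 6:
        raise ValueError("expected two modes, two IDs, a status and a path")
    return FilePair(*fields)


# A transformation takes a list of filepairs and returns another list.
Transformation = Callable[[List[FilePair]], List[FilePair]]


def run_chain(filepairs: List[FilePair],
              chain: List[Transformation]) -> List[FilePair]:
    """Feed the output of each transformation to the next one, in order."""
    for transform in chain:
        filepairs = transform(filepairs)
    return filepairs
```

Each transformation is a plain function from list to list, so an
option that is not given on the command line simply corresponds to
leaving its function out of `chain`.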

diffcore-pathspec: For Ignoring Files Outside Our Consideration
---------------------------------------------------------------

The first transformation in the chain is diffcore-pathspec, and
is controlled by giving the pathname parameters to the
git-diff-* commands on the command line.  The pathspec is used
to limit the world diff operates in.  It removes the filepairs
outside the specified set of pathnames.  For example, if the input
set of filepairs included:

------------------------------------------------
:100644 100644 bcd1234... 0123456... M junkfile
------------------------------------------------

but the command invocation was "git-diff-files myfile", then the
junkfile entry would be removed from the list because only "myfile"
is under consideration.

Implementation note.  For performance reasons, git-diff-tree
uses the pathname parameters on the command line to cull the set of
filepairs it feeds the diffcore mechanism itself, and does not
use diffcore-pathspec, but the end result is the same.

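The filtering step can be sketched as follows.  This is a simplified
model, not git's code: `limit_to_pathspec` is a hypothetical name, and
two behaviours are assumptions made for the sketch, namely that a
pathspec entry naming a directory selects everything beneath it and
that an empty pathspec filters nothing.

```python
from typing import Iterable, List


def limit_to_pathspec(paths: Iterable[str], pathspec: List[str]) -> List[str]:
    """Keep only paths named by the pathspec or located under a named directory."""
    if not pathspec:
        return list(paths)  # assumption: no pathspec means no filtering

    def is_selected(path: str) -> bool:
        for spec in pathspec:
            prefix = spec.rstrip("/")
            # Exact match, or the entry is a leading directory of the path.
            if path == prefix or path.startswith(prefix + "/"):
                return True
        return False

    return [path for path in paths if is_selected(path)]


# The example from the text: only "myfile" is under consideration.
print(limit_to_pathspec(["junkfile", "myfile"], ["myfile"]))
```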

diffcore-break: For Splitting Up "Complete Rewrites"
----------------------------------------------------

The second transformation in the chain is diffcore-break, and is
controlled by the -B option to the git-diff-* commands.  This is
used to detect a filepair that represents a "complete rewrite" and
break such a filepair into two filepairs that represent a delete and
a create.  For example, if the input contained this filepair:

------------------------------------------------
:100644 100644 bcd1234... 0123456... M file0
------------------------------------------------

and if it detects that the file "file0" is completely rewritten,
it changes it to:

------------------------------------------------
:100644 000000 bcd1234... 0000000... D file0
:000000 100644 0000000... 0123456... A file0
------------------------------------------------

For the purpose of breaking a filepair, diffcore-break examines
the extent of changes between the contents of the files before
and after modification (i.e. the contents that have "bcd1234..."
and "0123456..." as their SHA1 content ID, in the above
example).  The amount of original content deleted and the amount
of new material inserted are added together, and if the sum exceeds
the "break score", the filepair is broken into two.  The break
score defaults to 50% of the size of the smaller of the original
and the result (i.e. if the edit shrinks the file, the size of
the result is used; if the edit lengthens the file, the size of
the original is used), and can be customized by giving a number
after the "-B" option (e.g. "-B75" to tell it to use 75%).

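The break decision described above amounts to one comparison.  The
sketch below is a model of that rule only: `should_break` is a
hypothetical name, and it takes the amounts deleted and inserted as
given (in the same unit as the file sizes) rather than showing how
they would be measured.

```python
def should_break(src_size: int, dst_size: int,
                 deleted: int, added: int,
                 break_percent: int = 50) -> bool:
    """Return True when deletions plus insertions exceed the break score.

    The break score is `break_percent` percent of the smaller of the
    original size and the result size.
    """
    smaller = min(src_size, dst_size)
    # Integer arithmetic avoids rounding: compare (deleted + added) / smaller
    # against break_percent / 100 without dividing.
    return (deleted + added) * 100 > smaller * break_percent


# A 100-unit file with 30 units deleted and 30 added: 60 > 50, so it breaks
# with the default score, but not with a 75% score.
print(should_break(100, 100, 30, 30))
print(should_break(100, 100, 30, 30, break_percent=75))
```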

diffcore-rename: For Detecting Renames and Copies
-------------------------------------------------

This transformation is used to detect renames and copies, and is
controlled by the -M option (to detect renames) and the -C option
(to detect copies as well) to the git-diff-* commands.  If the
input contained these filepairs:

------------------------------------------------
:100644 000000 0123456... 0000000... D fileX
:000000 100644 0000000... 0123456... A file0
------------------------------------------------

and the contents of the deleted file fileX are similar enough to
the contents of the created file file0, then rename detection
merges these filepairs and creates:

------------------------------------------------
:100644 100644 0123456... 0123456... R100 fileX file0
------------------------------------------------

When the "-C" option is used, the original contents of modified files
and deleted files (and also unmodified files, if the
"\--find-copies-harder" option is used) are considered as candidate
source files for a rename/copy operation.  If the input were like
these filepairs, which talk about a modified file fileY and a newly
created file file0:

------------------------------------------------
:100644 100644 0123456... 1234567... M fileY
:000000 100644 0000000... bcd3456... A file0
------------------------------------------------

the original contents of fileY and the resulting contents of
file0 are compared, and if they are similar enough, they are
changed to:

------------------------------------------------
:100644 100644 0123456... 1234567... M fileY
:100644 100644 0123456... bcd3456... C100 fileY file0
------------------------------------------------

In both rename and copy detection, the same "extent of changes"
algorithm used in diffcore-break is used to determine if two
files are "similar enough".  The threshold can be changed from the
default similarity score of 50% by giving a
number after the "-M" or "-C" option (e.g. "-M8" to tell it to use
8/10 = 80%).

Note.  When the "-C" option is used with the `\--find-copies-harder`
option, git-diff-\* commands feed unmodified filepairs to the
diffcore mechanism as well as modified ones.  This lets the copy
detector consider unmodified files as copy source candidates at
the expense of making it slower.  Without `\--find-copies-harder`,
git-diff-\* commands can detect copies only if the file that was
copied happened to have been modified in the same changeset.

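The pairing of a deleted file with a created file can be sketched as
below.  This is a toy model under loud assumptions: the similarity
measure is Python's `difflib.SequenceMatcher`, used here as a stand-in
for git's own "extent of changes" estimate, and `similarity` and
`detect_renames` are hypothetical names.  Copy detection would work
the same way, except that the source candidates are not consumed.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Tuple


def similarity(old: str, new: str) -> float:
    """Stand-in similarity in [0, 1]; git computes its own estimate instead."""
    return SequenceMatcher(None, old, new).ratio()


def detect_renames(deleted: Dict[str, str], added: Dict[str, str],
                   minimum: float = 0.5) -> List[Tuple[str, str, str]]:
    """Pair each created file with the most similar deleted file.

    `deleted` and `added` map path -> contents.  The result lists
    (status, source, destination) tuples; files without a match are
    simply not mentioned.
    """
    renames = []
    remaining = dict(deleted)  # each deleted file can be used only once
    for dst, new_text in added.items():
        best = max(remaining,
                   key=lambda src: similarity(remaining[src], new_text),
                   default=None)
        if best is None:
            continue
        score = similarity(remaining[best], new_text)
        if score >= minimum:
            renames.append(("R%03d" % int(score * 100), best, dst))
            del remaining[best]
    return renames


print(detect_renames({"fileX": "hello\nworld\n"}, {"file0": "hello\nworld\n"}))
```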

diffcore-merge-broken: For Putting "Complete Rewrites" Back Together
--------------------------------------------------------------------

This transformation is used to merge filepairs broken by
diffcore-break, and not transformed into rename/copy by
diffcore-rename, back into a single modification.  This always
runs when diffcore-break is used.

For the purpose of merging broken filepairs back, it uses a
different "extent of changes" computation from the ones used by
diffcore-break and diffcore-rename.  It counts only the deletion
from the original, and does not count insertion.  If you removed
only 10 lines from a 100-line document, even if you added 910
new lines to make a new 1000-line document, you did not do a
complete rewrite.  diffcore-break breaks such a case in order to
help diffcore-rename consider such filepairs as candidates for
rename/copy detection, but if filepairs broken that way were not
matched with other filepairs to create a rename/copy, then this
transformation merges them back into the original
"modification".

The "extent of changes" parameter can be tweaked from the
default 80% (that is, unless more than 80% of the original
material is deleted, the broken pairs are merged back into a
single modification) by giving a second number to the -B option,
like this:

* -B50/60 (give 50% "break score" to diffcore-break, use 60%
  for diffcore-merge-broken).

* -B/60 (the same as above, since diffcore-break defaults to 50%).

Note that an earlier implementation left a broken pair as separate
creation and deletion patches.  This was an unnecessary hack, and
the latest implementation always merges all the broken pairs
back into modifications.  For such a complete rewrite, however, the
resulting patch output is formatted differently for easier review:
it shows the entire contents of the old version
prefixed with '-', followed by the entire contents of the new
version prefixed with '+'.

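The two numbers accepted by -B and the merge-back rule can be
sketched as follows.  The function names are hypothetical, and one
point is an assumption rather than something this note states: the
digits are read as the fractional part of a decimal number, which is
consistent with the examples given in this note ("-B75" for 75%,
"-M8" for 80%).

```python
from typing import Tuple

DEFAULT_BREAK = 0.5   # default break score, from the text
DEFAULT_MERGE = 0.8   # default merge-back threshold, from the text


def parse_fraction(digits: str, default: float) -> float:
    """Read digits as a decimal fraction: '75' -> 0.75, '8' -> 0.8."""
    if digits == "":
        return default
    if not digits.isdecimal():
        raise ValueError("expected digits, got %r" % digits)
    return float("0." + digits)


def parse_break_option(arg: str) -> Tuple[float, float]:
    """Split e.g. '-B50/60' into (break score, merge-back threshold)."""
    if not arg.startswith("-B"):
        raise ValueError("not a -B option: %r" % arg)
    first, _, second = arg[2:].partition("/")
    return (parse_fraction(first, DEFAULT_BREAK),
            parse_fraction(second, DEFAULT_MERGE))


def should_merge_back(deleted: int, src_size: int,
                      merge_fraction: float = DEFAULT_MERGE) -> bool:
    """Merge a broken pair back unless more than the threshold was deleted."""
    return deleted <= merge_fraction * src_size


print(parse_break_option("-B/60"))
# The 100-line document with 10 lines removed is merged back.
print(should_merge_back(10, 100))
```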

diffcore-pickaxe: For Detecting Addition/Deletion of Specified String
---------------------------------------------------------------------

This transformation is used to find filepairs that represent
changes that touch a specified string, and is controlled by the
-S option and the `\--pickaxe-all` option to the git-diff-*
commands.

When diffcore-pickaxe is in use, it checks if there are
filepairs whose "original" side does not have the specified string
and whose "result" side does.  Such a filepair represents "the
string appeared in this changeset".  It also checks for the
opposite case, in which the changeset loses the specified string.

When `\--pickaxe-all` is not in effect, diffcore-pickaxe leaves
only such filepairs that touch the specified string in its
output.  When `\--pickaxe-all` is used, diffcore-pickaxe leaves all
filepairs intact if there is such a filepair, or makes the
output empty otherwise.  The latter behaviour is designed to
make reviewing of the changes in the context of the whole
changeset easier.

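The two output modes can be sketched as below.  This models only the
rule stated in this section (the string is present on exactly one
side of the filepair); `touches` and `pickaxe` are hypothetical
names, and a filepair is reduced to a (path, old text, new text)
tuple for the purpose of the sketch.

```python
from typing import List, Tuple

# (path, contents of the "original" side, contents of the "result" side)
Pair = Tuple[str, str, str]


def touches(needle: str, old_text: str, new_text: str) -> bool:
    """True when the string appeared or disappeared in this filepair."""
    return (needle in old_text) != (needle in new_text)


def pickaxe(filepairs: List[Pair], needle: str,
            pickaxe_all: bool = False) -> List[Pair]:
    """Filter filepairs the way -S and --pickaxe-all are described above."""
    hits = [fp for fp in filepairs if touches(needle, fp[1], fp[2])]
    if pickaxe_all:
        # Keep the whole changeset if any filepair touches the string.
        return list(filepairs) if hits else []
    return hits


changes = [("a.c", "int x;\n", "int x;\nfrotz();\n"),
           ("b.c", "old\n", "new\n")]
print(pickaxe(changes, "frotz"))
print(pickaxe(changes, "frotz", pickaxe_all=True))
```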

diffcore-order: For Sorting the Output Based on Filenames
---------------------------------------------------------

This is used to reorder the filepairs according to the user's
(or project's) taste, and is controlled by the -O option to the
git-diff-* commands.

This takes a text file, each line of which is a shell glob
pattern.  Filepairs that match a glob pattern on an earlier line
in the file are output before ones that match a later line, and
filepairs that do not match any glob pattern are output last.

As an example, a typical orderfile for the core git would probably
look like this:

------------------------------------------------
README
Makefile
Documentation
*.h
*.c
t
------------------------------------------------

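The ordering rule can be sketched with Python's `fnmatch` module.
This is an illustration, not git's implementation: `order_paths` is a
hypothetical name, and the sketch assumes that a pattern such as
"Documentation" or "t" matches a path when it matches the path itself
or any of its leading directories, which is what the example
orderfile above suggests.

```python
from fnmatch import fnmatchcase
from typing import List


def matches(path: str, pattern: str) -> bool:
    """Match the pattern against the path or any leading directory of it."""
    parts = path.split("/")
    return any(fnmatchcase("/".join(parts[:n]), pattern)
               for n in range(len(parts), 0, -1))


def order_paths(paths: List[str], patterns: List[str]) -> List[str]:
    """Sort paths by the first orderfile line they match; unmatched go last."""
    def rank(path: str) -> int:
        for index, pattern in enumerate(patterns):
            if matches(path, pattern):
                return index
        return len(patterns)

    # sorted() is stable, so paths with the same rank keep their input order.
    return sorted(paths, key=rank)


orderfile = ["README", "Makefile", "Documentation", "*.h", "*.c", "t"]
paths = ["t/t0000-basic.sh", "diff.c", "cache.h", "Makefile",
         "README", "Documentation/diffcore.txt", "COPYING"]
for path in order_paths(paths, orderfile):
    print(path)
```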