Tweaking diff output
====================
June 2005


Introduction
------------

The diff commands git-diff-index, git-diff-files, git-diff-tree, and
git-diff-stages can be told to manipulate differences they find in
unconventional ways before showing diff(1) output. The manipulation
is collectively called "diffcore transformation". This short note
describes what these transformations are and how to use them to
produce diff output that is easier to understand than the
conventional kind.


The chain of operation
----------------------

The git-diff-* family works by first comparing two sets of
files:

 - git-diff-index compares contents of a "tree" object and the
   working directory (when '\--cached' flag is not used) or a
   "tree" object and the index file (when '\--cached' flag is
   used);

 - git-diff-files compares contents of the index file and the
   working directory;

 - git-diff-tree compares contents of two "tree" objects;

 - git-diff-stages compares contents of blobs at two stages in an
   unmerged index file.

In all of these cases, the commands themselves compare
corresponding paths in the two sets of files. The result of
comparison is passed from these commands to what is internally
called "diffcore", in a format similar to what is output when
the -p option is not used. E.g.

------------------------------------------------
in-place edit  :100644 100644 bcd1234... 0123456... M file0
create         :000000 100644 0000000... 1234567... A file4
delete         :100644 000000 1234567... 0000000... D file5
unmerged       :000000 000000 0000000... 0000000... U file6
------------------------------------------------

The diffcore mechanism is fed a list of such comparison results
(each of which is called a "filepair", although at this point each
of them talks about a single file), and transforms such a list
into another list. There are currently 6 such transformations:

- diffcore-pathspec
- diffcore-break
- diffcore-rename
- diffcore-merge-broken
- diffcore-pickaxe
- diffcore-order

These are applied in sequence. The set of filepairs that the
git-diff-\* commands find is used as the input to diffcore-pathspec,
and the output from diffcore-pathspec is used as the input to the
next transformation. The final result is then passed to the
output routine, which generates either diff-raw format (see the
Output format sections of the manual for git-diff-\* commands) or
diff-patch format.

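The chain described in this section can be pictured as a list of
functions, each taking a list of filepairs and returning a new list.
The following Python sketch is only an illustration and is not how
git implements it: the names `FilePair`, `parse_raw` and
`run_diffcore` are hypothetical, and the parser assumes paths without
embedded whitespace.

```python
from typing import Callable, List, NamedTuple, Optional


class FilePair(NamedTuple):
    """One comparison result, modelled on a diff-raw line."""
    old_mode: str
    new_mode: str
    old_sha: str
    new_sha: str
    status: str
    path: str
    new_path: Optional[str] = None  # set only for renames and copies


def parse_raw(line: str) -> FilePair:
    """Parse one ':mode mode sha sha status path [path]' line.

    Simplification: fields are split on whitespace, so paths that
    contain spaces are not handled.
    """
    fields = line.strip().lstrip(":").split()
    new_path = fields[6] if len(fields) > 6 else None
    return FilePair(*fields[:6], new_path)


# A transformation maps a list of filepairs to a new list.
Transform = Callable[[List[FilePair]], List[FilePair]]


def run_diffcore(pairs: List[FilePair], chain: List[Transform]) -> List[FilePair]:
    """Feed the output of each transformation to the next one."""
    for transform in chain:
        pairs = transform(pairs)
    return pairs
```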

diffcore-pathspec: For Ignoring Files Outside Our Consideration
---------------------------------------------------------------

The first transformation in the chain is diffcore-pathspec, and
is controlled by giving the pathname parameters to the
git-diff-* commands on the command line. The pathspec is used
to limit the world diff operates in. It removes the filepairs
outside the specified set of pathnames. E.g. if the input set
of filepairs included:

------------------------------------------------
:100644 100644 bcd1234... 0123456... M junkfile
------------------------------------------------

but the command invocation was "git-diff-files myfile", then the
junkfile entry would be removed from the list because only "myfile"
is under consideration.

Implementation note. For performance reasons, git-diff-tree
uses the pathname parameters on the command line to cull the set
of filepairs it feeds the diffcore mechanism itself, and does not
use diffcore-pathspec, but the end result is the same.

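As a rough illustration of the filtering described in this section,
the Python sketch below keeps only the paths selected by the
pathspec. It works on bare path strings rather than full filepairs,
and it assumes that a pathspec entry selects either an exact path or
everything under a leading directory; both the function names and
that matching rule are assumptions made for the example.

```python
from typing import List


def path_matches(path: str, spec: str) -> bool:
    """True if path equals spec or lies under the directory spec."""
    spec = spec.rstrip("/")
    return path == spec or path.startswith(spec + "/")


def diffcore_pathspec(paths: List[str], pathspec: List[str]) -> List[str]:
    """Keep only the paths selected by at least one pathspec entry.

    An empty pathspec places no limit, so everything is kept.
    """
    if not pathspec:
        return list(paths)
    return [p for p in paths if any(path_matches(p, s) for s in pathspec)]
```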

diffcore-break: For Splitting Up "Complete Rewrites"
----------------------------------------------------

The second transformation in the chain is diffcore-break, and is
controlled by the -B option to the git-diff-* commands. This is
used to detect a filepair that represents a "complete rewrite" and
break such a filepair into two filepairs that represent delete and
create. E.g. if the input contained this filepair:

------------------------------------------------
:100644 100644 bcd1234... 0123456... M file0
------------------------------------------------

and if it detects that the file "file0" is completely rewritten,
it changes it to:

------------------------------------------------
:100644 000000 bcd1234... 0000000... D file0
:000000 100644 0000000... 0123456... A file0
------------------------------------------------

For the purpose of breaking a filepair, diffcore-break examines
the extent of changes between the contents of the files before
and after modification (i.e. the contents that have "bcd1234..."
and "0123456..." as their SHA1 content ID, in the above
example). The amount of deletion of original contents and
the amount of insertion of new material are added together, and
if the sum exceeds the "break score", the filepair is broken into
two. The break score defaults to 50% of the size of the smaller
of the original and the result (i.e. if the edit shrinks the file,
the size of the result is used; if the edit lengthens the file,
the size of the original is used), and can be customized by giving
a number after the "-B" option (e.g. "-B75" to tell it to use 75%).

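The break decision described in this section can be sketched in
Python as follows. This is an approximation for illustration only:
the amounts of deletion and insertion are estimated per character
with the standard `difflib` module, which is not the delta
computation git itself uses, and the names `change_extent` and
`should_break` are hypothetical.

```python
import difflib
from typing import Tuple


def change_extent(old: str, new: str) -> Tuple[int, int]:
    """Return (deleted, added) sizes in characters.

    Whatever difflib reports as common to both sides counts as kept;
    the rest of the old text is "deleted" and the rest of the new
    text is "added".
    """
    matcher = difflib.SequenceMatcher(None, old, new, autojunk=False)
    kept = sum(block.size for block in matcher.get_matching_blocks())
    return len(old) - kept, len(new) - kept


def should_break(old: str, new: str, break_score: int = 50) -> bool:
    """Apply the rule from the text: break the filepair when
    deletion plus insertion exceeds break_score percent of the
    smaller of the two sizes."""
    deleted, added = change_extent(old, new)
    base = min(len(old), len(new))
    return (deleted + added) * 100 > break_score * base
```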

diffcore-rename: For Detecting Renames and Copies
-------------------------------------------------

This transformation is used to detect renames and copies, and is
controlled by the -M option (to detect renames) and the -C option
(to detect copies as well) to the git-diff-* commands. If the
input contained these filepairs:

------------------------------------------------
:100644 000000 0123456... 0000000... D fileX
:000000 100644 0000000... 0123456... A file0
------------------------------------------------

and the contents of the deleted file fileX are similar enough to
the contents of the created file file0, then rename detection
merges these filepairs and creates:

------------------------------------------------
:100644 100644 0123456... 0123456... R100 fileX file0
------------------------------------------------

When the "-C" option is used, the original contents of modified
files and deleted files (and also unmodified files, if the
"\--find-copies-harder" option is used) are considered as candidates
for the source file in a rename/copy operation. If the input were
like these filepairs, which talk about a modified file fileY and a
newly created file file0:

------------------------------------------------
:100644 100644 0123456... 1234567... M fileY
:000000 100644 0000000... bcd3456... A file0
------------------------------------------------

the original contents of fileY and the resulting contents of
file0 are compared, and if they are similar enough, they are
changed to:

------------------------------------------------
:100644 100644 0123456... 1234567... M fileY
:100644 100644 0123456... bcd3456... C100 fileY file0
------------------------------------------------

In both rename and copy detection, the same "extent of changes"
algorithm used in diffcore-break is used to determine if two
files are "similar enough", and can be customized to use
a similarity score different from the default of 50% by giving a
number after the "-M" or "-C" option (e.g. "-M8" to tell it to use
8/10 = 80%).

Note. When the "-C" option is used with the `\--find-copies-harder`
option, git-diff-\* commands feed unmodified filepairs to the
diffcore mechanism as well as modified ones. This lets the copy
detector consider unmodified files as copy source candidates at
the expense of making it slower. Without `\--find-copies-harder`,
git-diff-\* commands can detect copies only if the file that was
copied happened to have been modified in the same changeset.

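The pairing of a deleted file with a created file can be sketched in
Python as below. The sketch is deliberately naive and rests on
several assumptions: similarity is measured with `difflib`'s
`ratio()`, which is a different formula from git's "extent of
changes" score, each created file is simply matched with the
best-scoring remaining deleted file, and copy detection is left out.
The name `detect_renames` and its input shape (path to contents
mappings) are hypothetical.

```python
import difflib
from typing import Dict, List, Tuple


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]; a stand-in for git's own score."""
    return difflib.SequenceMatcher(None, a, b, autojunk=False).ratio()


def detect_renames(
    deleted: Dict[str, str],
    added: Dict[str, str],
    min_score: float = 0.5,
) -> List[Tuple[str, str, str]]:
    """Pair each created file with the most similar deleted file.

    Returns (status, old_path, new_path) tuples such as
    ("R100", "fileX", "file0") for pairs scoring at least min_score.
    """
    sources = dict(deleted)  # deleted files not yet used as a source
    renames = []
    for new_path, new_content in added.items():
        scored = [
            (similarity(old_content, new_content), old_path)
            for old_path, old_content in sources.items()
        ]
        if not scored:
            break
        score, old_path = max(scored)
        if score >= min_score:
            renames.append((f"R{round(score * 100):03d}", old_path, new_path))
            del sources[old_path]
    return renames
```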

diffcore-merge-broken: For Putting "Complete Rewrites" Back Together
--------------------------------------------------------------------

This transformation is used to merge filepairs broken by
diffcore-break, and not transformed into rename/copy by
diffcore-rename, back into a single modification. This always
runs when diffcore-break is used.

For the purpose of merging broken filepairs back, it uses a
different "extent of changes" computation from the ones used by
diffcore-break and diffcore-rename. It counts only the deletion
from the original, and does not count insertion. If you removed
only 10 lines from a 100-line document, even if you added 910
new lines to make a new 1000-line document, you did not do a
complete rewrite. diffcore-break breaks such a case in order to
help diffcore-rename to consider such filepairs as candidates for
rename/copy detection, but if filepairs broken that way were not
matched with other filepairs to create rename/copy, then this
transformation merges them back into the original
"modification".

The "extent of changes" parameter can be tweaked from the
default 80% (that is, unless more than 80% of the original
material is deleted, the broken pairs are merged back into a
single modification) by giving a second number to the -B option,
like these:

* -B50/60 (give 50% "break score" to diffcore-break, use 60%
  for diffcore-merge-broken).

* -B/60 (the same as above, since diffcore-break defaults to 50%).

Note that an earlier implementation left a broken pair as separate
creation and deletion patches. This was an unnecessary hack, and
the latest implementation always merges all the broken pairs
back into modifications, but the resulting patch output is
formatted differently for easier review in case of such
a complete rewrite by showing the entire contents of the old
version prefixed with '-', followed by the entire contents of the
new version prefixed with '+'.

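The two-number form of the -B option and the merge-back rule from
this section can be sketched in Python as follows. The helper names
are hypothetical, and the parser makes a simplifying assumption: it
reads each number as a plain percentage, as in the "-B50/60" and
"-B75" examples, and does not try to reproduce how git parses other
spellings.

```python
from typing import Tuple


def parse_break_option(arg: str, default_break: int = 50,
                       default_merge: int = 80) -> Tuple[int, int]:
    """Split '-B<break>/<merge>' into two percentages.

    Either number may be omitted, in which case the default from
    the text (50% and 80% respectively) is used.
    """
    if not arg.startswith("-B"):
        raise ValueError("not a -B option: %r" % arg)
    break_part, _, merge_part = arg[2:].partition("/")
    break_score = int(break_part) if break_part else default_break
    merge_score = int(merge_part) if merge_part else default_merge
    return break_score, merge_score


def should_merge_back(deleted: int, original_size: int,
                      merge_score: int = 80) -> bool:
    """Merge a broken pair back unless more than merge_score percent
    of the original was deleted. Insertions are ignored here."""
    return deleted * 100 <= merge_score * original_size
```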

diffcore-pickaxe: For Detecting Addition/Deletion of Specified String
---------------------------------------------------------------------

This transformation is used to find filepairs that represent
changes that touch a specified string, and is controlled by the
-S option and the `\--pickaxe-all` option to the git-diff-*
commands.

When diffcore-pickaxe is in use, it checks if there are
filepairs whose "result" side has the specified string and
whose "original" side does not. Such a filepair represents "the
string appeared in this changeset". It also checks for the
opposite case that loses the specified string.

When `\--pickaxe-all` is not in effect, diffcore-pickaxe leaves
only such filepairs that touch the specified string in its
output. When `\--pickaxe-all` is used, diffcore-pickaxe leaves all
filepairs intact if there is such a filepair, or makes the
output empty otherwise. The latter behaviour is designed to
make reviewing of the changes in the context of the whole
changeset easier.

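The selection rule described in this section can be sketched in
Python as below. The tuple layout `(path, original, result)` and the
function name are assumptions made for the example; a missing side
(a created or deleted file) is represented by `None`.

```python
from typing import List, Optional, Tuple

# (path, original contents, result contents); None means "no file".
Pair = Tuple[str, Optional[str], Optional[str]]


def diffcore_pickaxe(pairs: List[Pair], needle: str,
                     pickaxe_all: bool = False) -> List[Pair]:
    """Keep filepairs in which the string appears or disappears."""

    def touches(original: Optional[str], result: Optional[str]) -> bool:
        # The string is present on exactly one of the two sides.
        return (needle in (original or "")) != (needle in (result or ""))

    hits = [pair for pair in pairs if touches(pair[1], pair[2])]
    if pickaxe_all:
        # All or nothing: show the whole changeset if anything matched.
        return list(pairs) if hits else []
    return hits
```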

diffcore-order: For Sorting the Output Based on Filenames
---------------------------------------------------------

This is used to reorder the filepairs according to the user's
(or project's) taste, and is controlled by the -O option to the
git-diff-* commands.

This takes a text file each of whose lines is a shell glob
pattern. Filepairs that match a glob pattern on an earlier line
in the file are output before ones that match a later line, and
filepairs that do not match any glob pattern are output last.

As an example, a typical orderfile for the core git probably
would look like this:

------------------------------------------------
README
Makefile
Documentation
*.h
*.c
t
------------------------------------------------

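The ordering rule can be sketched in Python with the standard
`fnmatch` module. Two points are assumptions made for this example
rather than statements about git: a pattern such as "Documentation"
or "t" is taken to match any path under that leading directory, and
`fnmatch` globbing (where `*` also matches "/") stands in for
whatever matcher git uses. The function names are hypothetical.

```python
import fnmatch
from typing import List


def order_rank(path: str, patterns: List[str]) -> int:
    """Index of the first pattern matching the path or one of its
    leading directories; len(patterns) if nothing matches."""
    parts = path.split("/")
    prefixes = ["/".join(parts[:i]) for i in range(1, len(parts) + 1)]
    for rank, pattern in enumerate(patterns):
        if any(fnmatch.fnmatchcase(prefix, pattern) for prefix in prefixes):
            return rank
    return len(patterns)


def diffcore_order(paths: List[str], patterns: List[str]) -> List[str]:
    """Stable sort: earlier patterns first, unmatched paths last."""
    return sorted(paths, key=lambda path: order_rank(path, patterns))
```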