pack-objects: enforce --depth limit in reused deltas

Since 898b14c (pack-objects: rework check_delta_limit usage,
2007-04-16), we check the delta depth limit only when
figuring out whether we should make a new delta. We don't
consider it at all when reusing deltas, which means that
if we pack once with --depth=250 and then again with
--depth=50, the second pack may still contain chains longer
than 50.

This is generally considered a feature, as the results of
earlier high-depth repacks are carried forward, used for
serving fetches, etc. However, since we started using
cross-pack deltas in c9af708b1 (pack-objects: use mru list
when iterating over packs, 2016-08-11), we are no longer
bounded by the length of an existing delta chain in a single
pack.

Here's one particular pathological case: a sequence of N
packs, each with 2 objects, the base of which is stored as a
delta in a previous pack. If we chain all the deltas
together, we have a cycle of length N. We break the cycle,
but the tip delta is still at depth N-1.
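
As a rough illustration of the arithmetic (a toy model, not
the pack-objects code; NO_BASE, chain_depth() and
tip_depth_after_break() are names made up for this sketch),
we can build a cycle of N delta links, break it at one
object, and count how deep the tip still is:

```c
#include <assert.h>
#include <stddef.h>

#define NO_BASE ((size_t)-1)	/* hypothetical marker: stored in full */

/* Count delta links from object idx down to a full object. */
static size_t chain_depth(const size_t *base, size_t idx)
{
	size_t depth = 0;

	while (base[idx] != NO_BASE) {
		idx = base[idx];
		depth++;
	}
	return depth;
}

/*
 * Fill base[0..n-1] so that object i is a delta against object
 * i-1 and object 0 is a delta against object n-1 (a cycle of
 * length n). Break the cycle by storing object 0 in full, then
 * report the depth of the tip, object n-1.
 */
size_t tip_depth_after_break(size_t *base, size_t n)
{
	size_t i;

	assert(n > 0);
	for (i = 0; i < n; i++)
		base[i] = (i + n - 1) % n;
	base[0] = NO_BASE;	/* break the cycle */
	return chain_depth(base, n - 1);
}
```

Calling tip_depth_after_break() with n = 10000 reports a
depth of 9999, far beyond any sensible --depth value.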

This is less unlikely than it might sound. See the included
test for a reconstruction based on real-world actions.  I
ran into such a case in the wild, where a client was rapidly
sending packs, and we had accumulated 10,000 before doing a
server-side repack.  The pack that "git repack" tried to
generate had a very deep chain, which caused pack-objects to
run out of stack space in the recursive write_one().

This patch bounds the length of delta chains in the output
pack based on --depth, regardless of whether they are caused
by cross-pack deltas or existed in the input packs. This
fixes the problem, but does have two possible downsides:

  1. High-depth aggressive repacks followed by "normal"
     repacks will throw away the high-depth chains.

     In the long run this is probably OK; investigation
     showed that high-depth repacks aren't actually
     beneficial, and we dropped the aggressive depth default
     to match the normal case in 07e7dbf0d (gc: default
     aggressive depth to 50, 2016-08-11).

  2. If you really do want to store high-depth deltas on
     disk, they may be discarded and new deltas computed
     when serving a fetch, unless you set pack.depth to
     match your high-depth size.

The implementation uses the existing search for delta
cycles.  That lets us compute the depth of any node based on
the depth of its base, because we know the base is DFS_DONE
by the time we look at it (modulo any cycles in the graph,
but we know there cannot be any because we break them as we
see them).
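
The idea can be sketched roughly as below. This is a
simplified stand-in, not the code from this patch: struct
node, bound_chain(), drop_delta(), DFS_NONE and DFS_ACTIVE
are names assumed for the illustration, and max_depth stands
in for the --depth value.

```c
#include <assert.h>
#include <stddef.h>

enum dfs_state { DFS_NONE, DFS_ACTIVE, DFS_DONE };

struct node {
	struct node *base;	/* delta base, or NULL if stored in full */
	int depth;		/* delta links down to a full object */
	enum dfs_state state;
};

/* Stop reusing the delta: the object becomes a chain root. */
static void drop_delta(struct node *n)
{
	n->base = NULL;
	n->depth = 0;
}

void bound_chain(struct node *n, int max_depth)
{
	if (!n->base) {
		/* Not a delta: depth 0, cannot be part of a cycle. */
		n->depth = 0;
		n->state = DFS_DONE;
		return;
	}

	switch (n->state) {
	case DFS_NONE:
		n->state = DFS_ACTIVE;
		bound_chain(n->base, max_depth);
		/* A cycle break below may have dropped our own delta. */
		if (n->base) {
			assert(n->base->state == DFS_DONE);
			n->depth = n->base->depth + 1;
			if (n->depth > max_depth)
				drop_delta(n);
		}
		n->state = DFS_DONE;
		break;
	case DFS_ACTIVE:
		/* We reached ourselves again: break the cycle here. */
		drop_delta(n);
		n->state = DFS_DONE;
		break;
	case DFS_DONE:
		/* Already examined; depth is final. */
		break;
	}
}
```

With a straight chain of six objects and max_depth of 2, the
sketch drops the delta of the fourth object, so the chain is
split into two chains of depth 2.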

There is some subtlety worth mentioning, though. We record
the depth of each object as we compute it. It might seem
like we could save the per-object storage space by just
keeping track of the depth of our traversal (i.e., have
break_delta_chains() report how deep it went). But we may
visit an object through multiple delta paths, and on
subsequent paths we want to know its depth immediately,
without having to walk back down to its final base (doing so
would make our graph walk quadratic rather than linear).
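
To make the cost difference concrete, here is a small sketch
that counts base-pointer hops for both strategies. It
assumes the delta graph has no cycles, and struct obj,
depth_by_walking() and depth_memoized() are hypothetical
names, not functions in pack-objects:

```c
#include <assert.h>
#include <stddef.h>

struct obj {
	struct obj *base;	/* delta base, or NULL if stored in full */
	int depth;		/* cached depth, or -1 if not yet known */
};

/* No per-object storage: walk to the final base every time. */
int depth_by_walking(const struct obj *o, unsigned long *hops)
{
	int depth = 0;

	assert(o);
	for (; o->base; o = o->base) {
		depth++;
		(*hops)++;
	}
	return depth;
}

/* Record each depth once; later visits answer immediately. */
int depth_memoized(struct obj *o, unsigned long *hops)
{
	assert(o);
	if (o->depth < 0) {
		if (!o->base) {
			o->depth = 0;
		} else {
			(*hops)++;
			o->depth = depth_memoized(o->base, hops) + 1;
		}
	}
	return o->depth;
}
```

Asking for the depth of every object in a single chain of
100 objects costs 0+1+...+99 = 4950 hops with the first
function, but only 99 with the second.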

Likewise, one could try to record the depth not from the
base, but from our starting point (i.e., start
recursion_depth at 0, and pass "recursion_depth + 1" to each
invocation of break_delta_chains()). And then when
recursion_depth gets too big, we know that we must cut the
delta chain.  But that technique is wrong if we do not visit
the nodes in topological order. In a chain A->B->C, if we
visit "C", then "B", then "A", we will never recurse
deeper than 1 link (because we see at each node that we have
already visited it).
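
The failure is easy to reproduce in a toy model. In the
sketch below (struct item, walk() and deepest_recursion()
are names invented for the example), each top-level visit
starts recursion_depth at 0 and stops at any node it has
already seen:

```c
#include <assert.h>
#include <stddef.h>

struct item {
	struct item *base;	/* delta base, or NULL if stored in full */
	int visited;
};

static void walk(struct item *it, int recursion_depth, int *deepest)
{
	if (recursion_depth > *deepest)
		*deepest = recursion_depth;
	if (it->visited)
		return;		/* seen before; we learn nothing below it */
	it->visited = 1;
	if (it->base)
		walk(it->base, recursion_depth + 1, deepest);
}

/*
 * Visit the items in the given order and return the largest
 * recursion_depth that any call observed.
 */
int deepest_recursion(struct item **order, size_t n)
{
	int deepest = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		assert(order[i]);
		order[i]->visited = 0;
	}
	for (i = 0; i < n; i++)
		walk(order[i], 0, &deepest);
	return deepest;
}
```

For the chain A->B->C, visiting in the order C, B, A yields
1, even though A sits at depth 2; visiting A first yields 2.
A limit check based on recursion_depth would therefore miss
over-long chains depending on visit order.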

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
3 files changed
tree: 8c6d925f66d7a348935cc976c241eea0ae2c7e4d
README.md

Git - fast, scalable, distributed revision control system

Git is a fast, scalable, distributed revision control system with an unusually rich command set that provides both high-level operations and full access to internals.

Git is an Open Source project covered by the GNU General Public License version 2 (some parts of it are under different licenses, compatible with the GPLv2). It was originally written by Linus Torvalds with help of a group of hackers around the net.

Please read the file INSTALL for installation instructions.

Many Git online resources are accessible from http://git-scm.com/ including full documentation and Git related tools.

See Documentation/gittutorial.txt to get started, then see Documentation/giteveryday.txt for a useful minimum set of commands, and Documentation/git-<commandname>.txt for documentation of each command. If git has been correctly installed, then the tutorial can also be read with man gittutorial or git help tutorial, and the documentation of each command with man git-<commandname> or git help <commandname>.

CVS users may also want to read Documentation/gitcvs-migration.txt (man gitcvs-migration or git help cvs-migration if git is installed).

The user discussion and development of Git take place on the Git mailing list -- everyone is welcome to post bug reports, feature requests, comments and patches to git@vger.kernel.org (read Documentation/SubmittingPatches for instructions on patch submission). To subscribe to the list, send an email with just “subscribe git” in the body to majordomo@vger.kernel.org. The mailing list archives are available at https://public-inbox.org/git, http://marc.info/?l=git and other archival sites.

The maintainer frequently sends the “What's cooking” reports that list the current status of various development topics to the mailing list. The discussion following them gives a good reference for project status, development direction and remaining tasks.

The name “git” was given by Linus Torvalds when he wrote the very first version. He described the tool as “the stupid content tracker” and the name as (depending on your mood):

  • random three-letter combination that is pronounceable, and not actually used by any common UNIX command. The fact that it is a mispronunciation of “get” may or may not be relevant.
  • stupid. contemptible and despicable. simple. Take your pick from the dictionary of slang.
  • “global information tracker”: you're in a good mood, and it actually works for you. Angels sing, and a light suddenly fills the room.
  • “goddamn idiotic truckload of sh*t”: when it breaks