pack-objects: improve partial packfile reuse

The old code to reuse deltas from an existing packfile
just tried to dump a whole segment of the pack verbatim.
That's faster than the traditional way of actually adding
objects to the packing list, but it didn't kick in very
often. This new code is really going for a middle ground:
do _some_ per-object work, but way less than we'd
traditionally do.

The general strategy of the new code is to make a bitmap
of objects from the packfile we'll include, and then
iterate over it, writing out each object exactly as it is
in our on-disk pack, but _not_ adding it to our packlist
(which costs memory, and increases the search space for
deltas).

One complication is that if we're omitting some objects,
we can't set a delta against a base that we're not
sending. So we have to check each object in
try_partial_reuse() to make sure we have its delta.

About performance, in the worst case we might have
interleaved objects that we are sending or not sending,
and we'd have as many chunks as objects. But in practice
we send big chunks.

For instance, packing torvalds/linux on GitHub servers
now reused 6.5M objects, but only needed ~50k chunks.

Helped-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
3 files changed
tree: dc222189645405ee1eab03d9493a553bf6f33b34
  1. .github/
  2. block-sha1/
  3. builtin/
  4. ci/
  5. compat/
  6. contrib/
  7. Documentation/
  8. ewah/
  9. git-gui/
  10. gitk-git/
  11. gitweb/
  12. mergetools/
  13. negotiator/
  14. perl/
  15. po/
  16. ppc/
  17. refs/
  18. sha1dc/
  19. sha256/
  20. t/
  21. templates/
  22. trace2/
  23. vcs-svn/
  24. xdiff/
  25. .clang-format
  26. .editorconfig
  27. .gitattributes
  28. .gitignore
  29. .gitmodules
  30. .mailmap
  31. .travis.yml
  32. .tsan-suppressions
  33. abspath.c
  34. aclocal.m4
  35. advice.c
  36. advice.h
  37. alias.c
  38. alias.h
  39. alloc.c
  40. alloc.h
  41. apply.c
  42. apply.h
  43. archive-tar.c
  44. archive-zip.c
  45. archive.c
  46. archive.h
  47. argv-array.c
  48. argv-array.h
  49. attr.c
  50. attr.h
  51. azure-pipelines.yml
  52. banned.h
  53. base85.c
  54. bisect.c
  55. bisect.h
  56. blame.c
  57. blame.h
  58. blob.c
  59. blob.h
  60. branch.c
  61. branch.h
  62. builtin.h
  63. bulk-checkin.c
  64. bulk-checkin.h
  65. bundle.c
  66. bundle.h
  67. cache-tree.c
  68. cache-tree.h
  69. cache.h
  70. chdir-notify.c
  71. chdir-notify.h
  72. check-builtins.sh
  73. check_bindir
  74. checkout.c
  75. checkout.h
  76. color.c
  77. color.h
  78. column.c
  79. column.h
  80. combine-diff.c
  81. command-list.txt
  82. commit-graph.c
  83. commit-graph.h
  84. commit-reach.c
  85. commit-reach.h
  86. commit-slab-decl.h
  87. commit-slab-impl.h
  88. commit-slab.h
  89. commit.c
  90. commit.h
  91. common-main.c
  92. config.c
  93. config.h
  94. config.mak.dev
  95. config.mak.in
  96. config.mak.uname
  97. configure.ac
  98. connect.c
  99. connect.h
  100. connected.c
  101. connected.h
  102. convert.c
  103. convert.h
  104. copy.c
  105. COPYING
  106. credential-cache--daemon.c
  107. credential-cache.c
  108. credential-store.c
  109. credential.c
  110. credential.h
  111. csum-file.c
  112. csum-file.h
  113. ctype.c
  114. daemon.c
  115. date.c
  116. decorate.c
  117. decorate.h
  118. delta-islands.c
  119. delta-islands.h
  120. delta.h
  121. detect-compiler
  122. diff-delta.c
  123. diff-lib.c
  124. diff-no-index.c
  125. diff.c
  126. diff.h
  127. diffcore-break.c
  128. diffcore-delta.c
  129. diffcore-order.c
  130. diffcore-pickaxe.c
  131. diffcore-rename.c
  132. diffcore.h
  133. dir-iterator.c
  134. dir-iterator.h
  135. dir.c
  136. dir.h
  137. editor.c
  138. entry.c
  139. environment.c
  140. exec-cmd.c
  141. exec-cmd.h
  142. fast-import.c
  143. fetch-negotiator.c
  144. fetch-negotiator.h
  145. fetch-object.c
  146. fetch-object.h
  147. fetch-pack.c
  148. fetch-pack.h
  149. fmt-merge-msg.h
  150. fsck.c
  151. fsck.h
  152. fsmonitor.c
  153. fsmonitor.h
  154. fuzz-commit-graph.c
  155. fuzz-pack-headers.c
  156. fuzz-pack-idx.c
  157. generate-cmdlist.sh
  158. gettext.c
  159. gettext.h
  160. git-add--interactive.perl
  161. git-archimport.perl
  162. git-bisect.sh
  163. git-compat-util.h
  164. git-cvsexportcommit.perl
  165. git-cvsimport.perl
  166. git-cvsserver.perl
  167. git-difftool--helper.sh
  168. git-filter-branch.sh
  169. git-instaweb.sh
  170. git-legacy-stash.sh
  171. git-merge-octopus.sh
  172. git-merge-one-file.sh
  173. git-merge-resolve.sh
  174. git-mergetool--lib.sh
  175. git-mergetool.sh
  176. git-p4.py
  177. git-parse-remote.sh
  178. git-quiltimport.sh
  179. git-rebase--preserve-merges.sh
  180. git-request-pull.sh
  181. git-send-email.perl
  182. git-sh-i18n.sh
  183. git-sh-setup.sh
  184. git-submodule.sh
  185. git-svn.perl
  186. GIT-VERSION-GEN
  187. git-web--browse.sh
  188. git.c
  189. git.rc
  190. gpg-interface.c
  191. gpg-interface.h
  192. graph.c
  193. graph.h
  194. grep.c
  195. grep.h
  196. hash.h
  197. hashmap.c
  198. hashmap.h
  199. help.c
  200. help.h
  201. hex.c
  202. http-backend.c
  203. http-fetch.c
  204. http-push.c
  205. http-walker.c
  206. http.c
  207. http.h
  208. ident.c
  209. imap-send.c
  210. INSTALL
  211. interdiff.c
  212. interdiff.h
  213. iterator.h
  214. json-writer.c
  215. json-writer.h
  216. khash.h
  217. kwset.c
  218. kwset.h
  219. levenshtein.c
  220. levenshtein.h
  221. LGPL-2.1
  222. line-log.c
  223. line-log.h
  224. line-range.c
  225. line-range.h
  226. linear-assignment.c
  227. linear-assignment.h
  228. list-objects-filter-options.c
  229. list-objects-filter-options.h
  230. list-objects-filter.c
  231. list-objects-filter.h
  232. list-objects.c
  233. list-objects.h
  234. list.h
  235. ll-merge.c
  236. ll-merge.h
  237. lockfile.c
  238. lockfile.h
  239. log-tree.c
  240. log-tree.h
  241. ls-refs.c
  242. ls-refs.h
  243. mailinfo.c
  244. mailinfo.h
  245. mailmap.c
  246. mailmap.h
  247. Makefile
  248. match-trees.c
  249. mem-pool.c
  250. mem-pool.h
  251. merge-blobs.c
  252. merge-blobs.h
  253. merge-recursive.c
  254. merge-recursive.h
  255. merge.c
  256. mergesort.c
  257. mergesort.h
  258. midx.c
  259. midx.h
  260. name-hash.c
  261. notes-cache.c
  262. notes-cache.h
  263. notes-merge.c
  264. notes-merge.h
  265. notes-utils.c
  266. notes-utils.h
  267. notes.c
  268. notes.h
  269. object-store.h
  270. object.c
  271. object.h
  272. oidmap.c
  273. oidmap.h
  274. oidset.c
  275. oidset.h
  276. pack-bitmap-write.c
  277. pack-bitmap.c
  278. pack-bitmap.h
  279. pack-check.c
  280. pack-objects.c
  281. pack-objects.h
  282. pack-revindex.c
  283. pack-revindex.h
  284. pack-write.c
  285. pack.h
  286. packfile.c
  287. packfile.h
  288. pager.c
  289. parse-options-cb.c
  290. parse-options.c
  291. parse-options.h
  292. patch-delta.c
  293. patch-ids.c
  294. patch-ids.h
  295. path.c
  296. path.h
  297. pathspec.c
  298. pathspec.h
  299. pkt-line.c
  300. pkt-line.h
  301. preload-index.c
  302. pretty.c
  303. pretty.h
  304. prio-queue.c
  305. prio-queue.h
  306. progress.c
  307. progress.h
  308. prompt.c
  309. prompt.h
  310. protocol.c
  311. protocol.h
  312. quote.c
  313. quote.h
  314. range-diff.c
  315. range-diff.h
  316. reachable.c
  317. reachable.h
  318. read-cache.c
  319. README.md
  320. rebase-interactive.c
  321. rebase-interactive.h
  322. ref-filter.c
  323. ref-filter.h
  324. reflog-walk.c
  325. reflog-walk.h
  326. refs.c
  327. refs.h
  328. refspec.c
  329. refspec.h
  330. remote-curl.c
  331. remote-testsvn.c
  332. remote.c
  333. remote.h
  334. replace-object.c
  335. replace-object.h
  336. repository.c
  337. repository.h
  338. rerere.c
  339. rerere.h
  340. resolve-undo.c
  341. resolve-undo.h
  342. revision.c
  343. revision.h
  344. run-command.c
  345. run-command.h
  346. send-pack.c
  347. send-pack.h
  348. sequencer.c
  349. sequencer.h
  350. serve.c
  351. serve.h
  352. server-info.c
  353. setup.c
  354. sh-i18n--envsubst.c
  355. sha1-array.c
  356. sha1-array.h
  357. sha1-file.c
  358. sha1-lookup.c
  359. sha1-lookup.h
  360. sha1-name.c
  361. sha1dc_git.c
  362. sha1dc_git.h
  363. shallow.c
  364. shell.c
  365. shortlog.h
  366. sideband.c
  367. sideband.h
  368. sigchain.c
  369. sigchain.h
  370. split-index.c
  371. split-index.h
  372. strbuf.c
  373. strbuf.h
  374. streaming.c
  375. streaming.h
  376. string-list.c
  377. string-list.h
  378. sub-process.c
  379. sub-process.h
  380. submodule-config.c
  381. submodule-config.h
  382. submodule.c
  383. submodule.h
  384. symlinks.c
  385. tag.c
  386. tag.h
  387. tar.h
  388. tempfile.c
  389. tempfile.h
  390. thread-utils.c
  391. thread-utils.h
  392. tmp-objdir.c
  393. tmp-objdir.h
  394. trace.c
  395. trace.h
  396. trace2.c
  397. trace2.h
  398. trailer.c
  399. trailer.h
  400. transport-helper.c
  401. transport-internal.h
  402. transport.c
  403. transport.h
  404. tree-diff.c
  405. tree-walk.c
  406. tree-walk.h
  407. tree.c
  408. tree.h
  409. unicode-width.h
  410. unimplemented.sh
  411. unix-socket.c
  412. unix-socket.h
  413. unpack-trees.c
  414. unpack-trees.h
  415. upload-pack.c
  416. upload-pack.h
  417. url.c
  418. url.h
  419. urlmatch.c
  420. urlmatch.h
  421. usage.c
  422. userdiff.c
  423. userdiff.h
  424. utf8.c
  425. utf8.h
  426. varint.c
  427. varint.h
  428. version.c
  429. version.h
  430. versioncmp.c
  431. walker.c
  432. walker.h
  433. wildmatch.c
  434. wildmatch.h
  435. worktree.c
  436. worktree.h
  437. wrap-for-bin.sh
  438. wrapper.c
  439. write-or-die.c
  440. ws.c
  441. wt-status.c
  442. wt-status.h
  443. xdiff-interface.c
  444. xdiff-interface.h
  445. zlib.c
README.md

Build Status

Git - fast, scalable, distributed revision control system

Git is a fast, scalable, distributed revision control system with an unusually rich command set that provides both high-level operations and full access to internals.

Git is an Open Source project covered by the GNU General Public License version 2 (some parts of it are under different licenses, compatible with the GPLv2). It was originally written by Linus Torvalds with help of a group of hackers around the net.

Please read the file INSTALL for installation instructions.

Many Git online resources are accessible from https://git-scm.com/ including full documentation and Git related tools.

See Documentation/gittutorial.txt to get started, then see Documentation/giteveryday.txt for a useful minimum set of commands, and Documentation/git-<commandname>.txt for documentation of each command. If git has been correctly installed, then the tutorial can also be read with man gittutorial or git help tutorial, and the documentation of each command with man git-<commandname> or git help <commandname>.

CVS users may also want to read Documentation/gitcvs-migration.txt (man gitcvs-migration or git help cvs-migration if git is installed).

The user discussion and development of Git take place on the Git mailing list -- everyone is welcome to post bug reports, feature requests, comments and patches to git@vger.kernel.org (read Documentation/SubmittingPatches for instructions on patch submission). To subscribe to the list, send an email with just “subscribe git” in the body to majordomo@vger.kernel.org. The mailing list archives are available at https://public-inbox.org/git/, http://marc.info/?l=git and other archival sites.

Issues which are security relevant should be disclosed privately to the Git Security mailing list git-security@googlegroups.com.

The maintainer frequently sends the “What's cooking” reports that list the current status of various development topics to the mailing list. The discussion following them give a good reference for project status, development direction and remaining tasks.

The name “git” was given by Linus Torvalds when he wrote the very first version. He described the tool as “the stupid content tracker” and the name as (depending on your mood):

  • random three-letter combination that is pronounceable, and not actually used by any common UNIX command. The fact that it is a mispronunciation of “get” may or may not be relevant.
  • stupid. contemptible and despicable. simple. Take your pick from the dictionary of slang.
  • “global information tracker”: you're in a good mood, and it actually works for you. Angels sing, and a light suddenly fills the room.
  • “goddamn idiotic truckload of sh*t”: when it breaks