commit-graph: check consistency of fanout table

We use bsearch_hash() to look up items in the oid index of a
commit-graph. The commit-graph also has a fanout table, which reduces
the initial range in which we'll search. But since the fanout comes from
the on-disk file, a corrupted or malicious file can cause us to look
outside of the allocated index memory.
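
To make the problem concrete, here is a minimal sketch of a
fanout-narrowed binary search. It is not the actual bsearch_hash()
code: the names lookup_oid(), read_be32() and OID_LEN are hypothetical,
and a 20-byte hash is assumed. The point is that the search bounds come
straight from the fanout, and nothing compares them to the real size of
the oid table.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define OID_LEN 20 /* assumed hash size for this sketch */

/* Read a 4-byte big-endian value, as stored in an on-disk fanout. */
static uint32_t read_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Hypothetical lookup. "fanout" holds 256 big-endian entries; entry i
 * is the number of oids whose first byte is <= i. Returns the position
 * of "oid" in the sorted "table", or -1 if it is not present.
 */
static long lookup_oid(const unsigned char *oid,
		       const unsigned char *fanout,
		       const unsigned char *table)
{
	uint32_t lo = oid[0] ? read_be32(fanout + 4 * (oid[0] - 1)) : 0;
	uint32_t hi = read_be32(fanout + 4 * oid[0]);

	while (lo < hi) {
		uint32_t mid = lo + (hi - lo) / 2;
		/*
		 * "mid" is derived only from fanout values. If those are
		 * larger than the table, this read is out of bounds.
		 */
		int cmp = memcmp(oid, table + (size_t)mid * OID_LEN, OID_LEN);

		if (!cmp)
			return (long)mid;
		if (cmp < 0)
			hi = mid;
		else
			lo = mid + 1;
	}
	return -1;
}
```

With a well-formed fanout the bounds always stay inside the table; with
a corrupted one, "lo" and "hi" can point anywhere.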

One solution here would be to pass the total table size to
bsearch_hash(), which could then bounds check the values it reads from
the fanout. But there's an inexpensive up-front check we can do, and
it's the same one used by the midx and pack idx code (both of which
likewise have fanout tables and use bsearch_hash(), but are not affected
by this bug):

  1. We can check the value of the final fanout entry against the size
     of the table we got from the index chunk. These must always match,
     since the fanout is just slicing up the index.

       As a side note, the midx and pack idx code compute it the other
       way around: they use the final fanout value as the object count, and
       check the index size against it. Either is valid; if they
       disagree we cannot know which is wrong (a corrupted fanout value,
       or a too-small table of oids).

  2. We can quickly scan the fanout table to make sure it is
     monotonically increasing. If it is, then we know that every value
     is less than or equal to the final value, and therefore less than
     or equal to the table size.

     It would also be sufficient to just check that each fanout value is
     smaller than the final one, but the midx and pack idx code both do
     a full monotonicity check. It's the same cost, and it catches some
     other corruptions (though not all; the checks done by "commit-graph
     verify" are more complete but more expensive, and our goal here is
     to be fast and memory-safe).
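
The two checks above can be sketched roughly as follows. This is an
illustration under assumptions, not the code from the patch: the
function check_fanout(), the helper read_be32() and the FANOUT_* result
names are all hypothetical, and the fanout is assumed to be 256
big-endian 4-byte entries as read from disk.

```c
#include <stdint.h>

/* Hypothetical result codes for this sketch. */
enum fanout_result {
	FANOUT_OK = 0,
	FANOUT_BAD_SIZE,     /* final entry disagrees with table size */
	FANOUT_OUT_OF_ORDER  /* an entry is smaller than its predecessor */
};

/* Read a 4-byte big-endian value, as stored in an on-disk fanout. */
static uint32_t read_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Validate "fanout" against "num_oids", the number of entries in the
 * oid table (taken here as the source of truth).
 */
static enum fanout_result check_fanout(const unsigned char *fanout,
				       uint32_t num_oids)
{
	uint32_t prev = 0;
	int i;

	/* Check 1: the final entry must equal the size of the table. */
	if (read_be32(fanout + 4 * 255) != num_oids)
		return FANOUT_BAD_SIZE;

	/*
	 * Check 2: entries must never decrease. Together with check 1,
	 * this bounds every entry by num_oids.
	 */
	for (i = 0; i < 256; i++) {
		uint32_t cur = read_be32(fanout + 4 * i);

		if (cur < prev)
			return FANOUT_OUT_OF_ORDER;
		prev = cur;
	}
	return FANOUT_OK;
}
```

Both checks are a single pass over 256 entries, so the cost is fixed
and independent of the number of commits in the graph.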

There are two new tests. One just checks the final fanout value (this is
the mirror image of the "too small oid lookup" case added for the midx
in the previous commit; it's flipped here because commit-graph considers
the oid lookup chunk to be the source of truth).

The other actually creates a fanout with many out-of-bounds entries, and
prior to this patch, it does cause the segfault you'd expect. But note
that the error is not "your fanout entry is out-of-bounds", but rather
"fanout value out of order". That's because we leave the final fanout
value in place (to get past the table size check), making the index
non-monotonic (the second-to-last entry is big, but the last one must
remain small to match the actual table).

We need adjustments to a few existing tests, as well:

  - an earlier test in t5318 corrupts the fanout and runs "commit-graph
    verify". Its message is now changed, since we catch the problem
    earlier (during the load step, rather than the careful validation
    step).

  - in t5324, we test that "commit-graph verify --shallow" does not do
    expensive verification on the base file of the chain. But the
    corruption it uses (munging a byte at offset 1000) happens to be in
    the middle of the fanout table. And now we detect that problem in
    the cheaper checks that are performed for every part of the graph.
    We'll push this back to offset 1500, which is only caught by the
    more expensive checksum validation.

    Likewise, there's a later test in t5324 which munges an offset 100
    bytes into a file (also in the fanout table) that is referenced by
    an alternates file. So we now find that corruption during the load
    step, rather than the verification step. At the very least we need
    to change the error message (like the case above in t5318). But it
    is probably good to make sure we handle all parts of the
    verification even for alternate graph files. So let's likewise
    corrupt byte 1500 and make sure we found the invalid checksum.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
3 files changed
tree: 115dfdae9ff93b3510c5e13c680528ffabb7b5b5
README.md

Git - fast, scalable, distributed revision control system

Git is a fast, scalable, distributed revision control system with an unusually rich command set that provides both high-level operations and full access to internals.

Git is an Open Source project covered by the GNU General Public License version 2 (some parts of it are under different licenses, compatible with the GPLv2). It was originally written by Linus Torvalds with the help of a group of hackers around the net.

Please read the file INSTALL for installation instructions.

Many Git online resources are accessible from https://git-scm.com/ including full documentation and Git related tools.

See Documentation/gittutorial.txt to get started, then see Documentation/giteveryday.txt for a useful minimum set of commands, and Documentation/git-<commandname>.txt for documentation of each command. If git has been correctly installed, then the tutorial can also be read with man gittutorial or git help tutorial, and the documentation of each command with man git-<commandname> or git help <commandname>.

CVS users may also want to read Documentation/gitcvs-migration.txt (man gitcvs-migration or git help cvs-migration if git is installed).

The user discussion and development of Git take place on the Git mailing list -- everyone is welcome to post bug reports, feature requests, comments and patches to git@vger.kernel.org (read Documentation/SubmittingPatches for instructions on patch submission and Documentation/CodingGuidelines).

Those wishing to help with error message, usage and informational message string translations (localization, l10n) should see po/README.md (a po file is a Portable Object file that holds the translations).

To subscribe to the list, send an email with just “subscribe git” in the body to majordomo@vger.kernel.org (not the Git list). The mailing list archives are available at https://lore.kernel.org/git/, http://marc.info/?l=git and other archival sites.

Issues which are security relevant should be disclosed privately to the Git Security mailing list git-security@googlegroups.com.

The maintainer frequently sends the “What's cooking” reports that list the current status of various development topics to the mailing list. The discussion following them gives a good reference for project status, development direction and remaining tasks.

The name “git” was given by Linus Torvalds when he wrote the very first version. He described the tool as “the stupid content tracker” and the name as (depending on your mood):

  • random three-letter combination that is pronounceable, and not actually used by any common UNIX command. The fact that it is a mispronunciation of “get” may or may not be relevant.
  • stupid. contemptible and despicable. simple. Take your pick from the dictionary of slang.
  • “global information tracker”: you're in a good mood, and it actually works for you. Angels sing, and a light suddenly fills the room.
  • “goddamn idiotic truckload of sh*t”: when it breaks