read-cache: add index.skipHash config option

The previous change allowed skipping the hashing portion of the
hashwrite API, using it instead as a buffered write API. Disabling the
hashwrite can be particularly helpful when the write operation is in a
critical path.

One such critical path is the writing of the index. This operation is so
critical that the sparse index was created specifically to reduce the
size of the index to make these writes (and reads) faster.

This trade-off between file stability at rest and write-time performance
is not easy to balance. The index is an interesting case for a couple
of reasons:

1. Writes block users. Writing the index takes place in many user-
   blocking foreground operations. The speed improvement directly
   impacts their use. Other file formats are typically written in the
   background (commit-graph, multi-pack-index) or are super-critical to
   correctness (pack-files).

2. Index files are short lived. It is rare that a user leaves an index
   for a long time with many staged changes. Outside of staged changes,
   the index can be completely destroyed and rewritten with minimal
   impact to the user.

Following a similar approach to one used in the microsoft/git fork [1],
add a new config option (index.skipHash) that allows disabling this
hashing during the index write. The cost is that we can no longer
validate the contents for corruption-at-rest using the trailing hash.

[1] https://github.com/microsoft/git/commit/21fed2d91410f45d85279467f21d717a2db45201
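
To make the trade-off concrete, here is a minimal sketch of a buffered
writer whose trailer is either a checksum or all zeroes. It is not Git's
hashwrite API: the toy_* names, the in-memory buffer, the 4-byte trailer
and the FNV-1a checksum (a stand-in for SHA-1/SHA-256) are all
assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical trailer size; a real index trailer is 20 or 32 bytes. */
#define TOY_HASH_LEN 4

struct toy_writer {
    unsigned char *out; /* destination buffer */
    size_t cap;         /* capacity of out */
    size_t len;         /* bytes written so far */
    uint32_t state;     /* running FNV-1a state (stand-in for a real hash) */
    int skip_hash;      /* nonzero: do not hash, emit a null trailer */
};

static void toy_init(struct toy_writer *w, unsigned char *out,
                     size_t cap, int skip_hash)
{
    w->out = out;
    w->cap = cap;
    w->len = 0;
    w->state = 2166136261u; /* FNV-1a 32-bit offset basis */
    w->skip_hash = skip_hash;
}

/* Append n bytes; update the checksum only when hashing is enabled. */
static int toy_write(struct toy_writer *w, const void *data, size_t n)
{
    const unsigned char *p = data;
    size_t i;

    if (n > w->cap - w->len)
        return -1;
    memcpy(w->out + w->len, p, n);
    w->len += n;
    if (!w->skip_hash) {
        for (i = 0; i < n; i++) {
            w->state ^= p[i];
            w->state *= 16777619u; /* FNV-1a 32-bit prime */
        }
    }
    return 0;
}

/* Append the trailer: the checksum, or zeroes when hashing was skipped. */
static int toy_finalize(struct toy_writer *w)
{
    unsigned char trailer[TOY_HASH_LEN] = { 0 };

    if (TOY_HASH_LEN > w->cap - w->len)
        return -1;
    if (!w->skip_hash) {
        trailer[0] = (unsigned char)(w->state >> 24);
        trailer[1] = (unsigned char)(w->state >> 16);
        trailer[2] = (unsigned char)(w->state >> 8);
        trailer[3] = (unsigned char)w->state;
    }
    memcpy(w->out + w->len, trailer, TOY_HASH_LEN);
    w->len += TOY_HASH_LEN;
    return 0;
}
```

Either way the output has the same length and layout, so the structure
of the file is unchanged; only the trailer's content differs.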

We load this config from the repository config given by istate->repo,
with a fallback to the_repository if it is not set.
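
The fallback described above might look roughly like the following
sketch. The toy_* types and the skip_hash field are hypothetical
stand-ins, not Git's actual structures or config-reading calls.

```c
#include <stddef.h>

/* Hypothetical stand-in for a repository with its parsed config. */
struct toy_repo {
    int skip_hash; /* value of index.skipHash for this repository */
};

/* Hypothetical stand-in for an index state that may lack a repo. */
struct toy_index_state {
    struct toy_repo *repo;
};

/* Stand-in for the global the_repository. */
static struct toy_repo toy_default_repo;

/* Use the index's own repository when set, else the global default. */
static int toy_index_skip_hash(const struct toy_index_state *istate)
{
    const struct toy_repo *r =
        (istate && istate->repo) ? istate->repo : &toy_default_repo;

    return r->skip_hash;
}
```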

While older Git versions will not recognize the null hash as a special
case, the file format itself is still being met in terms of its
structure. Using this null hash will still allow Git operations to
function across older versions.

The one exception is 'git fsck' which checks the hash of the index file.
This used to be a check on every index read, but was restricted to
'git fsck' in a33fc72fe91 (read-cache: force_verify_index_checksum,
2017-04-14) and released first in Git 2.13.0. Document the versions that
relaxed these restrictions, with the optimistic expectation that this
change will be included in Git 2.40.0.

Here, we disable this check if the trailing hash is all zeroes. We add a
warning to the config option that this may cause undesirable behavior
with older Git versions.
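
The null-hash special case can be sketched as below. This is an
illustration under assumptions, not the code in read-cache.c: the
function names are hypothetical, and the caller is assumed to have
already computed the expected hash of everything before the trailer.

```c
#include <stddef.h>
#include <string.h>

/* Return 1 if the last hash_len bytes of the file are all zero. */
static int trailer_is_null(const unsigned char *file, size_t file_len,
                           size_t hash_len)
{
    const unsigned char *trailer;
    size_t i;

    if (hash_len == 0 || file_len < hash_len)
        return 0;
    trailer = file + file_len - hash_len;
    for (i = 0; i < hash_len; i++)
        if (trailer[i])
            return 0;
    return 1;
}

/*
 * Return 1 if the trailer is acceptable: either it is the null hash
 * (hashing was skipped at write time, so there is nothing to verify)
 * or it matches the expected hash of the preceding content.
 */
static int verify_trailer(const unsigned char *file, size_t file_len,
                          size_t hash_len, const unsigned char *expected)
{
    if (hash_len == 0 || file_len < hash_len)
        return 0;
    if (trailer_is_null(file, file_len, hash_len))
        return 1;
    return memcmp(file + file_len - hash_len, expected, hash_len) == 0;
}
```

A reader without the first branch treats the all-zero trailer as a
checksum mismatch, which is the kind of complaint older 'git fsck'
versions make.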

As a quick comparison, I tested 'git update-index --force-write' with
and without index.skipHash=true on a copy of the Linux kernel
repository.

Benchmark 1: with hash
  Time (mean ± σ):      46.3 ms ±  13.8 ms    [User: 34.3 ms, System: 11.9 ms]
  Range (min … max):    34.3 ms …  79.1 ms    82 runs

Benchmark 2: without hash
  Time (mean ± σ):      26.0 ms ±   7.9 ms    [User: 11.8 ms, System: 14.2 ms]
  Range (min … max):    16.3 ms …  42.0 ms    69 runs

Summary
  'without hash' ran
    1.78 ± 0.76 times faster than 'with hash'
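
The summary ratio follows directly from the two means. The ± 0.76
figure is consistent with first-order error propagation for a ratio of
independent measurements; that the benchmarking tool computes it this
way is an assumption, not something the output states.

```latex
\[
\frac{\mu_{\text{with}}}{\mu_{\text{without}}}
  = \frac{46.3}{26.0} \approx 1.78,
\qquad
1.78 \sqrt{\left(\frac{13.8}{46.3}\right)^{2}
         + \left(\frac{7.9}{26.0}\right)^{2}}
  \approx 1.78 \times 0.426 \approx 0.76
\]
```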

These performance benefits are substantial enough to justify letting
users opt in to this feature, even with the potential confusion with
older 'git fsck' versions.

Test this new config option, both at a command-line level and within a
submodule. The check is currently limited to confirming that 'git
fsck' does not complain about the index. Future updates will make this
test more robust.

It is critical that this test is placed before the test_index_version
tests, since those tests obliterate the .git/config file and hence lose
the setting from GIT_TEST_DEFAULT_HASH, if set.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
3 files changed
tree: 1df0258257adec36cbab327e79d8cc6e457007f9
  1. .github/
  2. block-sha1/
  3. builtin/
  4. ci/
  5. compat/
  6. contrib/
  7. Documentation/
  8. ewah/
  9. git-gui/
  10. gitk-git/
  11. gitweb/
  12. mergetools/
  13. negotiator/
  14. oss-fuzz/
  15. perl/
  16. po/
  17. refs/
  18. reftable/
  19. sha1dc/
  20. sha256/
  21. t/
  22. templates/
  23. trace2/
  24. xdiff/
  25. .cirrus.yml
  26. .clang-format
  27. .editorconfig
  28. .gitattributes
  29. .gitignore
  30. .gitmodules
  31. .mailmap
  32. .tsan-suppressions
  33. abspath.c
  34. aclocal.m4
  35. add-interactive.c
  36. add-interactive.h
  37. add-patch.c
  38. advice.c
  39. advice.h
  40. alias.c
  41. alias.h
  42. alloc.c
  43. alloc.h
  44. apply.c
  45. apply.h
  46. archive-tar.c
  47. archive-zip.c
  48. archive.c
  49. archive.h
  50. attr.c
  51. attr.h
  52. banned.h
  53. base85.c
  54. bisect.c
  55. bisect.h
  56. blame.c
  57. blame.h
  58. blob.c
  59. blob.h
  60. bloom.c
  61. bloom.h
  62. branch.c
  63. branch.h
  64. builtin.h
  65. bulk-checkin.c
  66. bulk-checkin.h
  67. bundle-uri.c
  68. bundle-uri.h
  69. bundle.c
  70. bundle.h
  71. cache-tree.c
  72. cache-tree.h
  73. cache.h
  74. cbtree.c
  75. cbtree.h
  76. chdir-notify.c
  77. chdir-notify.h
  78. check-builtins.sh
  79. checkout.c
  80. checkout.h
  81. chunk-format.c
  82. chunk-format.h
  83. CODE_OF_CONDUCT.md
  84. color.c
  85. color.h
  86. column.c
  87. column.h
  88. combine-diff.c
  89. command-list.txt
  90. commit-graph.c
  91. commit-graph.h
  92. commit-reach.c
  93. commit-reach.h
  94. commit-slab-decl.h
  95. commit-slab-impl.h
  96. commit-slab.h
  97. commit.c
  98. commit.h
  99. common-main.c
  100. config.c
  101. config.h
  102. config.mak.dev
  103. config.mak.in
  104. config.mak.uname
  105. configure.ac
  106. connect.c
  107. connect.h
  108. connected.c
  109. connected.h
  110. convert.c
  111. convert.h
  112. copy.c
  113. COPYING
  114. credential.c
  115. credential.h
  116. csum-file.c
  117. csum-file.h
  118. ctype.c
  119. daemon.c
  120. date.c
  121. date.h
  122. decorate.c
  123. decorate.h
  124. delta-islands.c
  125. delta-islands.h
  126. delta.h
  127. detect-compiler
  128. diagnose.c
  129. diagnose.h
  130. diff-delta.c
  131. diff-lib.c
  132. diff-merges.c
  133. diff-merges.h
  134. diff-no-index.c
  135. diff.c
  136. diff.h
  137. diffcore-break.c
  138. diffcore-delta.c
  139. diffcore-order.c
  140. diffcore-pickaxe.c
  141. diffcore-rename.c
  142. diffcore-rotate.c
  143. diffcore.h
  144. dir-iterator.c
  145. dir-iterator.h
  146. dir.c
  147. dir.h
  148. editor.c
  149. entry.c
  150. entry.h
  151. environment.c
  152. environment.h
  153. exec-cmd.c
  154. exec-cmd.h
  155. fetch-negotiator.c
  156. fetch-negotiator.h
  157. fetch-pack.c
  158. fetch-pack.h
  159. fmt-merge-msg.c
  160. fmt-merge-msg.h
  161. fsck.c
  162. fsck.h
  163. fsmonitor--daemon.h
  164. fsmonitor-ipc.c
  165. fsmonitor-ipc.h
  166. fsmonitor-path-utils.h
  167. fsmonitor-settings.c
  168. fsmonitor-settings.h
  169. fsmonitor.c
  170. fsmonitor.h
  171. generate-cmdlist.sh
  172. generate-configlist.sh
  173. generate-hooklist.sh
  174. gettext.c
  175. gettext.h
  176. git-add--interactive.perl
  177. git-archimport.perl
  178. git-bisect.sh
  179. git-compat-util.h
  180. git-curl-compat.h
  181. git-cvsexportcommit.perl
  182. git-cvsimport.perl
  183. git-cvsserver.perl
  184. git-difftool--helper.sh
  185. git-filter-branch.sh
  186. git-instaweb.sh
  187. git-merge-octopus.sh
  188. git-merge-one-file.sh
  189. git-merge-resolve.sh
  190. git-mergetool--lib.sh
  191. git-mergetool.sh
  192. git-p4.py
  193. git-quiltimport.sh
  194. git-request-pull.sh
  195. git-send-email.perl
  196. git-sh-i18n.sh
  197. git-sh-setup.sh
  198. git-submodule.sh
  199. git-svn.perl
  200. GIT-VERSION-GEN
  201. git-web--browse.sh
  202. git.c
  203. git.rc
  204. gpg-interface.c
  205. gpg-interface.h
  206. graph.c
  207. graph.h
  208. grep.c
  209. grep.h
  210. hash-lookup.c
  211. hash-lookup.h
  212. hash.h
  213. hashmap.c
  214. hashmap.h
  215. help.c
  216. help.h
  217. hex.c
  218. hook.c
  219. hook.h
  220. http-backend.c
  221. http-fetch.c
  222. http-push.c
  223. http-walker.c
  224. http.c
  225. http.h
  226. ident.c
  227. imap-send.c
  228. INSTALL
  229. iterator.h
  230. json-writer.c
  231. json-writer.h
  232. khash.h
  233. kwset.c
  234. kwset.h
  235. levenshtein.c
  236. levenshtein.h
  237. LGPL-2.1
  238. line-log.c
  239. line-log.h
  240. line-range.c
  241. line-range.h
  242. linear-assignment.c
  243. linear-assignment.h
  244. list-objects-filter-options.c
  245. list-objects-filter-options.h
  246. list-objects-filter.c
  247. list-objects-filter.h
  248. list-objects.c
  249. list-objects.h
  250. list.h
  251. ll-merge.c
  252. ll-merge.h
  253. lockfile.c
  254. lockfile.h
  255. log-tree.c
  256. log-tree.h
  257. ls-refs.c
  258. ls-refs.h
  259. mailinfo.c
  260. mailinfo.h
  261. mailmap.c
  262. mailmap.h
  263. Makefile
  264. match-trees.c
  265. mem-pool.c
  266. mem-pool.h
  267. merge-blobs.c
  268. merge-blobs.h
  269. merge-ort-wrappers.c
  270. merge-ort-wrappers.h
  271. merge-ort.c
  272. merge-ort.h
  273. merge-recursive.c
  274. merge-recursive.h
  275. merge.c
  276. mergesort.h
  277. midx.c
  278. midx.h
  279. name-hash.c
  280. notes-cache.c
  281. notes-cache.h
  282. notes-merge.c
  283. notes-merge.h
  284. notes-utils.c
  285. notes-utils.h
  286. notes.c
  287. notes.h
  288. object-file.c
  289. object-name.c
  290. object-store.h
  291. object.c
  292. object.h
  293. oid-array.c
  294. oid-array.h
  295. oidmap.c
  296. oidmap.h
  297. oidset.c
  298. oidset.h
  299. oidtree.c
  300. oidtree.h
  301. pack-bitmap-write.c
  302. pack-bitmap.c
  303. pack-bitmap.h
  304. pack-check.c
  305. pack-mtimes.c
  306. pack-mtimes.h
  307. pack-objects.c
  308. pack-objects.h
  309. pack-revindex.c
  310. pack-revindex.h
  311. pack-write.c
  312. pack.h
  313. packfile.c
  314. packfile.h
  315. pager.c
  316. parallel-checkout.c
  317. parallel-checkout.h
  318. parse-options-cb.c
  319. parse-options.c
  320. parse-options.h
  321. patch-delta.c
  322. patch-ids.c
  323. patch-ids.h
  324. path.c
  325. path.h
  326. pathspec.c
  327. pathspec.h
  328. pkt-line.c
  329. pkt-line.h
  330. preload-index.c
  331. pretty.c
  332. pretty.h
  333. prio-queue.c
  334. prio-queue.h
  335. progress.c
  336. progress.h
  337. promisor-remote.c
  338. promisor-remote.h
  339. prompt.c
  340. prompt.h
  341. protocol-caps.c
  342. protocol-caps.h
  343. protocol.c
  344. protocol.h
  345. prune-packed.c
  346. prune-packed.h
  347. quote.c
  348. quote.h
  349. range-diff.c
  350. range-diff.h
  351. reachable.c
  352. reachable.h
  353. read-cache.c
  354. README.md
  355. rebase-interactive.c
  356. rebase-interactive.h
  357. rebase.c
  358. rebase.h
  359. ref-filter.c
  360. ref-filter.h
  361. reflog-walk.c
  362. reflog-walk.h
  363. reflog.c
  364. reflog.h
  365. refs.c
  366. refs.h
  367. refspec.c
  368. refspec.h
  369. remote-curl.c
  370. remote.c
  371. remote.h
  372. replace-object.c
  373. replace-object.h
  374. repo-settings.c
  375. repository.c
  376. repository.h
  377. rerere.c
  378. rerere.h
  379. reset.c
  380. reset.h
  381. resolve-undo.c
  382. resolve-undo.h
  383. revision.c
  384. revision.h
  385. run-command.c
  386. run-command.h
  387. scalar.c
  388. SECURITY.md
  389. send-pack.c
  390. send-pack.h
  391. sequencer.c
  392. sequencer.h
  393. serve.c
  394. serve.h
  395. server-info.c
  396. setup.c
  397. sh-i18n--envsubst.c
  398. sha1dc_git.c
  399. sha1dc_git.h
  400. shallow.c
  401. shallow.h
  402. shared.mak
  403. shell.c
  404. shortlog.h
  405. sideband.c
  406. sideband.h
  407. sigchain.c
  408. sigchain.h
  409. simple-ipc.h
  410. sparse-index.c
  411. sparse-index.h
  412. split-index.c
  413. split-index.h
  414. stable-qsort.c
  415. strbuf.c
  416. strbuf.h
  417. streaming.c
  418. streaming.h
  419. string-list.c
  420. string-list.h
  421. strmap.c
  422. strmap.h
  423. strvec.c
  424. strvec.h
  425. sub-process.c
  426. sub-process.h
  427. submodule-config.c
  428. submodule-config.h
  429. submodule.c
  430. submodule.h
  431. symlinks.c
  432. tag.c
  433. tag.h
  434. tar.h
  435. tempfile.c
  436. tempfile.h
  437. thread-utils.c
  438. thread-utils.h
  439. tmp-objdir.c
  440. tmp-objdir.h
  441. trace.c
  442. trace.h
  443. trace2.c
  444. trace2.h
  445. trailer.c
  446. trailer.h
  447. transport-helper.c
  448. transport-internal.h
  449. transport.c
  450. transport.h
  451. tree-diff.c
  452. tree-walk.c
  453. tree-walk.h
  454. tree.c
  455. tree.h
  456. unicode-width.h
  457. unimplemented.sh
  458. unix-socket.c
  459. unix-socket.h
  460. unix-stream-server.c
  461. unix-stream-server.h
  462. unpack-trees.c
  463. unpack-trees.h
  464. upload-pack.c
  465. upload-pack.h
  466. url.c
  467. url.h
  468. urlmatch.c
  469. urlmatch.h
  470. usage.c
  471. userdiff.c
  472. userdiff.h
  473. utf8.c
  474. utf8.h
  475. varint.c
  476. varint.h
  477. version.c
  478. version.h
  479. versioncmp.c
  480. walker.c
  481. walker.h
  482. wildmatch.c
  483. wildmatch.h
  484. worktree.c
  485. worktree.h
  486. wrap-for-bin.sh
  487. wrapper.c
  488. write-or-die.c
  489. ws.c
  490. wt-status.c
  491. wt-status.h
  492. xdiff-interface.c
  493. xdiff-interface.h
  494. zlib.c
README.md


Git - fast, scalable, distributed revision control system

Git is a fast, scalable, distributed revision control system with an unusually rich command set that provides both high-level operations and full access to internals.

Git is an Open Source project covered by the GNU General Public License version 2 (some parts of it are under different licenses, compatible with the GPLv2). It was originally written by Linus Torvalds with the help of a group of hackers around the net.

Please read the file INSTALL for installation instructions.

Many Git online resources are accessible from https://git-scm.com/ including full documentation and Git related tools.

See Documentation/gittutorial.txt to get started, then see Documentation/giteveryday.txt for a useful minimum set of commands, and Documentation/git-<commandname>.txt for documentation of each command. If git has been correctly installed, then the tutorial can also be read with man gittutorial or git help tutorial, and the documentation of each command with man git-<commandname> or git help <commandname>.

CVS users may also want to read Documentation/gitcvs-migration.txt (man gitcvs-migration or git help cvs-migration if git is installed).

The user discussion and development of Git take place on the Git mailing list -- everyone is welcome to post bug reports, feature requests, comments and patches to git@vger.kernel.org (read Documentation/SubmittingPatches for instructions on patch submission and Documentation/CodingGuidelines).

Those wishing to help with error message, usage and informational message string translations (localization, l10n) should see po/README.md (a po file is a Portable Object file that holds the translations).

To subscribe to the list, send an email with just “subscribe git” in the body to majordomo@vger.kernel.org (not the Git list). The mailing list archives are available at https://lore.kernel.org/git/, http://marc.info/?l=git and other archival sites.

Issues which are security relevant should be disclosed privately to the Git Security mailing list git-security@googlegroups.com.

The maintainer frequently sends the “What's cooking” reports that list the current status of various development topics to the mailing list. The discussion following them gives a good reference for project status, development direction and remaining tasks.

The name “git” was given by Linus Torvalds when he wrote the very first version. He described the tool as “the stupid content tracker” and the name as (depending on your mood):

  • random three-letter combination that is pronounceable, and not actually used by any common UNIX command. The fact that it is a mispronunciation of “get” may or may not be relevant.
  • stupid. contemptible and despicable. simple. Take your pick from the dictionary of slang.
  • “global information tracker”: you're in a good mood, and it actually works for you. Angels sing, and a light suddenly fills the room.
  • “goddamn idiotic truckload of sh*t”: when it breaks