line-log: avoid unnecessary full tree diffs

With rename detection enabled the line-level log is able to trace the
evolution of line ranges across whole-file renames [1].  Alas, to
achieve that it uses the diff machinery very inefficiently, making the
operation very slow [2].  And since rename detection is enabled by
default, the line-level log is very slow by default.

When the line-level log processes a commit with rename detection
enabled, it currently does the following (see queue_diffs()):

  1. Computes a full tree diff between the commit and (one of) its
     parent(s), i.e. invokes diff_tree_oid() with an empty
     'diffopt->pathspec'.
  2. Checks whether any paths in the line ranges were modified.
  3. Checks whether any modified paths in the line ranges are missing
     in the parent commit's tree.
  4. If there is such a missing path, then calls diffcore_std() to
     figure out whether the path was indeed renamed based on the
     previously computed full tree diff.
  5. Continues with further steps that are unrelated to the slowness.

So basically the line-level log computes a full tree diff for each
commit-parent pair in step (1) to be used for rename detection in step
(4) on the off chance that an interesting path is missing from the
parent.

Avoid these expensive and mostly unnecessary full tree diffs by
limiting the diffs to paths in the line ranges.  This is much cheaper,
and makes step (2) unnecessary.  If it turns out that an interesting
path is missing from the parent, then fall back and compute a full
tree diff, so the rename detection will still work.
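
The limited-diff-with-fallback idea can be modelled outside of Git.  The
following is only a rough sketch, not the code in line-log.c: trees are
plain dicts mapping a path to a blob id, and all names in it
(tree_diff(), detect_exact_renames(), queue_diffs_sketch()) are
hypothetical.  It also assumes that a rename is an added path whose
content exactly matches a deleted path, which is much simpler than
Git's similarity-based rename detection.

```python
def tree_diff(parent, commit, pathspec=None):
    """Return {path: status} with status 'A', 'D' or 'M'.

    When pathspec is given, only those paths are compared; this stands
    in for a diff limited by 'diffopt->pathspec'.
    """
    paths = set(parent) | set(commit)
    if pathspec is not None:
        paths &= set(pathspec)
    changes = {}
    for path in sorted(paths):
        if path not in parent:
            changes[path] = "A"      # missing from the parent
        elif path not in commit:
            changes[path] = "D"
        elif parent[path] != commit[path]:
            changes[path] = "M"
    return changes


def detect_exact_renames(changes, parent, commit):
    """Pair each added path with a deleted path of identical content."""
    deleted = {parent[p]: p for p, s in changes.items() if s == "D"}
    return {p: deleted[commit[p]]
            for p, s in changes.items()
            if s == "A" and commit[p] in deleted}


def queue_diffs_sketch(parent, commit, interesting):
    """Return (limited changes, {new path: old path}, used_full_diff)."""
    # Cheap case: only look at the paths in the line ranges.
    changes = tree_diff(parent, commit, pathspec=interesting)
    missing = [p for p, s in changes.items() if s == "A"]
    if not missing:
        return changes, {}, False
    # An interesting path is missing from the parent: fall back to a
    # full tree diff so that rename detection has something to work on.
    full = tree_diff(parent, commit)
    renames = detect_exact_renames(full, parent, commit)
    return changes, {p: renames[p] for p in missing if p in renames}, True
```

In this model the full diff is computed only for commits where an
interesting path has no counterpart in the parent, which is the rare
case the fallback is meant for.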

Care must be taken about when to update the pathspec used to limit the
diff in case of renames.  A path might be renamed on one branch and
modified on several parallel running branches, and while processing
commits on these branches the line-level log might have to alternate
between looking at a path's new and old name.  However, at any one
time there is only a single 'diffopt->pathspec'.

So add a step (0) to the above to ensure that the paths in the
pathspec match the paths in the line ranges associated with the
currently processed commit, and re-parse the pathspec from the paths
in the line ranges if they differ.
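
A minimal sketch of this step (0), again as a toy model rather than the
actual implementation: DiffOptions and sync_pathspec() are hypothetical
names, and "re-parsing" is reduced to replacing a list and counting how
often that happens.

```python
class DiffOptions:
    """Stand-in for the single, shared set of diff options."""

    def __init__(self):
        self.pathspec = []
        self.reparse_count = 0


def sync_pathspec(diffopt, range_paths):
    """Make the pathspec match the paths of the current commit's ranges."""
    wanted = sorted(set(range_paths))
    if diffopt.pathspec != wanted:
        # Only re-parse when the paths actually differ.
        diffopt.pathspec = wanted
        diffopt.reparse_count += 1


# Commits from two branches interleaved: one branch already sees the
# file under its new name, the other still under its old name.
diffopt = DiffOptions()
for paths in (["new.c"], ["new.c"], ["old.c"], ["new.c"]):
    sync_pathspec(diffopt, paths)
```

With this sequence the pathspec is replaced three times; the second
commit reuses the pathspec left behind by the first.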

The new test cases include a specially crafted piece of history with
two merged branches and two files, where each branch modifies both
files, renames one of them, and then modifies both again.  Then two
separate 'git log -L' invocations check the line-level log of each of
those two files, which ensures that at least one of those invocations
has to do that back-and-forth between the file's old and new name (no
matter which branch is traversed first).  't/t4211-line-log.sh'
already contains two tests involving renames, but they don't trigger
this back-and-forth.

Avoiding these unnecessary full tree diffs can have a huge impact on
performance, especially in big repositories with big trees and mergy
history.  Tracing the evolution of a function through the whole
history:

  # git.git
  $ time git --no-pager log -L:read_alternate_refs:sha1-file.c v2.23.0

  Before:

    real    0m8.874s
    user    0m8.816s
    sys     0m0.057s

  After:

    real    0m2.516s
    user    0m2.456s
    sys     0m0.060s

  # linux.git
  $ time ~/src/git/git --no-pager log \
    -L:build_restore_work_registers:arch/mips/mm/tlbex.c v5.2

  Before:

    real    3m50.033s
    user    3m48.041s
    sys     0m0.300s

  After:

    real    0m2.599s
    user    0m2.466s
    sys     0m0.157s

That's just over an 88x speedup in the linux.git case.
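
The ratios can be rechecked from the 'real' times quoted above; the
helper name seconds() is only for illustration.

```python
def seconds(minutes, secs):
    """Convert a 'XmY.YYYs' wall-clock time to seconds."""
    return minutes * 60 + secs


# linux.git: 3m50.033s before, 0m2.599s after
linux_speedup = seconds(3, 50.033) / seconds(0, 2.599)

# git.git: 0m8.874s before, 0m2.516s after
git_speedup = seconds(0, 8.874) / seconds(0, 2.516)

print(f"linux.git: {linux_speedup:.1f}x, git.git: {git_speedup:.1f}x")
```

This gives roughly 88.5x for linux.git and roughly 3.5x for git.git.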

[1] Line-level log's rename following is quite similar to 'git log
    --follow path', with the notable differences that it does handle
    multiple paths at once as well, and that it doesn't show the
    commit performing the rename if it's an exact rename.

[2] This slowness might not have been apparent initially, because back
    when the line-level log feature was introduced rename detection
    was not yet enabled by default; 12da1d1f6f (Implement line-history
    search (git log -L), 2013-03-28) and 5404c116aa (diff: activate
    diff.renames by default, 2016-02-25).

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2 files changed
tree: 798626b2a9507806678078f85e94ef0416a15c99