Documentation/gitpacking.txt - jrn/git - Git at Google

 gitpacking(7)
 =============

 NAME
 ----
 gitpacking - Advanced concepts related to packing in Git

 SYNOPSIS
 --------
 gitpacking

 DESCRIPTION
 -----------

 This document aims to describe some advanced concepts related to packing
 in Git.

 Many concepts are currently described scattered between manual pages of
 various Git commands, including linkgit:git-pack-objects[1],
 linkgit:git-repack[1], and others, as well as linkgit:gitformat-pack[5],
 and parts of the `Documentation/technical` tree.

 There are many aspects of packing in Git that are not covered in this
 document that instead live in the aforementioned areas. Over time, those
 scattered bits may coalesce into this document.

 == Pseudo-merge bitmaps

 NOTE: Pseudo-merge bitmaps are considered an experimental feature, so
 the configuration and many of the ideas are subject to change.

 === Background

 Reachability bitmaps are most efficient when we have on-disk stored
 bitmaps for one or more of the starting points of a traversal. For this
 reason, Git prefers storing bitmaps for commits at the tips of refs,
 because traversals tend to start with those points.

 But if you have a large number of refs, it's not feasible to store a
 bitmap for _every_ ref tip. It takes up space, and just OR-ing all of
 those bitmaps together is expensive.

 One way we can deal with that is to create bitmaps that represent
 _groups_ of refs. When a traversal asks about the entire group, then we
 can use this single bitmap instead of considering each ref individually.
 Because these bitmaps represent the set of objects which would be
 reachable in a hypothetical merge of all of the commits, we call them
 pseudo-merge bitmaps.

 === Overview

 A "pseudo-merge bitmap" is used to refer to a pair of bitmaps, as
 follows:

 Commit bitmap::

   A bitmap whose set bits describe the set of commits included in the
   pseudo-merge's "merge" bitmap (as below).

 Merge bitmap::

   A bitmap whose set bits describe the reachability closure over the set
   of commits in the pseudo-merge's "commits" bitmap (as above). An
   identical bitmap would be generated for an octopus merge with the same
   set of parents as described in the commits bitmap.

 Pseudo-merge bitmaps can accelerate bitmap traversals when all commits
 for a given pseudo-merge are listed on either side of the traversal,
 either directly (by explicitly asking for them as part of the `HAVES`
 or `WANTS`) or indirectly (by encountering them during a fill-in
 traversal).

 === Use-cases

 For example, suppose there exists a pseudo-merge bitmap with a large
 number of commits, all of which are listed in the `WANTS` section of
 some bitmap traversal query. When pseudo-merge bitmaps are enabled, the
 bitmap machinery can quickly determine there is a pseudo-merge which
 satisfies some subset of the wanted objects on either side of the query.
 Then, we can inflate the EWAH-compressed bitmap, and `OR` it in to the
 resulting bitmap. By contrast, without pseudo-merge bitmaps, we would
 have to repeat the decompression and `OR`-ing step over a potentially
 large number of individual bitmaps, which can take proportionally more
 time.

 Another benefit of pseudo-merges arises when there is some combination
 of (a) a large number of references, with (b) poor bitmap coverage, and
 (c) deep, nested trees, making fill-in traversal relatively expensive.
 For example, suppose that there are a large enough number of tags where
 bitmapping each of the tags individually is infeasible. Without
 pseudo-merge bitmaps, computing the result of, say, `git rev-list
 --use-bitmap-index --count --objects --tags` would likely require a
 large amount of fill-in traversal. But when a large quantity of those
 tags are stored together in a pseudo-merge bitmap, the bitmap machinery
 can take advantage of the fact that we only care about the union of
 objects reachable from all of those tags, and answer the query much
 faster.

 === Configuration

 Reference tips are grouped into different pseudo-merge groups according
 to two criteria. A reference name matches one or more of the defined
 pseudo-merge patterns, and optionally one or more capture groups within
 that pattern which further partition the group.

 Within a group, commits may be considered "stable", or "unstable"
 depending on their age. These are adjusted by setting the
 `bitmapPseudoMerge.<name>.stableThreshold` and
 `bitmapPseudoMerge.<name>.threshold` configuration values, respectively.

 All stable commits are grouped into pseudo-merges of equal size
 (`bitmapPseudoMerge.<name>.stableSize`). If the `stableSize`
 configuration is set to, say, 100, then the first 100 commits (ordered
 by committer date) which are older than the `stableThreshold` value will
 form one group, the next 100 commits will form another group, and so on.

 Among unstable commits, the pseudo-merge machinery will attempt to
 combine older commits into large groups as opposed to newer commits
 which will appear in smaller groups. This is based on the heuristic that
 references whose tip commit is older are less likely to be modified to
 point at a different commit than a reference whose tip commit is newer.

 The size of groups is determined by a power-law decay function, and the
 decay parameter roughly corresponds to "k" in `f(n) = C*n^(-k/100)`,
 where `f(n)` describes the size of the `n`-th pseudo-merge group. The
 sample rate controls what percentage of eligible commits are considered
 as candidates. The threshold parameter indicates the minimum age (so as
 to avoid including too-recent commits in a pseudo-merge group, making it
 less likely to be valid). The "maxMerges" parameter sets an upper-bound
 on the number of pseudo-merge commits an individual group

 The "stable"-related parameters control "stable" pseudo-merge groups,
 comprised of a fixed number of commits which are older than the
 configured "stable threshold" value and may be grouped together in
 chunks of "stableSize" in order of age.

 The exact configuration for pseudo-merges is as follows:

 include::config/bitmap-pseudo-merge.txt[]

 === Examples

 Suppose that you have a repository with a large number of references,
 and you want a bare-bones configuration of pseudo-merge bitmaps that
 will enhance bitmap coverage of the `refs/` namespace. You may start
 with a configuration like so:

 ----
 [bitmapPseudoMerge "all"]
 	pattern = "refs/"
 	threshold = now
 	stableThreshold = never
 	sampleRate = 100
 	maxMerges = 64
 ----

 This will create pseudo-merge bitmaps for all references, regardless of
 their age, and group them into 64 pseudo-merge commits.

 If you wanted to separate tags from branches when generating
 pseudo-merge commits, you would instead define the pattern with a
 capture group, like so:

 ----
 [bitmapPseudoMerge "all"]
 	pattern = "refs/(heads/tags)/"
 ----

 Suppose instead that you are working in a fork-network repository, with
 each fork specified by some numeric ID, and whose refs reside in
 `refs/virtual/NNN/` (where `NNN` is the numeric ID corresponding to some
 fork) in the network. In this instance, you may instead write something
 like:

 ----
 [bitmapPseudoMerge "all"]
 	pattern = "refs/virtual/([0-9]+)/(heads|tags)/"
 	threshold = now
 	stableThreshold = never
 	sampleRate = 100
 	maxMerges = 64
 ----

 Which would generate pseudo-merge group identifiers like "1234-heads",
 and "5678-tags" (for branches in fork "1234", and tags in remote "5678",
 respectively).

 SEE ALSO
 --------
 linkgit:git-pack-objects[1]
 linkgit:git-repack[1]

 GIT
 ---
 Part of the linkgit:git[1] suite
	gitpacking(7)
	=============

	NAME
	----
	gitpacking - Advanced concepts related to packing in Git

	SYNOPSIS
	--------
	gitpacking

	DESCRIPTION
	-----------

	This document aims to describe some advanced concepts related to packing
	in Git.

	Many concepts are currently described scattered between manual pages of
	various Git commands, including linkgit:git-pack-objects[1],
	linkgit:git-repack[1], and others, as well as linkgit:gitformat-pack[5],
	and parts of the `Documentation/technical` tree.

	There are many aspects of packing in Git that are not covered in this
	document that instead live in the aforementioned areas. Over time, those
	scattered bits may coalesce into this document.

	== Pseudo-merge bitmaps

	NOTE: Pseudo-merge bitmaps are considered an experimental feature, so
	the configuration and many of the ideas are subject to change.

	=== Background

	Reachability bitmaps are most efficient when we have on-disk stored
	bitmaps for one or more of the starting points of a traversal. For this
	reason, Git prefers storing bitmaps for commits at the tips of refs,
	because traversals tend to start with those points.

	But if you have a large number of refs, it's not feasible to store a
	bitmap for _every_ ref tip. It takes up space, and just OR-ing all of
	those bitmaps together is expensive.

	One way we can deal with that is to create bitmaps that represent
	_groups_ of refs. When a traversal asks about the entire group, then we
	can use this single bitmap instead of considering each ref individually.
	Because these bitmaps represent the set of objects which would be
	reachable in a hypothetical merge of all of the commits, we call them
	pseudo-merge bitmaps.

	=== Overview

	A "pseudo-merge bitmap" is used to refer to a pair of bitmaps, as
	follows:

	Commit bitmap::

	A bitmap whose set bits describe the set of commits included in the
	pseudo-merge's "merge" bitmap (as below).

	Merge bitmap::

	A bitmap whose set bits describe the reachability closure over the set
	of commits in the pseudo-merge's "commits" bitmap (as above). An
	identical bitmap would be generated for an octopus merge with the same
	set of parents as described in the commits bitmap.

	Pseudo-merge bitmaps can accelerate bitmap traversals when all commits
	for a given pseudo-merge are listed on either side of the traversal,
	either directly (by explicitly asking for them as part of the `HAVES`
	or `WANTS`) or indirectly (by encountering them during a fill-in
	traversal).

	=== Use-cases

	For example, suppose there exists a pseudo-merge bitmap with a large
	number of commits, all of which are listed in the `WANTS` section of
	some bitmap traversal query. When pseudo-merge bitmaps are enabled, the
	bitmap machinery can quickly determine there is a pseudo-merge which
	satisfies some subset of the wanted objects on either side of the query.
	Then, we can inflate the EWAH-compressed bitmap, and `OR` it in to the
	resulting bitmap. By contrast, without pseudo-merge bitmaps, we would
	have to repeat the decompression and `OR`-ing step over a potentially
	large number of individual bitmaps, which can take proportionally more
	time.

	Another benefit of pseudo-merges arises when there is some combination
	of (a) a large number of references, with (b) poor bitmap coverage, and
	(c) deep, nested trees, making fill-in traversal relatively expensive.
	For example, suppose that there are a large enough number of tags where
	bitmapping each of the tags individually is infeasible. Without
	pseudo-merge bitmaps, computing the result of, say, `git rev-list
	--use-bitmap-index --count --objects --tags` would likely require a
	large amount of fill-in traversal. But when a large quantity of those
	tags are stored together in a pseudo-merge bitmap, the bitmap machinery
	can take advantage of the fact that we only care about the union of
	objects reachable from all of those tags, and answer the query much
	faster.

	=== Configuration

	Reference tips are grouped into different pseudo-merge groups according
	to two criteria. A reference name matches one or more of the defined
	pseudo-merge patterns, and optionally one or more capture groups within
	that pattern which further partition the group.

	Within a group, commits may be considered "stable", or "unstable"
	depending on their age. These are adjusted by setting the
	`bitmapPseudoMerge.<name>.stableThreshold` and
	`bitmapPseudoMerge.<name>.threshold` configuration values, respectively.

	All stable commits are grouped into pseudo-merges of equal size
	(`bitmapPseudoMerge.<name>.stableSize`). If the `stableSize`
	configuration is set to, say, 100, then the first 100 commits (ordered
	by committer date) which are older than the `stableThreshold` value will
	form one group, the next 100 commits will form another group, and so on.

	Among unstable commits, the pseudo-merge machinery will attempt to
	combine older commits into large groups as opposed to newer commits
	which will appear in smaller groups. This is based on the heuristic that
	references whose tip commit is older are less likely to be modified to
	point at a different commit than a reference whose tip commit is newer.

	The size of groups is determined by a power-law decay function, and the
	decay parameter roughly corresponds to "k" in `f(n) = C*n^(-k/100)`,
	where `f(n)` describes the size of the `n`-th pseudo-merge group. The
	sample rate controls what percentage of eligible commits are considered
	as candidates. The threshold parameter indicates the minimum age (so as
	to avoid including too-recent commits in a pseudo-merge group, making it
	less likely to be valid). The "maxMerges" parameter sets an upper-bound
	on the number of pseudo-merge commits an individual group

	The "stable"-related parameters control "stable" pseudo-merge groups,
	comprised of a fixed number of commits which are older than the
	configured "stable threshold" value and may be grouped together in
	chunks of "stableSize" in order of age.

	The exact configuration for pseudo-merges is as follows:

	include::config/bitmap-pseudo-merge.txt[]

	=== Examples

	Suppose that you have a repository with a large number of references,
	and you want a bare-bones configuration of pseudo-merge bitmaps that
	will enhance bitmap coverage of the `refs/` namespace. You may start
	with a configuration like so:

	----
	[bitmapPseudoMerge "all"]
	pattern = "refs/"
	threshold = now
	stableThreshold = never
	sampleRate = 100
	maxMerges = 64
	----

	This will create pseudo-merge bitmaps for all references, regardless of
	their age, and group them into 64 pseudo-merge commits.

	If you wanted to separate tags from branches when generating
	pseudo-merge commits, you would instead define the pattern with a
	capture group, like so:

	----
	[bitmapPseudoMerge "all"]
	pattern = "refs/(heads/tags)/"
	----

	Suppose instead that you are working in a fork-network repository, with
	each fork specified by some numeric ID, and whose refs reside in
	`refs/virtual/NNN/` (where `NNN` is the numeric ID corresponding to some
	fork) in the network. In this instance, you may instead write something
	like:

	----
	[bitmapPseudoMerge "all"]
	pattern = "refs/virtual/([0-9]+)/(heads\|tags)/"
	threshold = now
	stableThreshold = never
	sampleRate = 100
	maxMerges = 64
	----

	Which would generate pseudo-merge group identifiers like "1234-heads",
	and "5678-tags" (for branches in fork "1234", and tags in remote "5678",
	respectively).

	SEE ALSO
	--------
	linkgit:git-pack-objects[1]
	linkgit:git-repack[1]

	GIT
	---
	Part of the linkgit:git[1] suite