commit | 4169d8964523198ca89f507824c07b70ba833732 | [log] [tgz] |
---|---|---|
author | Jeff King <peff@peff.net> | Mon Oct 09 17:04:58 2023 -0400 |
committer | Junio C Hamano <gitster@pobox.com> | Mon Oct 09 15:55:00 2023 -0700 |
tree | 115dfdae9ff93b3510c5e13c680528ffabb7b5b5 | |
parent | fc926567ede1f82799ef49dd54aa37b4497a5453 [diff] |
commit-graph: check consistency of fanout table We use bsearch_hash() to look up items in the oid index of a commit-graph. It also has a fanout table to reduce the initial range in which we'll search. But since the fanout comes from the on-disk file, a corrupted or malicious file can cause us to look outside of the allocated index memory. One solution here would be to pass the total table size to bsearch_hash(), which could then bounds check the values it reads from the fanout. But there's an inexpensive up-front check we can do, and it's the same one used by the midx and pack idx code (both of which likewise have fanout tables and use bsearch_hash(), but are not affected by this bug): 1. We can check the value of the final fanout entry against the size of the table we got from the index chunk. These must always match, since the fanout is just slicing up the index. As a side note, the midx and pack idx code compute it the other way around: they use the final fanout value as the object count, and check the index size against it. Either is valid; if they disagree we cannot know which is wrong (a corrupted fanout value, or a too-small table of oids). 2. We can quickly scan the fanout table to make sure it is monotonically increasing. If it is, then we know that every value is less than or equal to the final value, and therefore less than or equal to the table size. It would also be sufficient to just check that each fanout value is smaller than the final one, but the midx and pack idx code both do a full monotonicity check. It's the same cost, and it catches some other corruptions (though not all; the checks done by "commit-graph verify" are more complete but more expensive, and our goal here is to be fast and memory-safe). There are two new tests. One just checks the final fanout value (this is the mirror image of the "too small oid lookup" case added for the midx in the previous commit; it's flipped here because commit-graph considers the oid lookup chunk to be the source of truth). The other actually creates a fanout with many out-of-bounds entries, and prior to this patch, it does cause the segfault you'd expect. But note that the error is not "your fanout entry is out-of-bounds", but rather "fanout value out of order". That's because we leave the final fanout value in place (to get past the table size check), making the index non-monotonic (the second-to-last entry is big, but the last one must remain small to match the actual table). We need adjustments to a few existing tests, as well: - an earlier test in t5318 corrupts the fanout and runs "commit-graph verify". Its message is now changed, since we catch the problem earlier (during the load step, rather than the careful validation step). - in t5324, we test that "commit-graph verify --shallow" does not do expensive verification on the base file of the chain. But the corruption it uses (munging a byte at offset 1000) happens to be in the middle of the fanout table. And now we detect that problem in the cheaper checks that are performed for every part of the graph. We'll push this back to offset 1500, which is only caught by the more expensive checksum validation. Likewise, there's a later test in t5324 which munges an offset 100 bytes into a file (also in the fanout table) that is referenced by an alternates file. So we now find that corruption during the load step, rather than the verification step. At the very least we need to change the error message (like the case above in t5318). But it is probably good to make sure we handle all parts of the verification even for alternate graph files. So let's likewise corrupt byte 1500 and make sure we found the invalid checksum. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Git is a fast, scalable, distributed revision control system with an unusually rich command set that provides both high-level operations and full access to internals.
Git is an Open Source project covered by the GNU General Public License version 2 (some parts of it are under different licenses, compatible with the GPLv2). It was originally written by Linus Torvalds with help of a group of hackers around the net.
Please read the file INSTALL for installation instructions.
Many Git online resources are accessible from https://git-scm.com/ including full documentation and Git related tools.
See Documentation/gittutorial.txt to get started, then see Documentation/giteveryday.txt for a useful minimum set of commands, and Documentation/git-<commandname>.txt
for documentation of each command. If git has been correctly installed, then the tutorial can also be read with man gittutorial
or git help tutorial
, and the documentation of each command with man git-<commandname>
or git help <commandname>
.
CVS users may also want to read Documentation/gitcvs-migration.txt (man gitcvs-migration
or git help cvs-migration
if git is installed).
The user discussion and development of Git take place on the Git mailing list -- everyone is welcome to post bug reports, feature requests, comments and patches to git@vger.kernel.org (read Documentation/SubmittingPatches for instructions on patch submission and Documentation/CodingGuidelines).
Those wishing to help with error message, usage and informational message string translations (localization l10) should see po/README.md (a po
file is a Portable Object file that holds the translations).
To subscribe to the list, send an email with just “subscribe git” in the body to majordomo@vger.kernel.org (not the Git list). The mailing list archives are available at https://lore.kernel.org/git/, http://marc.info/?l=git and other archival sites.
Issues which are security relevant should be disclosed privately to the Git Security mailing list git-security@googlegroups.com.
The maintainer frequently sends the “What's cooking” reports that list the current status of various development topics to the mailing list. The discussion following them give a good reference for project status, development direction and remaining tasks.
The name “git” was given by Linus Torvalds when he wrote the very first version. He described the tool as “the stupid content tracker” and the name as (depending on your mood):