| GIT pack format |
| =============== |
| |
| = pack-*.pack files have the following format: |
| |
| - A header appears at the beginning and consists of the following: |
| |
| 4-byte signature: |
| The signature is: {'P', 'A', 'C', 'K'} |
| |
| 4-byte version number (network byte order): |
| GIT currently accepts version number 2 or 3 but |
| generates version 2 only. |
| |
| 4-byte number of objects contained in the pack (network byte order) |
| |
| Observation: because both fields are 32 bits wide, we cannot |
| have more than 4G versions ;-) or more than 4G objects in a pack. |
| |
| - The header is followed by that number of object entries, each of |
| which looks like this: |
| |
| (undeltified representation) |
| n-byte type and length (3-bit type, (n-1)*7+4-bit length) |
| compressed data |
| |
| (deltified representation) |
| n-byte type and length (3-bit type, (n-1)*7+4-bit length) |
| 20-byte base object name |
| compressed delta data |
| |
| Observation: length of each object is encoded in a variable |
| length format and is not constrained to 32-bit or anything. |
| |
| - The trailer records a 20-byte SHA1 checksum of all of the above. |
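
The header and trailer layout described above can be checked with a few lines of Python. This is a minimal sketch, not Git's own code: the function names `parse_pack_header` and `trailer_matches` are made up for illustration, and the sample pack is an empty one built by hand rather than read from disk.

```python
import hashlib
import struct

def parse_pack_header(data: bytes):
    """Return (version, object_count) from the first 12 bytes of a pack."""
    # ">4sII": 4-byte signature, then two 32-bit integers in network byte order.
    signature, version, count = struct.unpack(">4sII", data[:12])
    if signature != b"PACK":
        raise ValueError("not a pack file: bad signature")
    if version not in (2, 3):
        raise ValueError(f"unsupported pack version {version}")
    return version, count

def trailer_matches(data: bytes) -> bool:
    """Check that the last 20 bytes are the SHA1 of everything before them."""
    return hashlib.sha1(data[:-20]).digest() == data[-20:]

# A hand-built version 2 pack with zero objects, purely for illustration.
body = struct.pack(">4sII", b"PACK", 2, 0)
pack = body + hashlib.sha1(body).digest()

print(parse_pack_header(pack))  # (2, 0)
print(trailer_matches(pack))    # True
```

Flipping any byte of `pack` makes `trailer_matches` return False, which is the property the trailer exists to provide.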
| |
| = Original (version 1) pack-*.idx files have the following format: |
| |
| - The header consists of 256 4-byte network byte order |
| integers. The N-th entry of this table (N = 0..255) records |
| the number of objects in the corresponding pack whose object |
| name has a first byte less than or equal to N. This is called |
| the 'first-level fan-out' table. |
| |
| - The header is followed by 24-byte entries sorted by object |
| name, one entry per object in the pack. Each entry is: |
| |
| 4-byte network byte order integer, recording where the |
| object is stored in the packfile as the offset from the |
| beginning. |
| |
| 20-byte object name. |
| |
| - The file is concluded with a trailer: |
| |
| A copy of the 20-byte SHA1 checksum at the end of |
| corresponding packfile. |
| |
| 20-byte SHA1-checksum of all of the above. |
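
To illustrate how the fan-out table narrows a lookup, the sketch below builds a tiny version 1 idx in memory and then searches it. Both `build_v1_idx` and `find_offset_v1` are hypothetical helpers written for this example, the object names are arbitrary 20-byte values, and the pack checksum is a placeholder of 20 zero bytes.

```python
import hashlib
import struct

def build_v1_idx(entries, pack_checksum=bytes(20)):
    """Hypothetical helper: build v1 idx bytes from (offset, name) pairs."""
    entries = sorted(entries, key=lambda e: e[1])
    # fanout[n] counts the objects whose first name byte is <= n.
    fanout = [sum(1 for _, name in entries if name[0] <= n) for n in range(256)]
    body = struct.pack(">256I", *fanout)
    body += b"".join(struct.pack(">I", off) + name for off, name in entries)
    body += pack_checksum
    return body + hashlib.sha1(body).digest()

def find_offset_v1(idx, name):
    """Return the pack offset recorded for a 20-byte name, or None."""
    fanout = struct.unpack(">256I", idx[:1024])
    first = name[0]
    # Entries whose name starts with byte `first` occupy slots lo..hi-1.
    lo = fanout[first - 1] if first else 0
    hi = fanout[first]
    while lo < hi:
        mid = (lo + hi) // 2
        start = 1024 + 24 * mid          # 1024-byte header, 24-byte entries
        candidate = idx[start + 4:start + 24]
        if candidate < name:
            lo = mid + 1
        elif candidate > name:
            hi = mid
        else:
            return struct.unpack(">I", idx[start:start + 4])[0]
    return None

entries = [
    (12, b"\x00" + b"\x11" * 19),
    (40, b"\x00" + b"\x22" * 19),
    (77, b"\x01" + b"\x00" * 19),
    (130, b"\xff" + b"\x00" * 19),
]
idx = build_v1_idx(entries)
print(find_offset_v1(idx, b"\x01" + b"\x00" * 19))  # 77
print(find_offset_v1(idx, b"\x02" + b"\x00" * 19))  # None
```

With these sample entries fanout[0] is 2 and fanout[255] is 4, so a lookup for a name starting with 0x01 only has to examine slot 2.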
| |
| Pack Idx file: |
| |
|         --  +--------------------------------+ |
| fanout      | fanout[0] = 2 (for example)    |-. |
| table       +--------------------------------+ | |
|             | fanout[1]                      | | |
|             +--------------------------------+ | |
|             | fanout[2]                      | | |
|             ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | |
|             | fanout[255] = total objects    |---. |
|         --  +--------------------------------+ | | |
| main        | offset                         | | | |
| index       | object name 00XXXXXXXXXXXXXXXX | | | |
| table       +--------------------------------+ | | |
|             | offset                         | | | |
|             | object name 00XXXXXXXXXXXXXXXX | | | |
|             +--------------------------------+<+ | |
|           .-| offset                         |   | |
|           | | object name 01XXXXXXXXXXXXXXXX |   | |
|           | +--------------------------------+   | |
|           | | offset                         |   | |
|           | | object name 01XXXXXXXXXXXXXXXX |   | |
|           | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~   | |
|           | | offset                         |   | |
|           | | object name FFXXXXXXXXXXXXXXXX |   | |
|         --| +--------------------------------+<--+ |
| trailer   | | packfile checksum              | |
|           | +--------------------------------+ |
|           | | idxfile checksum               | |
|           | +--------------------------------+ |
|           .-------. |
|                   | |
| Pack file entry: <+ |
| |
|   packed object header: |
|     1 byte, holding: |
|       size extension bit (MSB) |
|       type (next 3 bits) |
|       size0 (lower 4 bits) |
|     n bytes, present as long as the MSB of the preceding byte |
|     is set, each holding: |
|       size extension bit (MSB) |
|       sizeN (lower 7 bits) |
|     size0..sizeN form a 4+7+7+..+7 bit integer; size0 |
|     is the least significant part, and sizeN is the |
|     most significant part. |
|   packed object data: |
|     If it is not DELTA, then deflated bytes (the size above |
|     is the size before compression). |
|     If it is DELTA, then |
|       20-byte base object name SHA1 (the size above is the |
|       size of the delta data that follows). |
|       delta data, deflated. |
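
The packed object header above is easier to follow with a round trip through code. The following is a sketch under the layout just described; `encode_entry_header` and `decode_entry_header` are names invented for this example, and the type is treated as an opaque 3-bit number because the meaning of each type value is not covered here.

```python
def encode_entry_header(obj_type: int, size: int) -> bytes:
    """Encode a 3-bit type and an arbitrary-precision size."""
    byte = (obj_type << 4) | (size & 0x0F)   # type, then size0 in the low 4 bits
    size >>= 4
    out = bytearray()
    while size:
        out.append(byte | 0x80)              # MSB set: another size byte follows
        byte = size & 0x7F                   # next 7 bits of the size
        size >>= 7
    out.append(byte)                         # final byte has the MSB clear
    return bytes(out)

def decode_entry_header(data: bytes, pos: int = 0):
    """Return (type, size, position of the first byte after the header)."""
    byte = data[pos]
    pos += 1
    obj_type = (byte >> 4) & 0x07
    size = byte & 0x0F                       # size0, the least significant part
    shift = 4
    while byte & 0x80:
        byte = data[pos]
        pos += 1
        size |= (byte & 0x7F) << shift       # each sizeN contributes 7 more bits
        shift += 7
    return obj_type, size, pos

header = encode_entry_header(3, 100)
print(header.hex())                          # b406
print(decode_entry_header(header))           # (3, 100, 2)

# Sizes are not limited to 32 bits.
big = encode_entry_header(1, 2**40)
print(decode_entry_header(big)[1] == 2**40)  # True
```

For size 100 the low four bits (4) go into the first byte and the remaining value (6) into the second, which is why the header is the two bytes 0xb4 0x06.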
| |
| |
| = Version 2 pack-*.idx files support packs larger than 4 GiB, and |
| have some other reorganizations. They have the format: |
| |
| - A 4-byte magic number '\377tOc' which is an unreasonable |
| fanout[0] value, so a v2 file cannot be mistaken for v1. |
| |
| - A 4-byte version number (= 2) |
| |
| - A 256-entry fan-out table just like v1. |
| |
| - A table of sorted 20-byte SHA1 object names. These are |
| packed together without offset values to reduce the cache |
| footprint of the binary search for a specific object name. |
| |
| - A table of 4-byte CRC32 values of the packed object data. |
| This is new in v2 so compressed data can be copied directly |
| from pack to pack during repacking without undetected |
| data corruption. |
| |
| - A table of 4-byte offset values (in network byte order). |
| These are usually 31-bit pack file offsets, but when the |
| msbit is set the remaining 31 bits are instead an index |
| into the next table, which holds the large offsets. |
| |
| - A table of 8-byte offset entries (empty for pack files less |
| than 2 GiB). Pack files are organized with heavily used |
| objects toward the front, so most object references should |
| not need to refer to this table. |
| |
| - The same trailer as a v1 idx file: |
| |
| A copy of the 20-byte SHA1 checksum at the end of |
| corresponding packfile. |
| |
| 20-byte SHA1-checksum of all of the above. |
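
The version 2 layout can be walked table by table, as in the sketch below. It is an illustration under the description above rather than Git's implementation: `build_v2_idx` and `read_v2_offsets` are hypothetical helpers, the CRC32 values and pack checksum are zero placeholders, and the sample offsets were chosen so that one of them needs the 8-byte table.

```python
import hashlib
import struct

V2_MAGIC = b"\377tOc"

def build_v2_idx(entries, pack_checksum=bytes(20)):
    """Hypothetical helper: entries are (name, crc32, offset) tuples."""
    entries = sorted(entries)
    fanout = [sum(1 for name, _, _ in entries if name[0] <= n) for n in range(256)]
    small, large = [], []
    for _, _, offset in entries:
        if offset < 0x80000000:
            small.append(offset)                     # fits in 31 bits
        else:
            small.append(0x80000000 | len(large))    # msbit set: index into 8-byte table
            large.append(offset)
    body = V2_MAGIC + struct.pack(">I", 2)
    body += struct.pack(">256I", *fanout)
    body += b"".join(name for name, _, _ in entries)
    body += struct.pack(f">{len(entries)}I", *(crc for _, crc, _ in entries))
    body += struct.pack(f">{len(small)}I", *small)
    body += struct.pack(f">{len(large)}Q", *large)
    body += pack_checksum
    return body + hashlib.sha1(body).digest()

def read_v2_offsets(idx):
    """Return {object name: pack offset} for a version 2 idx."""
    magic, version = struct.unpack(">4sI", idx[:8])
    if magic != V2_MAGIC or version != 2:
        raise ValueError("not a version 2 idx file")
    fanout = struct.unpack(">256I", idx[8:8 + 1024])
    count = fanout[255]                              # last fan-out entry = total objects
    names_at = 8 + 1024
    crcs_at = names_at + 20 * count
    small_at = crcs_at + 4 * count
    large_at = small_at + 4 * count
    result = {}
    for i in range(count):
        name = idx[names_at + 20 * i:names_at + 20 * (i + 1)]
        (raw,) = struct.unpack_from(">I", idx, small_at + 4 * i)
        if raw & 0x80000000:
            (offset,) = struct.unpack_from(">Q", idx, large_at + 8 * (raw & 0x7FFFFFFF))
        else:
            offset = raw
        result[name] = offset
    return result

sample = [(b"\x00" * 20, 0, 12), (b"\xab" * 20, 0, 5 * 2**30)]
offsets = read_v2_offsets(build_v2_idx(sample))
print(offsets[b"\x00" * 20])   # 12
print(offsets[b"\xab" * 20])   # 5368709120
```

Keeping the names, CRC32 values and offsets in separate tables means the name search touches only the 20-byte names, and the 8-byte table is consulted only for the one offset that exceeds 31 bits.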