Date: Fri, 9 Nov 2007 08:28:38 -0800 (PST)
From: Linus Torvalds <torvalds@linux-foundation.org>
Subject: corrupt object on git-gc
Abstract: Some tricks to reconstruct blob objects in order to fix
 a corrupted repository.
Content-type: text/asciidoc

How to recover a corrupted blob object
======================================

-----------------------------------------------------------
On Fri, 9 Nov 2007, Yossi Leybovich wrote:
>
> Did not help still the repository look for this object?
> Any one know how can I track this object and understand which file is it
-----------------------------------------------------------

So exactly *because* the SHA-1 hash is cryptographically secure, the hash
itself doesn't actually tell you anything; in order to fix a corrupt
object you basically have to find the "original source" for it.

The easiest way to do that is almost always to have backups, and find the
same object somewhere else. Backups really are a good idea, and Git makes
it pretty easy (if nothing else, just clone the repository somewhere else,
and make sure that you do *not* use a hard-linked clone, and preferably
not the same disk/machine).

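As a rough sketch of that kind of backup, the following builds a scratch
repository and clones it with `--no-hardlinks`, so the copy shares no object
files with the original. The temporary paths, the file name and the commit
identity are made up for the demonstration; in real use the destination would
be on another disk or machine.

```shell
set -e

# Scratch repository standing in for the project (hypothetical paths).
work=$(mktemp -d)
git init -q "$work/project"
cd "$work/project"
echo hello > my-magic-file
git add my-magic-file
git -c user.name=Example -c user.email=example@example.com \
    commit -q -m "first version"

# --no-hardlinks forces real copies of the object files, even though
# this is a local clone on the same filesystem.
git clone -q --no-hardlinks "$work/project" "$work/backup"

# The backup should pass a full consistency check on its own.
git -C "$work/backup" fsck --full
```
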
But since you don't seem to have backups right now, the good news is that
especially with a single blob being corrupt, these things *are* somewhat
debuggable.

First off, move the corrupt object away, and *save* it. The most common
cause of corruption so far has been memory corruption, but even so, there
are people who would be interested in seeing the corruption. It's
basically impossible to judge the corruption until we can also see the
original object, so right now the corrupt object is useless, but it will
be very interesting later, once you have managed to re-create a
non-corrupt version.

-----------------------------------------------------------
So:

> ib]$ mv .git/objects/4b/9458b3786228369c63936db65827de3cc06200 ../
-----------------------------------------------------------

This is the right thing to do, although it's usually best to save it under
its full SHA-1 name (you just dropped the "4b" from the result ;).

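One way to keep the full name, sketched under the assumption that the object
is a loose object and that you run this from the top of the working tree. The
`corrupt-` prefix on the saved file is just an illustrative choice.

```shell
# Name of the corrupt object, taken from the error message.
sha=4b9458b3786228369c63936db65827de3cc06200

# Loose objects live under .git/objects/<first 2 hex digits>/<remaining 38>.
dir=$(printf '%s' "$sha" | cut -c1-2)
rest=$(printf '%s' "$sha" | cut -c3-)
src=".git/objects/$dir/$rest"

# Save it outside the repository under its full 40-character name.
dst="../corrupt-$sha"
if [ -f "$src" ]; then
    mv "$src" "$dst"
fi
```
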
Let's see what that tells us:

-----------------------------------------------------------
> ib]$ git-fsck --full
> broken link from    tree 2d9263c6d23595e7cb2a21e5ebbb53655278dff8
>              to    blob 4b9458b3786228369c63936db65827de3cc06200
> missing blob 4b9458b3786228369c63936db65827de3cc06200
-----------------------------------------------------------

Ok, I removed the "dangling commit" messages, because they are just
messages about the fact that you probably have rebased etc, so they're not
at all interesting. But what remains is still very useful. In particular,
we now know which tree points to it!

Now you can do

    git ls-tree 2d9263c6d23595e7cb2a21e5ebbb53655278dff8

which will show something like

    100644 blob 8d14531846b95bfa3564b58ccfb7913a034323b8    .gitignore
    100644 blob ebf9bf84da0aab5ed944264a5db2a65fe3a3e883    .mailmap
    100644 blob ca442d313d86dc67e0a2e5d584b465bd382cbf5c    COPYING
    100644 blob ee909f2cc49e54f0799a4739d24c4cb9151ae453    CREDITS
    040000 tree 0f5f709c17ad89e72bdbbef6ea221c69807009f6    Documentation
    100644 blob 1570d248ad9237e4fa6e4d079336b9da62d9ba32    Kbuild
    100644 blob 1c7c229a092665b11cd46a25dbd40feeb31661d9    MAINTAINERS
    ...

and you should now have a line that looks like

    100644 blob 4b9458b3786228369c63936db65827de3cc06200    my-magic-file

in the output. This already tells you a *lot*: it tells you what file the
corrupt blob came from!

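If the listing is long, you can let grep pick out the entry. The sketch below
tries this in a throwaway repository; the directory and file names are
invented for the demonstration. It uses `git ls-tree -r` on a top-level tree,
which is a small variation on the command above: `-r` recurses into subtrees,
so the match also shows the directory the file lives in.

```shell
set -e

# Throwaway repository with one file in a subdirectory (names are made up).
work=$(mktemp -d)
git init -q "$work/repo"
cd "$work/repo"
mkdir somedirectory
echo data > somedirectory/my-magic-file
echo other > README
git add .
git -c user.name=Example -c user.email=example@example.com \
    commit -q -m "initial"

# The blob we pretend is the broken one, and the top-level tree of HEAD.
blob=$(git rev-parse HEAD:somedirectory/my-magic-file)
tree=$(git rev-parse 'HEAD^{tree}')

# List every entry reachable from the tree and keep the one naming the blob.
entry=$(git ls-tree -r "$tree" | grep "$blob")
echo "$entry"
```
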
Now, it doesn't tell you quite enough, though: it doesn't tell what
*version* of the file didn't get correctly written! You might be really
lucky, and it may be the version that you already have checked out in your
working tree, in which case fixing this problem is really simple: just do

    git hash-object -w my-magic-file

again, and if it outputs the missing SHA-1 (4b945..) you're now all done!

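If you would rather check before writing anything, `git hash-object` without
`-w` only prints the name the file would get. The following sketch simulates
the lucky case in a scratch repository (the setup, file name and contents are
invented): it deletes the loose object for a file, confirms that the
working-tree copy hashes to the missing name, and only then writes it back.

```shell
set -e

# Scratch repository with a single committed file (hypothetical contents).
work=$(mktemp -d)
git init -q "$work/repo"
cd "$work/repo"
printf 'magic contents\n' > my-magic-file
git add my-magic-file
git -c user.name=Example -c user.email=example@example.com \
    commit -q -m "add file"

# Simulate the damage: look up the blob's name and delete its loose object.
missing=$(git rev-parse HEAD:my-magic-file)
objpath=".git/objects/$(printf '%s' "$missing" | cut -c1-2)/$(printf '%s' "$missing" | cut -c3-)"
rm -f "$objpath"

# Without -w nothing is written; we only learn the would-be object name.
candidate=$(git hash-object my-magic-file)
if [ "$candidate" = "$missing" ]; then
    git hash-object -w my-magic-file >/dev/null
    echo "restored $missing"
else
    echo "working-tree version does not match $missing"
fi
```
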
But that's the really lucky case, so let's assume that it was some older
version that was broken. How do you tell which version it was?

The easiest way to do it is to do

    git log --raw --all --full-history -- subdirectory/my-magic-file

and that will show you the whole log for that file (please realize that
the tree you had may not be the top-level tree, so you need to figure out
which subdirectory it was in on your own), and because you're asking for
raw output, you'll now get something like

    commit abc
    Author:
    Date:
      ..
    :100644 100644 4b9458b... newsha... M somedirectory/my-magic-file


    commit xyz
    Author:
    Date:

      ..
    :100644 100644 oldsha... 4b9458b... M somedirectory/my-magic-file

and this actually tells you what the *previous* and *subsequent* versions
of that file were! So now you can look at those ("oldsha" and "newsha"
respectively), and hopefully you have done commits often, and can
re-create the missing my-magic-file version by looking at those older and
newer versions!

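To actually look at the neighbouring versions, `git cat-file -p` dumps a blob
by name, and `git diff` accepts two blob names directly. A minimal sketch in a
scratch repository follows; the three versions and the output file names are
invented, and the middle version plays the part of the missing one.

```shell
set -e

# Scratch repository with three versions of the same file.
work=$(mktemp -d)
git init -q "$work/repo"
cd "$work/repo"
for n in 1 2 3; do
    echo "line $n" >> my-magic-file
    git add my-magic-file
    git -c user.name=Example -c user.email=example@example.com \
        commit -q -m "version $n"
done

# Pretend version 2 is the missing blob; its neighbours are oldsha/newsha.
oldsha=$(git rev-parse HEAD~2:my-magic-file)
newsha=$(git rev-parse HEAD:my-magic-file)

# Dump both neighbours to ordinary files for side-by-side editing.
git cat-file -p "$oldsha" > ../older-version
git cat-file -p "$newsha" > ../newer-version

# Show what changed between them; the missing version lies somewhere between.
git --no-pager diff "$oldsha" "$newsha"
```
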
If you can do that, you can now recreate the missing object with

    git hash-object -w <recreated-file>

and your repository is good again!

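It is worth confirming that the recreated file really hashes to the missing
name before trusting the repair. Here is a sketch of the whole round trip in a
scratch repository, with invented file contents: break an older version,
recreate it by hand, write it back, and let `git fsck` confirm.

```shell
set -e

# Scratch repository with two versions of one file.
work=$(mktemp -d)
git init -q "$work/repo"
cd "$work/repo"
printf 'old\n' > my-magic-file
git add my-magic-file
git -c user.name=Example -c user.email=example@example.com commit -q -m one
printf 'old\nnew\n' > my-magic-file
git add my-magic-file
git -c user.name=Example -c user.email=example@example.com commit -q -m two

# Break the older version by deleting its loose object.
missing=$(git rev-parse HEAD~1:my-magic-file)
rm -f ".git/objects/$(printf '%s' "$missing" | cut -c1-2)/$(printf '%s' "$missing" | cut -c3-)"

# Recreate the old contents by hand (here we simply know what they were).
printf 'old\n' > recreated-file

# Only write the object if it hashes to exactly the missing name.
test "$(git hash-object recreated-file)" = "$missing"
git hash-object -w recreated-file

# cat-file -e exits non-zero if the object is still absent or unreadable.
git cat-file -e "$missing"
git fsck --full
```
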
(Btw, you could have ignored the fsck, and started with doing a

    git log --raw --all

and just looked for the sha of the missing object (4b9458b..) in that
whole thing. It's up to you - Git does *have* a lot of information, it is
just missing one particular blob version.)

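When searching that output mechanically, note that raw output abbreviates
object names by default; adding `--no-abbrev` lets you grep for the full
40-character name. A sketch in a scratch repository, with invented contents:

```shell
set -e

# Scratch repository with two versions of one file.
work=$(mktemp -d)
git init -q "$work/repo"
cd "$work/repo"
echo one > my-magic-file
git add my-magic-file
git -c user.name=Example -c user.email=example@example.com commit -q -m one
echo two > my-magic-file
git add my-magic-file
git -c user.name=Example -c user.email=example@example.com commit -q -m two

# The blob we are hunting for: the first version of the file.
target=$(git rev-parse HEAD~1:my-magic-file)

# Print every raw diff line that mentions the blob, on either side.
git --no-pager log --raw --all --no-abbrev | grep "$target"

# Count the mentions as well, for use in scripts.
matches=$(git --no-pager log --raw --all --no-abbrev | grep -c "$target")
echo "$matches line(s) mention $target"
```
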
Trying to recreate trees and especially commits is *much* harder. So you
were lucky that it's a blob. It's quite possible that you can recreate the
thing.

			Linus