| Large Object Promisors |
| ====================== |
| |
| Ever since Git was created, users have complained about issues with |
| storing large files in Git. Some solutions have been created to help, |
| but they leave a number of issues largely unaddressed. |
| |
| Git currently supports multiple promisor remotes, which could help |
| with some of these remaining issues, but it's very hard to use them to |
| help, because a number of important features are missing. |
| |
| The goal of the effort described in this document is to add these |
| important features. |
| |
| We will call "Large Object Promisor", or "LOP" for short, a promisor |
| remote that is used to store only large blobs and that is separate |
| from the main remote, which should store the other Git objects and |
| the rest of the repos. |
| |
| By extension, we will also call "Large Object Promisor", or LOP, the |
| effort described in this document to add a set of features to make it |
| easier to handle large blobs/files in Git by using LOPs. |
| |
| This effort aims to improve things mainly on the server side, and |
| especially for large blobs that are already compressed in a binary |
| format. |
| |
| This effort aims to provide an alternative to Git LFS |
| (https://git-lfs.com/) and similar tools like git-annex |
| (https://git-annex.branchable.com/) for handling large files, even |
| though a complete alternative would very likely require other efforts |
| especially on the client side, where it would likely help to implement |
| a new object representation for large blobs as discussed in: |
| |
| https://lore.kernel.org/git/xmqqbkdometi.fsf@gitster.g/ |
| |
| 0) Non goals |
| ------------ |
| |
| - We will not discuss those client side improvements here, as they |
| would require changes in different parts of Git than this effort. |
| + |
| So we don't claim to fully replace Git LFS with only this effort, |
| but we nevertheless believe that it can significantly improve the |
| current situation on the server side, and that other separate |
| efforts could also improve the situation on the client side. |
| |
| - In the same way, we are not going to discuss all the possible ways |
| to implement LOPs or their underlying object storage, or to |
| optimize how LOPs work. |
| + |
| Our opinion is that the simplest solution for now is for LOPs to use |
| object storage through a remote helper (see section II.2 below for |
| more details) to store their objects. So we consider that this is the |
| default implementation. If there are improvements on top of this, |
| that's great, but our opinion is that such improvements are not |
| necessary for LOPs to already be useful. Such improvements are likely |
| a different technical topic, and can be taken care of separately |
| anyway. |
| + |
| So in particular we are not going to discuss pluggable ODBs or other |
| object database backends that could chunk large blobs, dedup the |
| chunks and store them efficiently. Sure, that would be a nice |
| improvement to store large blobs on the server side, but we believe |
| it can just be a separate effort as it's also not technically very |
| related to this effort. |
| + |
| We are also not going to discuss data transfer improvements between |
| LOPs and clients or servers. Sure, there might be some easy and very |
| effective optimizations there (as we know that objects on LOPs are |
| very likely incompressible and unlikely to deltify well), but this |
| can be dealt with in a separate effort. |
| |
| In other words, the goal of this document is not to talk about all the |
| possible ways to optimize how Git could handle large blobs, but to |
| describe how a LOP based solution can already work well and alleviate |
| a number of current issues in the context of Git clients and servers |
| sharing Git objects. |
| |
| Even if LOPs are not used very efficiently, they can still be useful |
| and worth using in some cases, as we will see in more detail later in |
| this document: |
| |
| - they can make it simpler for clients to use promisor remotes and |
| therefore avoid fetching a lot of large blobs they might not need |
| locally, |
| |
| - they can make it significantly cheaper or easier for servers to |
| host a significant part of the current repository content, and |
| even more to host content with larger blobs or more large blobs |
| than currently. |
| |
| I) Issues with the current situation |
| ------------------------------------ |
| |
| - Some statistics made on GitLab repos have shown that more than 75% |
| of the disk space is used by blobs that are larger than 1MB and |
| often in a binary format. |
| |
| - So even if users could use Git LFS or similar tools to store a lot |
| of large blobs out of their repos, it's a fact that in practice they |
| don't do it as much as they probably should. |
| |
| - Ideally, the server should be able to decide for itself how it |
| stores things. It should not depend on users deciding whether to use |
| tools like Git LFS on some blobs. |
| |
| - It's much more expensive to store large blobs that don't delta |
| compress well on regular fast seeking drives (like SSDs) than on |
| object storage (like Amazon S3 or GCP Buckets). Using fast drives |
| for regular Git repos makes sense though, as serving regular Git |
| content (blobs containing text or code) needs drives where seeking |
| is fast, but the content is relatively small. On the other hand, |
| object storage for Git LFS blobs makes sense as seeking speed is not |
| as important when dealing with large files, while costs are more |
| important. So the fact that users don't use Git LFS or similar tools |
| for a significant number of large blobs likely has some bad |
| consequences on the cost of repo storage for most Git hosting |
| platforms. |
| |
| - Having large blobs handled in the same way as other blobs and Git |
| objects in Git repos instead of on object storage also has a cost in |
| increased memory and CPU usage, and therefore decreased performance, |
| when creating packfiles. (This is because Git tries to use delta |
| compression or zlib compression which is unlikely to work well on |
| already compressed binary content.) So it's not just a storage cost |
| increase. |
| |
| - When a large blob has been committed into a repo, it might not be |
| possible to remove this blob from the repo without rewriting |
| history, even if the user then decides to use Git LFS or a similar |
| tool to handle it. |
| |
| - In fact Git LFS and similar tools are not very flexible in letting |
| users change their minds about the blobs they should handle or not. |
| |
| - Even when users are using Git LFS or similar tools, they often |
| complain that these tools require significant effort to set up, |
| learn and use correctly. |
| |
| II) Main features of the "Large Object Promisors" solution |
| ---------------------------------------------------------- |
| |
| The main features below should give a rough overview of how the |
| solution may work. Details about needed elements can be found in |
| the following sections. |
| |
| Even if each feature below is very useful for the full solution, it is |
| very likely to be also useful on its own in some cases where the full |
| solution is not required. However, we'll focus primarily on the big |
| picture here. |
| |
| Also, a feature doesn't need to be implemented entirely in Git |
| itself. Some could be scripts, hooks or helpers that are not part of |
| the Git repo. It would be helpful if those could be shared and |
| improved on collaboratively though, so we want to encourage sharing |
| them. |
| |
| 1) Large blobs are stored on LOPs |
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| |
| Large blobs should be stored on special promisor remotes that we will |
| call "Large Object Promisors" or LOPs. These LOPs should be additional |
| remotes dedicated to containing large blobs, especially those in |
| binary format. They should be used along with main remotes that |
| contain the other objects. |
| |
| Note 1 |
| ++++++ |
| |
| To clarify, a LOP is a normal promisor remote, except that: |
| |
| - it should store only large blobs, |
| |
| - it should be separate from the main remote, so that the main remote |
| can focus on serving other objects and the rest of the repos (see |
| feature 4) below) and can use the LOP as a promisor remote for |
| itself. |
| |
| Note 2 |
| ++++++ |
| |
| Git already makes it possible for a main remote to also be a promisor |
| remote storing both regular objects and large blobs for a client that |
| clones from it with a filter on blob size. But here we explicitly want |
| to avoid that. |
| |
| Rationale |
| +++++++++ |
| |
| LOPs aim to be good at handling large blobs while main remotes are |
| already good at handling other objects. |
| |
| Implementation |
| ++++++++++++++ |
| |
| Git already has support for multiple promisor remotes, see |
| link:partial-clone.html#using-many-promisor-remotes[the partial clone documentation]. |
| |
| Also, Git already has support for partial clone using a filter on the |
| size of the blobs (with `git clone --filter=blob:limit=<size>`). Most |
| of the other main features below are based on these existing features |
| and are about making them easy and efficient to use for the purpose of |
| better handling large blobs. |
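| |
As a minimal sketch of how these two existing features combine on a
client, the helper below registers an additional promisor remote meant
to hold large blobs. The function name `add_lop`, the remote name and
the URLs are hypothetical. The config keys `remote.<name>.promisor`
and `remote.<name>.partialCloneFilter` are the existing partial clone
ones.

```shell
#!/bin/sh
# Sketch only: add_lop is a hypothetical helper, not a Git command.
# usage: add_lop <remote-name> <lop-url> <filter-spec>
add_lop () {
	# Register the LOP as an ordinary remote...
	git remote add "$1" "$2" &&
	# ...mark it as a promisor remote...
	git config "remote.$1.promisor" true &&
	# ...and record the filter to use when fetching from it.
	git config "remote.$1.partialCloneFilter" "$3"
}
```

A client could, for example, clone the main remote with
`git clone --filter=blob:limit=1m <main-url>` and then run
`add_lop lop <lop-url> blob:limit=1m` inside the clone.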
| |
| 2) LOPs can use object storage |
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| |
| LOPs can be implemented using object storage, like an Amazon S3 or GCP |
| Bucket or MinIO (which is open source under the GNU AGPLv3 license) to |
| actually store the large blobs, and can be accessed through a Git |
| remote helper (see linkgit:gitremote-helpers[7]) which makes the |
| underlying object storage appear like a remote to Git. |
| |
| Note |
| ++++ |
| |
| A LOP can be a promisor remote accessed using a remote helper by |
| both some clients and the main remote. |
| |
| Rationale |
| +++++++++ |
| |
| This looks like the simplest way to create LOPs that can cheaply |
| handle many large blobs. |
| |
| Implementation |
| ++++++++++++++ |
| |
| Remote helpers are quite easy to write as shell scripts, but it might |
| be more efficient and maintainable to write them using other languages |
| like Go. |
| |
| Some already exist under open source licenses, for example: |
| |
| - https://github.com/awslabs/git-remote-s3 |
| - https://gitlab.com/eric.p.ju/git-remote-gs |
| |
| Other ways to implement LOPs are certainly possible, but the goal of |
| this document is not to discuss how to best implement a LOP or its |
| underlying object storage (see the "0) Non goals" section above). |
| |
| 3) LOP object storage can be Git LFS storage |
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| |
| The underlying object storage that a LOP uses could also serve as |
| storage for large files handled by Git LFS. |
| |
| Rationale |
| +++++++++ |
| |
| This would simplify the server side if it wants to both use a LOP and |
| act as a Git LFS server. |
| |
| 4) A main remote can offload to a LOP with a configurable threshold |
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| |
| On the server side, a main remote should have a way to offload to a |
| LOP all its blobs with a size over a configurable threshold. |
| |
| Rationale |
| +++++++++ |
| |
| This makes it easy to set things up and to clean things up. For |
| example, an admin could use this to manually convert a repo not using |
| LOPs to a repo using a LOP. On a repo already using a LOP but where |
| some users would sometimes push large blobs, a cron job could use this |
| to regularly make sure the large blobs are moved to the LOP. |
| |
| Implementation |
| ++++++++++++++ |
| |
| Using something based on `git repack --filter=...` to separate the |
| blobs we want to offload from the other Git objects could be a good |
| idea. The missing part is to connect to the LOP, check if the blobs we |
| want to offload are already there and if not send them. |
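| |
Independently of `git repack --filter=...`, the selection and "is it
already there" steps can be sketched with plain text processing. This
is only an illustration: the helper names are hypothetical, and how
the list of objects already on the LOP is obtained is left open. The
size test uses "at least `<limit>` bytes", which mirrors how the
`blob:limit=<size>` filter omits blobs.

```shell
#!/bin/sh
# Sketch only: oversize_blobs and missing_on_lop are hypothetical helpers.

# stdin: "<type> <size> <oid>" lines, for example produced by
#   git cat-file --batch-all-objects \
#       --batch-check='%(objecttype) %(objectsize) %(objectname)'
# $1: threshold in bytes
# stdout: OIDs of blobs whose size is at least the threshold
oversize_blobs () {
	awk -v limit="$1" '$1 == "blob" && $2 + 0 >= limit + 0 { print $3 }'
}

# $1: file listing candidate OIDs, one per line
# $2: file listing OIDs the LOP already has, one per line
# stdout: candidate OIDs that still need to be sent to the LOP
missing_on_lop () {
	awk 'FILENAME == ARGV[1] { have[$1]; next } !($1 in have)' "$2" "$1"
}
```

The output of `missing_on_lop` is the set of blobs that would have to
be sent before the main remote can safely drop its local copies.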
| |
| 5) A main remote should try to remain clean from large blobs |
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| |
| A main remote should try to avoid containing a lot of oversize |
| blobs. For that purpose, it should offload as needed to a LOP and it |
| should have ways to prevent oversize blobs from being fetched, and |
| also perhaps pushed, into it. |
| |
| Rationale |
| +++++++++ |
| |
| A main remote containing many oversize blobs would defeat the purpose |
| of LOPs. |
| |
| Implementation |
| ++++++++++++++ |
| |
| The way to offload to a LOP discussed in 4) above can be used to |
| regularly offload oversize blobs. For preventing oversize blobs from |
| being fetched into the repo, see 6) below. For preventing oversize |
| blob pushes, a pre-receive hook could be used. |
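| |
A pre-receive hook along these lines could look like the following
sketch. The `MAX_BLOB_SIZE` setting and the function name are
hypothetical, and a production hook would likely need more care (for
example around performance on large pushes). It relies only on
`git rev-list --objects` and `git cat-file --batch-check`.

```shell
#!/bin/sh
# Sketch of a pre-receive hook body. MAX_BLOB_SIZE and the function
# name are hypothetical; they are not part of Git.
MAX_BLOB_SIZE=${MAX_BLOB_SIZE:-1048576}

# stdin: "<old> <new> <ref>" lines, as passed to pre-receive hooks.
# Returns non-zero if a newly pushed blob has at least MAX_BLOB_SIZE bytes.
reject_oversize_blobs () {
	# The all-zero OID marks a ref deletion. Derive it from a real OID
	# so that its length matches the repository's hash algorithm.
	zero=$(git hash-object --stdin </dev/null | sed 's/[0-9a-f]/0/g')
	status=0
	while read -r old new ref
	do
		test "$new" = "$zero" && continue
		# Objects reachable from the new tip but from no existing ref.
		oversize=$(git rev-list --objects "$new" --not --all |
			git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
			awk -v limit="$MAX_BLOB_SIZE" \
				'$1 == "blob" && $2 + 0 >= limit + 0 { sub(/^blob /, ""); print }')
		if test -n "$oversize"
		then
			echo "error: $ref has blobs of $MAX_BLOB_SIZE bytes or more:" >&2
			echo "$oversize" >&2
			status=1
		fi
	done
	return $status
}

# A real hook would end by calling: reject_oversize_blobs
```

Rejecting the push is only one possible policy. A server could instead
accept it and rely on the offload mechanism from 4) to move the blobs
to the LOP afterwards.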
| |
| Also there are different scenarios in which large blobs could get |
| fetched into the main remote, for example: |
| |
| - A client that doesn't implement the "promisor-remote" protocol |
| (described in 6) below) clones from the main remote. |
| |
| - The main remote gets a request for information about a large blob |
| and is not able to get that information without fetching the blob |
| from the LOP. |
| |
| It might not be possible to completely prevent all these scenarios |
| from happening. So the goal here should be to implement features that |
| make the fetching of large blobs less likely. For example adding a |
| `remote-object-info` command in the `git cat-file --batch` protocol |
| and its variants might make it possible for a main repo to respond to |
| some requests about large blobs without fetching them. |
| |
| 6) A protocol negotiation should happen when a client clones |
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| |
| When a client clones from a main repo, there should be a protocol |
| negotiation so that the server can advertise one or more LOPs and so |
| that the client and the server can discuss if the client could |
| directly use a LOP the server is advertising. If the client and the |
| server can agree on that, then the client would be able to get the |
| large blobs directly from the LOP and the server would not need to |
| fetch those blobs from the LOP to be able to serve the client. |
| |
| Note |
| ++++ |
| |
| For fetches instead of clones, a protocol negotiation might not always |
| happen, see the "What about fetches?" FAQ entry below for details. |
| |
| Rationale |
| +++++++++ |
| |
| Security, configurability and efficiency of setting things up. |
| |
| Implementation |
| ++++++++++++++ |
| |
| A "promisor-remote" protocol v2 capability looks like a good way to |
| implement this. The way the client and server use this capability |
| could be controlled by configuration variables. |
| |
| Information that the server could send to the client through that |
| protocol could be things like: LOP name, LOP URL, filter-spec (for |
| example `blob:limit=<size>`) or just size limit that should be used as |
| a filter when cloning, token to be used with the LOP, etc. |
| |
| 7) A client can offload to a LOP |
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| |
| When a client is using a LOP that is also a LOP of its main remote, |
| the client should be able to offload some large blobs it has fetched, |
| but might not need anymore, to the LOP. |
| |
| Note |
| ++++ |
| |
| Whether it should be OK for clients to offload large blobs they have |
| created, instead of fetched, directly to the LOP without the main |
| remote checking them in some way (possibly using hooks or other |
| tools) might depend on the context. |
| |
| This should be discussed and refined when we get closer to |
| implementing this feature. |
| |
| Rationale |
| +++++++++ |
| |
| On the client, the easiest way to deal with unneeded large blobs is to |
| offload them. |
| |
| Implementation |
| ++++++++++++++ |
| |
| This is very similar to what 4) above is about, except on the client |
| side instead of the server side. So a good solution to 4) could likely |
| be adapted to work on the client side too. |
| |
| There might be some security issues here, as there is no negotiation, |
| but they might be mitigated if the client can reuse a token it got |
| when cloning (see 6) above). Also if the large blobs were fetched from |
| a LOP, it is likely, and can easily be confirmed, that the LOP still |
| has them, so that they can just be removed from the client. |
| |
| III) Benefits of using LOPs |
| --------------------------- |
| |
| Many benefits are related to the issues discussed in "I) Issues with |
| the current situation" above: |
| |
| - No need to rewrite history when deciding which blobs are worth |
| handling separately from other objects, or when moving or removing |
| the threshold. |
| |
| - If the protocol between client and server is developed and secured |
| enough, then many details might be set up on the server side only and |
| all the clients could then easily get all the configuration |
| information and use it to set themselves up mostly automatically. |
| |
| - Storage cost benefits on the server side. |
| |
| - Reduced memory and CPU needs on main remotes on the server side. |
| |
| - Reduced storage needs on the client side. |
| |
| IV) FAQ |
| ------- |
| |
| What about using multiple LOPs on the server and client side? |
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| |
| That could perhaps be useful in some cases, but for now it's more |
| likely that in most cases a single LOP will be advertised by the |
| server and should be used by the client. |
| |
| A case where it could be useful for a server to advertise multiple |
| LOPs is if a LOP is better for some users while a different LOP is |
| better for other users. For example some clients might have a better |
| connection to a LOP than others. |
| |
| In those cases it's the responsibility of the server to have some |
| documentation to help clients. It could say for example something like |
| "Users in this part of the world might want to pick only LOP A as it |
| is likely to be better connected to them, while users in other parts |
| of the world should pick only LOP B for the same reason." |
| |
| When should we trust or not trust the LOPs advertised by the server? |
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| |
| In some contexts, like in a corporate setup where the server and all the |
| clients are parts of an internal network in a company where admins |
| have all the rights on every system, it's OK, and perhaps even a good |
| thing, if the clients fully trust the server, as it can help ensure |
| that all the clients are on the same page. |
| |
| There are also contexts in which clients trust a code hosting platform |
| serving them some repos, but might not fully trust other users |
| managing or contributing to some of these repos. For example, the code |
| hosting platform could have hooks in place to check that any object it |
| receives doesn't contain malware or otherwise bad content. In this |
| case it might be OK for the client to use a main remote and its LOP if |
| they are both hosted by the code hosting platform, but not if the LOP |
| is hosted elsewhere (where the content is not checked). |
| |
| In other contexts, a client should just not trust a server. |
| |
| So there should be different ways to configure how the client should |
| behave when a server advertises a LOP to it at clone time. |
| |
| As the basic elements that a server can advertise about a LOP are a |
| LOP name and a LOP URL, the client should base its decision about |
| accepting a LOP on these elements. |
| |
| One simple way to be very strict in the LOP it accepts is for example |
| for the client to check that the LOP is already configured on the |
| client with the same name and URL as what the server advertises. |
| |
| In general, default and "safe" settings should require that LOPs are |
| configured on the client separately from the "promisor-remote" |
| protocol and that the client accepts a LOP only when information about |
| it from the protocol matches what has been already configured |
| separately. |
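| |
The strict "same name and same URL" check can be sketched as a small
filter over the client's configured remotes. The helper name
`lop_is_known` is hypothetical and this is not how Git itself
implements the decision. It only illustrates the policy.

```shell
#!/bin/sh
# Sketch only: lop_is_known is a hypothetical helper, not a Git command.
# stdin: "remote.<name>.url <url>" lines, for example produced by
#   git config --get-regexp '^remote\..*\.url$'
# $1: LOP name advertised by the server
# $2: LOP URL advertised by the server
# Succeeds only if a remote with that exact name and URL is configured.
lop_is_known () {
	awk -v key="remote.$1.url" -v url="$2" '
		$0 == key " " url { found = 1 }
		END { exit !found }
	'
}
```

With such a check, a server advertising an unknown name, or a known
name with a different URL, is simply ignored by the client.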
| |
| What about LOP names? |
| ~~~~~~~~~~~~~~~~~~~~~ |
| |
| In some contexts, for example if the clients sometimes fetch from each |
| other, it can be a good idea for all the clients to use the same names |
| for all the remotes they use, including LOPs. |
| |
| In other contexts, each client might want to be able to give the name |
| it wants to each remote, including each LOP, it interacts with. |
| |
| So there should be different ways to configure how the client accepts |
| or not the LOP name the server advertises. |
| |
| With a default or "safe" setting, the LOP has to be configured |
| separately on the client, so its name is configured separately too |
| and there is no risk that the server could dictate a name to a |
| client. |
| |
| Could the main remote be bogged down by old or paranoid clients? |
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| |
| Yes, it could happen if there are too many clients that are either |
| unwilling to trust the main remote or that just don't implement the |
| "promisor-remote" protocol because they are too old or not fully |
| compatible with the 'git' client. |
| |
| When serving such a client, the main remote has no other choice than |
| to first fetch from its LOP, to then be able to provide to the client |
| everything it requested. So the main remote, even if it has cleanup |
| mechanisms (see section II.4 above), would be burdened at least |
| temporarily with the large blobs it had to fetch from its LOP. |
| |
| Not behaving like this would be breaking backward compatibility, and |
| could be seen as segregating clients. For example, it might be |
| possible to implement a special mode that allows the server to just |
| reject clients that don't implement the "promisor-remote" protocol or |
| aren't willing to trust the main remote. This mode might be useful in |
| a special context like a corporate environment. There is no plan to |
| implement such a mode though, and this should be discussed separately |
| later anyway. |
| |
| A better way to proceed is probably for the main remote to show a |
| message telling clients that don't implement the protocol or are |
| unwilling to accept the advertised LOP(s) that they would get faster |
| clones and fetches by upgrading their client software or properly |
| configuring it to accept LOP(s). |
| |
| Waiting for clients to upgrade, monitoring these upgrades and limiting |
| the use of LOPs to repos that are not very frequently accessed might |
| be other good ways to make sure that some benefits are still reaped |
| from LOPs. Over time, as more and more clients upgrade and benefit |
| from LOPs, using them in more and more frequently accessed repos will |
| become worth it. |
| |
| Corporate environments, where it might be easier to make sure that all |
| the clients are up-to-date and properly configured, could hopefully |
| benefit more and earlier from using LOPs. |
| |
| What about fetches? |
| ~~~~~~~~~~~~~~~~~~~ |
| |
| There are different kinds of fetches. A regular fetch happens when |
| some refs have been updated on the server and the client wants the ref |
| updates and possibly the new objects added with them. A "backfill" or |
| "lazy" fetch, on the contrary, happens when the client needs to use |
| some objects it already knows about but doesn't have because they are |
| on a promisor remote. |
| |
| Regular fetch |
| +++++++++++++ |
| |
| In a regular fetch, the client will contact the main remote and a |
| protocol negotiation will happen between them. It's a good thing that |
| a protocol negotiation happens every time, as the configuration on the |
| client or the main remote could have changed since the previous |
| protocol negotiation. In this case, the new protocol negotiation |
| should ensure that the new fetch will happen in a way that satisfies |
| the new configuration of both the client and the server. |
| |
| In most cases though, the configurations on the client and the main |
| remote will not have changed between 2 fetches or between the initial |
| clone and a subsequent fetch. This means that the result of a new |
| protocol negotiation will be the same as the previous result, so the |
| new fetch will happen in the same way as the previous clone or fetch, |
| using, or not using, the same LOP(s) as last time. |
| |
| "Backfill" or "lazy" fetch |
| ++++++++++++++++++++++++++ |
| |
| When there is a backfill fetch, the client doesn't necessarily contact |
| the main remote first. It will try to fetch from its promisor remotes |
| in the order they appear in the config file, except that a remote |
| configured using the `extensions.partialClone` config variable will be |
| tried last. See |
| link:partial-clone.html#using-many-promisor-remotes[the partial clone documentation]. |
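| |
The ordering rule above can be modeled with a few lines of shell. The
helper `lazy_fetch_order` is hypothetical and only mimics the rule as
described here. It does not reflect Git's actual implementation.

```shell
#!/bin/sh
# Sketch only: lazy_fetch_order is a hypothetical helper.
# $1: remote named by extensions.partialClone (may be empty)
# remaining arguments: promisor remotes in config-file order
# stdout: remotes in the order a lazy fetch would try them
lazy_fetch_order () {
	last=$1
	shift
	for remote in "$@"
	do
		test "$remote" = "$last" || echo "$remote"
	done
	# The extensions.partialClone remote is tried last.
	test -z "$last" || echo "$last"
}
```

For example, `lazy_fetch_order origin origin lop` prints `lop` and then
`origin`, so the LOP is asked first even though `origin` appears first
in the config file.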
| |
| This is not new with this effort. In fact this is how multiple |
| promisor remotes have already been working for around 5 years. |
| |
| When using LOPs, having the main remote configured using |
| `extensions.partialClone`, so it's tried last, makes sense, as missing |
| objects should only be large blobs that are on LOPs. |
| |
| This means that a protocol negotiation will likely not happen as the |
| missing objects will be fetched from the LOPs, and then there will be |
| nothing left to fetch from the main remote. |
| |
| To secure such fetches, it could be a good idea for LOPs to require a |
| token from the client when it fetches from them. The client could get |
| the token when performing a protocol negotiation with the main remote |
| (see section II.6 above). |
| |
| V) Future improvements |
| ---------------------- |
| |
| It is expected that at the beginning using LOPs will be mostly worth |
| it either in a corporate context where the Git version that clients |
| use can easily be controlled, or on repos that are infrequently |
| accessed. (See the "Could the main remote be bogged down by old or |
| paranoid clients?" section in the FAQ above.) |
| |
| Over time, as more and more clients upgrade to a version that |
| implements the "promisor-remote" protocol v2 capability described in |
| section II.6 above, it will be worth it to use LOPs more widely. |
| |
| A lot of improvements may also help in using LOPs more widely. Some |
| of these improvements are within the scope of this document, like the |
| following: |
| |
| - Implementing a "remote-object-info" command in the |
| `git cat-file --batch` protocol and its variants to allow main |
| remotes to respond to requests about large blobs without fetching |
| them. (Eric Ju has started working on this based on previous work |
| by Calvin Wan.) |
| |
| - Creating better cleanup and offload mechanisms for main remotes |
| and clients to prevent accumulation of large blobs. |
| |
| - Developing more sophisticated protocol negotiation capabilities |
| between clients and servers for handling LOPs, for example adding |
| a filter-spec (e.g., blob:limit=<size>) or size limit for |
| filtering when cloning, or adding a token for LOP authentication. |
| |
| - Improving security measures for LOP access, particularly around |
| token handling and authentication. |
| |
| - Developing standardized ways to configure and manage multiple LOPs |
| across different environments. Especially in the case where |
| different LOPs serve the same content to clients in different |
| geographical locations, there is a need for replication or |
| synchronization between LOPs. |
| |
| Some improvements, including some that have been mentioned in the "0) |
| Non goals" section of this document, are out of the scope of this |
| document: |
| |
| - Implementing a new object representation for large blobs on the |
| client side. |
| |
| - Developing pluggable ODBs or other object database backends that |
| could chunk large blobs, dedup the chunks and store them |
| efficiently. |
| |
| - Optimizing data transfer between LOPs and clients/servers, |
| particularly for incompressible and non-deltifying content. |
| |
| - Creating improved client side tools for managing large objects |
| more effectively, for example tools for migrating from Git LFS or |
| git-annex, or tools to find which objects could be offloaded and |
| how much disk space could be reclaimed by offloading them. |
| |
| Some improvements could be seen as part of the scope of this document, |
| but might already have their own separate projects from the Git |
| project, like: |
| |
| - Improving existing remote helpers to access object storage or |
| developing new ones. |
| |
| - Improving existing object storage solutions or developing new |
| ones. |
| |
| Even though all the above improvements may help, this document and the |
| LOP effort should try to focus, at least first, on a relatively small |
| number of improvements, mostly those that are in its current scope. |
| |
| For example, introducing pluggable ODBs and a new object database |
| backend is likely a multi-year effort on its own that can happen |
| separately in parallel. It has different technical requirements, |
| touches other parts of the Git code base and should have its own |
| design document(s). |