Re: Question about your git habits

From: Charles Bailey
Date: Sat Feb 23 2008 - 09:02:21 EST


On Sat, Feb 23, 2008 at 02:36:59PM +0100, J.C. Pizarro wrote:
> On 2008/2/23, Charles Bailey <charles@xxxxxxxxxxxxx> wrote:
> >
> > It shouldn't matter how aggressively the repositories are packed or what
> > the binary differences are between the pack files are. git clone
> > should (with the --reference option) generate a new pack for you with
> > only the missing objects. If these objects are ~52 MiB then a lot has
> > been committed to the repository, but you're not going to be able to
> > get around a big download any other way.
>
> You're wrong, nothing has to be commited ~52 MiB to the repository.
>
> I'm not saying "commit", i'm saying
>
> "Assume A & B binary git repos and delta_B-A another binary file, i
> request built
> B' = A + delta_B-A where is verified SHA1(B') = SHA1(B) for avoiding
> corrupting".
>
> Assume B is the higher repacked version of "A + minor commits of the day"
> as if B was optimizing 24 hours more the minimum spanning tree. Wow!!!
>

I'm not sure that I understand where you are going with this.
Originally, you stated that if you clone a 775 MiB repository on day
one, and then you clone it again on day two when it was 777 MiB, then
you currently have to download 775 + 777 MiB of data, whereas you
could download a 52 MiB binary diff. I have no idea where that value
of 52 MiB comes from, and I've no idea how many objects were committed
between day one and day two. If we're going to talk about details,
then you need to provide more details about your scenario.

Having said that, here is my original point in some more detail. git
repositories are not binary blobs, they are object databases. Better
than this, they are databases of immutable objects. This means that to
get the difference between one database and another, you only need to
add the objects that are missing from the other database. If the two
databases are actually a database and the same database at short time
interval later, then almost all the objects are going to be common and
the difference will be a small set of objects. Using git:// this set
of objects can be efficiently transfered as a pack file. You may have
a corner case scenario where the following isn't true, but in my
experience an incremental pack file will be a more compact
representation of this difference than a binary difference of two
aggressively repacked git repositories as generated by a generic
binary difference engine.

I'm sorry if I've misunderstood your last point. Perhaps you could
expand in the exact issue that are having if I have, as I'm not sure
that I've really answered your last message.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/