Re: Is there a recommended way to refer to bitkeeper commits?

From: Rob Landley
Date: Sat May 13 2017 - 13:17:34 EST


On 05/13/2017 04:35 AM, Thomas Gleixner wrote:
> On Fri, 12 May 2017, Eric W. Biederman wrote:
>> Which leaves me perplexed. The hashes from tglx's current tree:
>> https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git
>> on kernel.org and the hashes in your full history tree differ.
>> Given that they are in theory the same tree this disturbs me.

The original build script used to make fullhist is at:

http://landley.net/kdocs/fullhist/make-full-linux-history.tgz

And his original description of what he did and why is at:

https://lwn.net/Articles/285366/

He mentioned something about rewriting dates?

    I used the "graft" feature of git (thanks to Junio and people
    on #git for the tip) to link them together. I also modified
    (via a git-filter-branch) the dates of some commits as for
    instance all commits from the Dave Jones's repo had the
    same date (23 Nov 2007). For this I mainly used the timestamp info
    of files on kernel.org. The script and info I used are also
    available on my website[2].

(I tried to read his conversion plumbing but it's in ocaml.)

Apparently he only considered the git commits in Linus's tree to be
worth preserving. I'd forgotten that part. (It was 9 years ago. I
remembered the pre-bitkeeper tree got edited but I forgot the other one
did too.)

>> Case in point in the commit connected to:
>> "[PATCH] linux-2.5.66-signal-cleanup.patch"
>> in tglx's tree is: da334d91ff7001d234863fc7692de1ff90bed57a
>
> That's the proper sha1 for my tree. I just verified it against the original
> tree which I still have in my archive.
>
>> *scratches my head*
>>
>> Something appears to have changed somewhere.
>
> Correct. That full history git rewrote the commits in my bitkeeper import.

I only checked that the current ones in Linus's tree were the same.
Nobody'd ever pointed me at a file hash in your conversion of bitkeeper
to git, so over the years I forgot that the date editing extended into
bitkeeper for some reason.

> history.git:
>
> commit 7a2deb32924142696b8174cdf9b38cd72a11fc96
> Author: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
> Date: Mon Feb 4 17:40:40 2002 -0800
>
> Import changeset

February 4, 2002.

> full-history:
>
> commit 26245c315da55330cb25dbfdd80be62db41dedb2
> Author: linus1 <torvalds@xxxxxxxxxxxxxxxxxxxx>
> Date: Thu Jan 4 12:00:00 2001 -0600
>
> Import changeset

January 4, 2001.

According to https://www.kernel.org/pub/linux/kernel/v2.4/ January 4
2001 is when 2.4.0 was released. So yes, it looks like he rewrote these
dates to be correct.

I see what he did. Linus started his bitkeeper tree by importing 2.4.0
and then applying a year's worth of release diffs from 2.4.0 as
individual commits. That year+ worth of work was all dated February 4,
2002 in the repo, so the fullhist script went through and changed the
dates on those commits to match the release tarballs for those kernel
versions, and that changed the hashes in the rest of the history tree.

Upside, there's no longer a year+ hole in the commit dates (which makes
looking up associated mailing list posts a lot easier). Downside: this
changed the history.git commit hashes for the rest of that era. (I'd
missed that.)
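
For anyone wondering why editing one date ripples forward: a commit's id
is the SHA-1 of its contents, which include the author/committer dates
and the parent ids. Below is a minimal Python sketch of that mechanism,
not the fullhist script itself. The tree id (git's empty tree, used as a
stand-in), the author string, and the child's timestamp are placeholders
I made up, so these are not the real hashes from either tree.

```python
import hashlib

def commit_id(tree, parents, author, timestamp, message):
    """Return the SHA-1 git would assign to a commit object with these fields."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]
    lines.append(f"author {author} {timestamp}")
    lines.append(f"committer {author} {timestamp}")
    body = ("\n".join(lines) + "\n\n" + message + "\n").encode()
    # Git hashes "commit <size>\0" followed by the commit body.
    header = b"commit %d\x00" % len(body)
    return hashlib.sha1(header + body).hexdigest()

def chain_ids(root_timestamp):
    """Build a two-commit chain; only the root's date is varied."""
    tree = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"  # empty tree, placeholder
    author = "Example Author <author@example.invalid>"  # hypothetical
    root = commit_id(tree, [], author, root_timestamp, "Import changeset")
    # The child's own fields never change; only its parent id does.
    child = commit_id(tree, [root], author, "1012873300 -0800", "next change")
    return root, child

# Epoch values for the two dates quoted above:
# Mon Feb 4 17:40:40 2002 -0800 and Thu Jan 4 12:00:00 2001 -0600.
original = chain_ids("1012873240 -0800")
redated = chain_ids("978631200 -0600")
print(original[0] != redated[0])  # root id changes with its date
print(original[1] != redated[1])  # child id changes via its parent line
```

The child commit is byte-for-byte the same in both runs except for its
"parent" line, which is why every later commit gets a new sha too.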

> and as a consequence all other commits have different shas as well.

The most embarrassing part is that the ocaml plumbing appears to
occasionally leak host context when doing the conversion. Specifically,
from "git log 26245c315da5" (checking to make sure the fullhist tree's
dates make sense in context) I get:

commit 26245c315da55330cb25dbfdd80be62db41dedb2
Author: linus1 <torvalds@xxxxxxxxxxxxxxxxxxxx>
Date: Thu Jan 4 12:00:00 2001 -0600

Import changeset

commit 13a80dffb74939e292b6e90e5d79dd26d577489f
Author: linus1 <landley@driftwood.(none)>
Date: Thu Jan 4 12:00:00 2001 -0600

add prerelease patch to get a 2.4.0

commit 4c5b4d50bb08753433f5962bd926198fe2b7105d
Author: linus1 <torvalds@xxxxxxxxxxxxxxxxxxx>
Date: Sun Dec 31 12:00:00 2000 -0600

That landley@driftwood should not be there. Sigh.
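
If somebody wants to check how many commits picked up a leaked address
like that, something along these lines might do as a starting point. It
is a rough sketch that scans "git log" text for author emails ending in
".(none)", which is what git fills in when it can't work out the host's
domain. The function name and regex are mine, and it assumes the default
log layout shown above.

```python
import re

# Matches lines such as: Author: linus1 <landley@driftwood.(none)>
AUTHOR_RE = re.compile(r"^Author:\s*(?P<name>.*?)\s*<(?P<email>[^>]*)>\s*$")

def leaked_host_authors(log_text):
    """Return (commit id, email) pairs whose email ends in '.(none)'."""
    hits = []
    commit = None
    for line in log_text.splitlines():
        if line.startswith("commit "):
            # Remember which commit the following Author line belongs to.
            commit = line.split()[1]
            continue
        match = AUTHOR_RE.match(line)
        if match and match.group("email").endswith(".(none)"):
            hits.append((commit, match.group("email")))
    return hits
```

Feed it the output of "git log" on the fullhist tree (for example via
subprocess) and it returns the commits that need a second look.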

I guess the question is which is more broken? I linked the build scripts
above if somebody else wants to modify or rerun them, but... lithp. Do
you prefer a year gap in the archive dates, or do you prefer to call the
history.git hashes canonical?

Rob