Re: Minimum git commit abbrev length (Was Re: -tip: origin tree build failure (was: [GIT PULL] ext4 update) for 2.6.37)

From: Linus Torvalds
Date: Thu Oct 28 2010 - 14:29:14 EST


On Thu, Oct 28, 2010 at 10:27 AM, Ted Ts'o <tytso@xxxxxxx> wrote:
> On Thu, Oct 28, 2010 at 07:17:01PM +0200, Ingo Molnar wrote:
>> > Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>> > Yes. Except for the kernel the default git commit abbreviation is
>> > borderline too short. Seven hex-chars can easily alias with a few
>> > more pulls from me: git will not give aliases at the time it gives
>> > a shorthand, but a month or two later the abbreviated commit may
>> > no longer be unique.
>> >
>> > So I suggest using --abbrev=12 or similar.
>>
>> ok. A helper script i use does this:
>>
>>    git log --pretty=format:"%h: %s" $@
>>
>> I have added --abbrev=12. Might make sense to lengthen the %h
>> default in upstream Git as well?
>
> Maybe the right thing to do is add a git config option which allows
> for a configurable minimum git commit abbreviation length?

Yes. The default of 7 (I think) comes from fairly early in git
development, when seven hex digits was a lot (16^7 covers about 268
million hash values). Back then I thought that 65k revisions was a lot
(it was what we were about to hit in BK), and each revision tends to
be about 5-10 new objects or so, so a million objects was a big
number.

These days, the kernel isn't even the largest git project, and even
the kernel has about 220k revisions (_much_ bigger than the BK tree
ever was) and we are approaching two million objects. At that point,
seven hex digits is still unique for a lot of them, but when we're
talking about just two orders of magnitude difference between the number
of objects and the size of the abbreviated hash space, there _will_ be
collisions between abbreviations. It's no longer even close to
unrealistic - it happens all the time.

So I suspect we should both increase the default abbrev, which has
become unrealistically small, _and_ add a way for people to set their
own default per-project in the git config file.

Maybe something like the attached (not necessarily well-thought-out or
well-tested: I also didn't actually change the default, although I
suspect we should up it from 7 to at least 10).

Linus

 builtin/describe.c |    2 +-
 cache.h            |    5 +++--
 config.c           |    8 ++++++++
 environment.c      |    1 +
 4 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/builtin/describe.c b/builtin/describe.c
index 43caff2..2d98702 100644
--- a/builtin/describe.c
+++ b/builtin/describe.c
@@ -20,7 +20,7 @@ static int debug; /* Display lots of verbose info */
 static int all; /* Any valid ref can be used */
 static int tags; /* Allow lightweight tags */
 static int longformat;
-static int abbrev = DEFAULT_ABBREV;
+static int abbrev = 7; /* NOTE! Not DEFAULT_ABBREV */
 static int max_candidates = 10;
 static int found_names;
 static const char *pattern;
diff --git a/cache.h b/cache.h
index 33decd9..6c28a81 100644
--- a/cache.h
+++ b/cache.h
@@ -540,6 +540,7 @@ extern int trust_executable_bit;
 extern int trust_ctime;
 extern int quote_path_fully;
 extern int has_symlinks;
+extern int minimum_abbrev, default_abbrev;
 extern int ignore_case;
 extern int assume_unchanged;
 extern int prefer_symlink_refs;
@@ -757,8 +758,8 @@ static inline unsigned int hexval(unsigned char c)
 }

 /* Convert to/from hex/sha1 representation */
-#define MINIMUM_ABBREV 4
-#define DEFAULT_ABBREV 7
+#define MINIMUM_ABBREV minimum_abbrev
+#define DEFAULT_ABBREV default_abbrev

 struct object_context {
 	unsigned char tree[20];
diff --git a/config.c b/config.c
index 4b0a820..474361c 100644
--- a/config.c
+++ b/config.c
@@ -514,6 +514,14 @@ static int git_default_core_config(const char *var, const char *value)
 		return 0;
 	}

+	if (!strcmp(var, "core.abbrev")) {
+		int abbrev = git_config_int(var, value);
+		if (abbrev < minimum_abbrev || abbrev > 40)
+			return -1;
+		default_abbrev = abbrev;
+		return 0;
+	}
+
 	if (!strcmp(var, "core.loosecompression")) {
 		int level = git_config_int(var, value);
 		if (level == -1)
diff --git a/environment.c b/environment.c
index de5581f..b98003c 100644
--- a/environment.c
+++ b/environment.c
@@ -15,6 +15,7 @@ int user_ident_explicitly_given;
 int trust_executable_bit = 1;
 int trust_ctime = 1;
 int has_symlinks = 1;
+int minimum_abbrev = 4, default_abbrev = 7;
 int ignore_case;
 int assume_unchanged;
 int prefer_symlink_refs;