Re: whomto.pl -- finding out whom to send patches to

From: Vegard Nossum
Date: Sat May 31 2008 - 06:34:19 EST


Hi,

On 5/30/08, Junio C Hamano <gitster@xxxxxxxxx> wrote:
> Vegard Nossum <vegard.nossum@xxxxxxxxx> writes:
>
> > I've written this perl script that takes a patch as input and prints the
> > authors/committers of the affected lines, using git-blame as the back end.
> >
> > (The purpose of this is of course to find out whom to send patches to.)
> >
> > There are some caveats:
> >
> > - If I've understood correctly, git-blame incremental output doesn't split
> > commits when a newer one is found, so we currently possibly take into
> > account more than just the last patch to touch a line. This might not be
> > a disadvantage, however...
>
>
> "git blame" does not give irrelevant commits in its output, with or
> without --incremental. Perhaps you were thinking about the "oops, earlier
> one was wrong, here are the corrections" behaviour of "git log
> --early-output", which is an unrelated mechanism in a different command.
>

This comment was based on my observation that several (sometimes
different) commits would span the same line numbers. Though it seems
to also happen that no line is spanned by any commit at all. I
probably misunderstood the output format of the incremental mode.

> But I have to wonder why you used --incremental and not --porcelain
> format, the latter of which is more compact and is designed for parsing by
> tools.
>

There were some different reasons. I found --incremental easier to
parse, and I didn't really want the actual lines of the file. Maybe I
should rewrite to use porcelain instead :-)

> I also have to wonder why you did not use -M, -C, and/or -w, if you used
> blame to find the true origin of lines that are involved.
>

I haven't used these options before and didn't know if it would really
make sense to use them in this context. I guess I could allow them to
pass through from the command line to git-blame...

> Unless the patch is truly about a narrow region of a handful files
> (e.g. micro-optimizing the implementation of a single function without
> changing its external interface at all, or fixing an off-by-one error in a
> group of functions that do similar things), I suspect that it would make
> more sense to use "git shortlog --no-merges -- paths" to get the list of
> people who are involved in the general area, even though they may not have
> been involved in particular _lines_ that the patch touches. For example,
> if a patch changes the lines in a function's implementation, you would
> want input not only from the people who improved the implementation of the
> function over the years, but more from the people who care about the
> calling sites of that function the patch is touching.

Yes, it seems that log/shortlog is the most common (and probably
faster) way of doing what I'm trying to do, based on all the feedback
I had :-) However, I use git-blame myself and so I wanted to automate
that task in particular.

I did not intend for the tool to be fully automatic, however; it
outputs a ranked list of names and e-mails. The user (well, me ;-)) is
still expected to pick the sensible entries and leave out the rest.
For instance, I bet half the patches run through the script on the
linux kernel sources would turn Linus up, even though you don't want
to send patches directly there in most of the cases. And this is
simply because he did the initial commit and a lot of code may not
have changed since that...

Thanks for the comments.


Vegard

--
"The animistic metaphor of the bug that maliciously sneaked in while
the programmer was not looking is intellectually dishonest as it
disguises that the error is the programmer's own creation."
-- E. W. Dijkstra, EWD1036
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/