Re: [PATCH] get_maintainer.pl: append reason for cc to the name by default

From: Eric W. Biederman
Date: Tue Sep 14 2010 - 13:20:19 EST


Florian Mickler <florian@xxxxxxxxxxx> writes:

> On Mon, 13 Sep 2010 00:57:45 -0700
> Joe Perches <joe@xxxxxxxxxxx> wrote:
>
>> On Mon, 2010-09-13 at 00:16 -0700, Eric W. Biederman wrote:
>> > It is trivial for a human to look at a git log and see which changes
>> > were just global cleanups and which changes were actual maintenance.
>> > Apparently get_maintainers doesn't have that ability.
>>
>> Do you have a useful, trivial or non-trivial algorithm
>> to suggest or is that soft commenting? All I'll say is
>> AI can be a surprisingly difficult field.
>
> :) indeed.

Which is why the tool needs to assist a person in doing the work.
Please deliver a tool and not a broken solution.

>> > Have seen some files with something like 5 years of changes without a
>> > single commit by a maintainer and the only changes happening to it are
>> > global cleanup changes.
>>
>> Then likely there's no actual maintainer for that file.
>
> and which means that get_maintainer.pl --git will output either nothing
> (if we somehow get its heuristics to filter correctly) or wrong people.
>
>>
>> > If get_maintainers would look at MAINTAINERS and validate or invalidate
>> > that information by looking at git that would be useful.
>>
>> Some entries in MAINTAINERS are outdated.
>> Validating MAINTAINERS entries is probably best done once.
>>
>> I suggest you try that concept out, see what you get, and
>> make public the results.
>
> It is easy to make get_maintainer.pl output fewer people.
> What is not easy is to get it to decrease false-positives while
> not decreasing its detection rate.

What is needed is output that is more than a bare list of email
addresses. Something like:

  email address foo had x% of the non-author Signed-off-bys
  email address foo had y% of the author Signed-off-bys
  email address foo had z% of the author commits
  email address foo came from the MAINTAINERS file
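
A rough sketch of how those numbers could be computed, in Python rather
than the Perl of get_maintainer.pl. The commit record layout
(author/signoffs keys), the function name signoff_stats, and the choice
to express each percentage relative to the total number of commits
touching the file are all assumptions made for illustration, not
anything the tool does today.

```python
from collections import defaultdict


def _blank():
    # Per-address counters; the key names are hypothetical.
    return {"non_author_signoff": 0, "author_signoff": 0, "author_commit": 0}


def signoff_stats(commits, maintainers=frozenset()):
    """Summarise how each address relates to a file's history.

    commits:     iterable of dicts with 'author' (str) and 'signoffs'
                 (list of str); an assumed layout, e.g. parsed from git log.
    maintainers: addresses listed in MAINTAINERS for the file.
    Returns {address: {metric: percent of all commits, 'in_maintainers': bool}}.
    """
    counts = defaultdict(_blank)
    total = 0
    for commit in commits:
        total += 1
        author = commit["author"]
        counts[author]["author_commit"] += 1
        # Count each address at most once per commit.
        for addr in set(commit["signoffs"]):
            kind = "author_signoff" if addr == author else "non_author_signoff"
            counts[addr][kind] += 1

    for addr in maintainers:
        counts[addr]  # creates a zeroed entry if the address never appeared

    report = {}
    for addr, c in counts.items():
        row = {k: (100.0 * v / total if total else 0.0) for k, v in c.items()}
        row["in_maintainers"] = addr in maintainers
        report[addr] = row
    return report
```

An address that only ever shows up as a non-author sign-off is then easy
to tell apart from one that actually writes the code.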

Additionally, for email addresses that hit less often, list the patch
subject lines and truncated sha1 commit ids. With luck you can tell
at a glance whether the person is of interest, and if not you can
quickly look at their commits and see.
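
As an illustration only, the per-address detail for infrequent hits
could look something like the Python sketch below. The record keys
(sha1, subject, author, signoffs), the function name low_hit_details,
and the max_hits and sha_len defaults are hypothetical choices, not
existing get_maintainer.pl options.

```python
from collections import defaultdict


def low_hit_details(commits, max_hits=3, sha_len=12):
    """List (truncated sha1, subject) pairs for rarely-seen addresses.

    commits: iterable of dicts with 'sha1', 'subject', 'author' and
             'signoffs' keys (an assumed layout, e.g. parsed from git log).
    Only addresses involved in at most max_hits commits are returned,
    since frequent contributors need no further explanation.
    """
    by_addr = defaultdict(list)
    for commit in commits:
        # Everyone involved in the commit, counted once per commit.
        people = {commit["author"], *commit["signoffs"]}
        for addr in people:
            by_addr[addr].append((commit["sha1"][:sha_len], commit["subject"]))
    return {addr: hits for addr, hits in by_addr.items() if len(hits) <= max_hits}
```

Someone whose only hit is a "treewide: cleanup" subject can then be
dropped from the Cc list without opening a single commit.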

That is all pretty trivial. It should be fast, and with a little
care it should let bogus results be filtered out quickly.

> As far as I can see, Andrew is in favor of not caring about
> false-positives in order to not sacrifice the detection rate of the
> tool.

Which means in time every long-time developer will be copied on every
patch. That is what we have lkml for. I don't have a problem with the
tool returning false positives. I do have a problem with the tool
taking away the ability and the responsibility of developers to pay
attention to which human beings they are sending their patches to.

I don't want the tool to do the filtering. I want the tool to give
enough information that the person using the tool can get a feel for the
development history of the affected files, along with a couple of
metrics suggesting how useful someone is when Cc'd on a commit.

> My approach tried to lower the impact of false positives by allowing
> people to filter between "cc'd as maintainer" and "cc'd as
> commit_signer". The former is pretty much never a false positive (as
> long as MAINTAINERS is up to date) while the latter is more of a
> hit'n'miss kind of method.

And right now get_maintainer.pl is decreasing the relevancy of cc lines
in commits, which if get_maintainer.pl is used enough could become a
vicious circle.

The problem as I see it is that you present a list of email addresses
without enough information for someone using the tool to guess how
accurate the results are.

Eric
