Re: stable? quality assurance?

From: Willy Tarreau
Date: Sat Sep 04 2010 - 13:22:11 EST


Hi Martin,

On Sat, Sep 04, 2010 at 06:42:19PM +0200, Martin Steigerwald wrote:
(...)
> The main idea here is to have a two-staged freeze process and to
> distribute the "I am only taking bug fixes" work to more people than Linus.
>
> For this to work properly, I think at the time of the release of the
> stable kernel subsystem maintainers and Andrew should branch their trees.
> For example when 2.6.36 is released:
>
> - tree
> => 2.6.36-stable-tree
> => tree, where 2.6.37 stuff will be going in
>
> Thus when subsystem maintainers take new stuff during the merge window, it
> will be for the next kernel release already, not for the current one.
> Except bugfix work. Whereas I think the criteria for bug fix work should not
> be that strict than for the stable patches Greg collects.
>
> Thus it needs to be clear: No new stuff for next kernel already two weeks
> prior to release the current stable kernel.

While I respect your beliefs on this matter (they once were mine too), I have
since realized I was wrong, for several reasons:
- most developers want to create. They (generally) test what they create,
and they believe it's flawless because it works for them. So in their view
there is no need for more testing, and they go on with new features; if you
refuse to merge their new work for some time, they work on their own tree
and push you more work at once next time.

- developers need real world use cases. That means more testers. Developers
are bad testers because they don't trigger the unexpected use cases. And
how do you get good testers? By motivating end users to test your code.
Most testers will only test a new kernel to get a new feature. If it works
for them, they see no need to push the tests further. So that means that the
first RCs are the most tested, and that the later ones are the least tested.
Thus at some point you can't hope to get bug reports anymore. When you see
an -rc7 or -rc8, you think "hey, -rc4 was OK, let's wait for -final and
install it".

- people concerned by stability don't test every release. They test when
they can, precisely because they can't afford to impact production. So
they don't contribute bug reports in time. And as the 2.4 maintainer, I'm
well aware of that, because when I break something, I only know about it
3-4 months later.

For these reasons, I think the release rhythm can't be changed much. I think
that trying to evaluate and publish quality per developer or maintainer can
have a better effect because everyone in the commit chain is responsible.
But even doing that is hard because some changes touch everything and it's
not obvious to say that Mr X or Y has done some crap.

In my opinion, reporting bugs is the most effective way of improving
quality. If you report 10 bugs in a week on the same driver, chances are
that at some point this driver's author will want to take some time to
audit his/her code and find other bugs before you next point your finger
at him/her. As you see, the goal is not just to report bugs to get them
fixed, but to educate bug authors.

I can tell you that I am the author of quite a number of bugs in another
project (haproxy), and I absolutely hate it when a bug is detected in
production (especially given the product's goal), to the point that the
code is generally reworked 2, 3, 5, 10 times before being committed. Of
course it is still not enough to catch all bugs, but since the product
has got a widely accepted reputation of being rock solid, I think it
works quite well after all.

Last, developers must not betray their users' trust. When they're not
certain of their code, this must be advertised (this is often the case
but not always). That helps end users a lot to select only reliable
features and experience more stability.

Regards,
Willy
