Re: dangers of bots on the mailing lists was Re: divide error in ___bpf_prog_run

From: Guenter Roeck
Date: Thu Jan 18 2018 - 09:10:24 EST


On Thu, Jan 18, 2018 at 5:01 AM, Dmitry Vyukov <dvyukov@xxxxxxxxxx> wrote:
> On Thu, Jan 18, 2018 at 2:09 AM, Theodore Ts'o <tytso@xxxxxxx> wrote:
>> On Wed, Jan 17, 2018 at 04:21:13PM -0800, Alexei Starovoitov wrote:
>>>
>>> If syzkaller can only test one tree then linux-next should be the one.
>>
>> Well, there's been some controversy about that. The problem is that
>> it's often not clear if this is a long-standing bug, or a bug which is
>> in a particular subsystem tree --- and if so, *which* subsystem tree,
>> etc. So it gets blasted to linux-kernel, and to get_maintainer.pl,
>> which is often not accurate --- since the location of the crash
>> doesn't necessarily point out where the problem originated, and hence
>> who should look at the syzbot report. And so this has caused
>> some.... irritation.
>
>
> Re set of tested trees.
>
> We now have an interesting spectrum of opinions.
>
> Some assorted thoughts on this:
>
> 1. First, "upstream is clean" won't happen any time soon. There are
> several reasons for this:
> - Currently syzkaller only tests the subset of subsystems that it knows
> how to test, and even those it tests poorly. Over time it will be
> improved to test most subsystems and to test existing ones better.
> Just a few weeks ago I added some descriptions for the crypto subsystem
> and it uncovered 20+ old bugs.
> - syzkaller is a guided, genetic fuzzer; over time it learns how to do
> more complex things by small steps. It takes time.
> - We have more bug detection tools coming: LEAKCHECK, KMSAN (uninit
> memory), KTSAN (data races).
> - generic syzkaller smartness will be improved over time.
> - it will get more CPU resources.
> Effect of all of these things is multiplicative: we test more code,
> smarter, with more bug-detection tools, with more resources. So I
> think we need to plan for a mix of old and new bugs for the foreseeable
> future.
>
> 2. get_maintainer.pl and mix of old and new bugs was mentioned as
> harming attribution. I don't see what will change when/if we test only
> upstream. Then the same mix of old/new bugs will be detected just on
> upstream, with all of the same problems for old/new, maintainers,
> which subsystem, etc. I think the number of bugs in the kernel is a
> significant part of the problem, but the exact boundary where we
> decide to start killing them won't affect the number of bugs.
>
> 3. If we test only upstream, we increase chances of new security bugs
> sinking into releases. We sure could raise perceived security value of
> the bugs by keeping them private, letting them sink into release,
> letting them sink into distros, and then reporting a high-profile
> vulnerability. I think that's wrong. There is something broken with
> value measuring in the security community. A bug that is killed before
> sinking into any release is the highest-impact thing. As Alexei noted,
> fixing bugs as early as possible also reduces fix costs, backporting
> burden, etc. This can also eliminate the need for bisection in some cases,
> say if you accepted a large change to some files and a bunch of
> crashes appears for these files on your tree soon, it's obvious what
> happens.
>
> 4. It was mentioned that linux-next can have a broken slab allocator
> and that will manifest as multiple random crashes. FWIW I don't
> remember ever seeing this. Yes, sometimes it does not build/boot,
> but these builds are just rejected for testing.
>
> I don't mind dropping linux-next specifically if that's the common
> decision. However, (1) Alexei and Guenter expressed the opposite opinion,

My opinion does not really mean much, if anything. While my personal
opinion is that it would be beneficial to test -next, my understanding
also was that -next was not supposed to be a playground but a
collection of patches which are ready for upstream. Quite obviously,
as this exchange has shown, this is not or no longer the case.

The result is that your testing of -next does not have the desired effect
of improving the Linux kernel and of finding problems _before_ they hit
mainline. Instead, your efforts are seen as noise, and syzkaller's
reputation is negatively affected. With that in mind, I would suggest
that you stop testing -next. If you ever have spare CPU capacity, you can
start adding subtrees from -next which are known to never be rebased,
such as net-next, taking subtrees tested by 0day as baseline.

Thanks,
Guenter

> (2) I don't see what it will change dramatically, (3) as far as I
> understand Linus actually relies on linux-next giving some concrete
> testing to the code there.
> But I think that testing bpf-next is a positive thing provided that
> there is explicit interest from maintainers. And note that that will
> be testing targeted specifically at bpf subsystem, so that instance
> will not generate bugs in SCSI, USB, etc (though it will cover a part
> of net). Also note that the latest email format includes the set of trees
> where the crash happened, so if you see "upstream" or "upstream and
> bpf-next", nothing really changes, you still know that it happens
> upstream. Or if you see only "bpf-next", then you know that it's only
> that tree.