Re: dangers of bots on the mailing lists was Re: divide error in ___bpf_prog_run

From: Dmitry Vyukov
Date: Mon Jan 22 2018 - 08:31:33 EST


On Thu, Jan 18, 2018 at 3:05 PM, Greg Kroah-Hartman
<gregkh@xxxxxxxxxxxxxxxxxxx> wrote:
> On Thu, Jan 18, 2018 at 02:01:28PM +0100, Dmitry Vyukov wrote:
>> On Thu, Jan 18, 2018 at 2:09 AM, Theodore Ts'o <tytso@xxxxxxx> wrote:
>> > On Wed, Jan 17, 2018 at 04:21:13PM -0800, Alexei Starovoitov wrote:
>> >>
>> >> If syzkaller can only test one tree then linux-next should be the one.
>> >
>> > Well, there's been some controversy about that. The problem is that
>> > it's often not clear if this is a long-standing bug, or a bug which is
>> > in a particular subsystem tree --- and if so, *which* subsystem tree,
>> > etc. So it gets blasted to linux-kernel, and to get_maintainer.pl,
>> > which is often not accurate --- since the location of the crash
>> > doesn't necessarily point out where the problem originated, and hence
>> > who should look at the syzbot report. And so this has caused
>> > some.... irritation.
>>
>>
>> Re set of tested trees.
>>
>> We now have an interesting spectrum of opinions.
>>
>> Some assorted thoughts on this:
>>
>> 1. First, "upstream is clean" won't happen any time soon. There are
>> several reasons for this:
>> - Currently syzkaller only tests a subset of subsystems that it knows
>> how to test, and even those it tests poorly. Over time it will be
>> improved to cover more subsystems and to test existing ones better.
>> Just a few weeks ago I added some descriptions for the crypto
>> subsystem and that uncovered 20+ old bugs.
>> - syzkaller is a guided, genetic fuzzer; over time it learns how to do
>> more complex things in small steps. It takes time.
>> - We have more bug detection tools coming: LEAKCHECK, KMSAN (uninit
>> memory), KTSAN (data races).
>> - generic syzkaller smartness will be improved over time.
>> - it will get more CPU resources.
>> The effect of all of these things is multiplicative: we test more
>> code, smarter, with more bug-detection tools, with more resources. So
>> I think we need to plan for a mix of old and new bugs for the
>> foreseeable future.
>
> That's fine, but when you test Linus's tree, we "know" you are hitting
> something that really is an issue, and it's not due to linux-next
> oddities.
>
> When I see a linux-next report, and it looks "odd", my default reaction
> is "ugh, must be a crazy patch in some other subsystem, I _know_ my code
> in linux-next is just fine." :)
>
>> 2. get_maintainer.pl and the mix of old and new bugs were mentioned as
>> harming attribution. I don't see what will change when/if we test only
>> upstream. The same mix of old/new bugs will then be detected on
>> upstream, with all of the same problems for old/new, maintainers,
>> which subsystem, etc. I think the number of bugs in the kernel is a
>> significant part of the problem, but the exact boundary where we
>> decide to start killing them won't affect the number of bugs.
>
> I don't worry about that, the traceback should tell you a lot, and even
> when that is wrong (i.e. warnings thrown up by sysfs core calls that are
> obviously not a sysfs issue, but rather a subsystem issue), it's easy to
> see.
>
>> 3. If we test only upstream, we increase the chances of new security
>> bugs sinking into releases. We sure could raise the perceived security
>> value of the bugs by keeping them private, letting them sink into a
>> release, letting them sink into distros, and then reporting a
>> high-profile vulnerability. I think that's wrong. There is something
>> broken with how the security community measures value. A bug that is
>> killed before sinking into any release is the highest-impact thing. As
>> Alexei noted, fixing bugs as early as possible also reduces fix costs,
>> backporting burden, etc. This can also eliminate the need for
>> bisection in some cases: say, if you accepted a large change to some
>> files and a bunch of crashes appear for these files on your tree soon
>> after, it's obvious what happened.
>
> I agree, this is an issue, but I think you have a lot of "low hanging
> fruit" in Linus's tree left to find. Testing linux-next is great, but
> the odds of something "new" being added there for your type of testing
> right now is usually pretty low, right?


So I've dropped linux-next and mmots for now (you can still see them
for a few days for bugs already in the pipeline) and added bpf-next
instead.

The bpf-next instance runs tests as root, has net.core.bpf_jit_enable=1
and has the following syscalls enabled:

"enable_syscalls": [
"bpf", "mkdir", "mount", "close",
"perf_event_open", "ioctl$PERF*", "getpid", "gettid",
"socketpair", "sendmsg", "recvmsg", "setsockopt$sock_attach_bpf",
"socket$kcm", "ioctl$sock_kcm*"
]
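
To make the effect of the list above concrete, here is a rough sketch of
how such patterns could select syscall descriptions. It is only an
illustration under assumptions: the helper names (pattern_matches,
filter_enabled) and the candidate names in the usage note are
hypothetical, and the sketch models just exact names plus a trailing "*"
wildcard. The real matching in syzkaller may differ (for example, a bare
name may also cover its "$" variants, which is not modeled here).

```python
# Illustrative sketch only, not syzkaller's implementation.
# The pattern list is copied from the config fragment above.
ENABLE_SYSCALLS = [
    "bpf", "mkdir", "mount", "close",
    "perf_event_open", "ioctl$PERF*", "getpid", "gettid",
    "socketpair", "sendmsg", "recvmsg", "setsockopt$sock_attach_bpf",
    "socket$kcm", "ioctl$sock_kcm*",
]


def pattern_matches(pattern, name):
    """Assumed rule: a trailing '*' is a prefix wildcard, otherwise exact match."""
    if pattern.endswith("*"):
        return name.startswith(pattern[:-1])
    return name == pattern


def filter_enabled(names, patterns=ENABLE_SYSCALLS):
    """Return the names selected by at least one pattern, in input order."""
    return [n for n in names if any(pattern_matches(p, n) for p in patterns)]
```

For example, given the hypothetical candidates "bpf", "open",
"ioctl$PERF_EVENT_IOC_ENABLE" and "ioctl$TCGETS", filter_enabled keeps
only "bpf" and "ioctl$PERF_EVENT_IOC_ENABLE" under these assumed rules.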

Let's see how this goes.