Re: [PATCH] KVM: X86: Fix scan ioapic use-before-initialization

From: Dmitry Vyukov
Date: Wed Jan 09 2019 - 03:28:40 EST


On Wed, Jan 2, 2019 at 3:08 PM Dmitry Vyukov <dvyukov@xxxxxxxxxx> wrote:
>
> On Fri, Dec 28, 2018 at 10:09 PM Linus Torvalds
> <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> >
> > On Fri, Dec 28, 2018 at 1:43 AM Dmitry Vyukov <dvyukov@xxxxxxxxxx> wrote:
> > >
> > > > Nobody reads the kernel mailing list directly - there's just too much traffic.
> > >
> > > As a result, bug reports and patches get lost, and this is bad. It
> > > would be useful to stop that from happening, and there are known ways
> > > to do so.
> >
> > Well, let me be a bit more specific: you will find that people read
> > the very _targeted_ mailing lists, because they not only tend to be
> > more specific to some particular interest, but also aren't the flood
> > of hundreds of emails a day.
> >
> > And don't get me wrong: I'm not saying that lkml is useless. Not at
> > all. It's just that it's really more of an archival model than a
> > "people read it" - so you send your emails to a group of people, and
> > then you cc lkml so that when that group gets expanded people can be
> > pointed at the whole thread. Or, obviously, so that commit messages
> > etc can point to discussion.
> >
> > But that does mean that any lkml cc shouldn't be expected to cause a
> > reaction in itself. It's about other things.
> >
> > > syzbot not doing bisection is not the root cause of this
> >
> > Root cause? No. But if you do bisection, it means that you can now
> > target things much better. So then it's not lkml and "random
> > collection of maintainers", but a much more targeted group.
> >
> > And that targeted group also ends up being a lot more receptive to it.
> >
> > Again, look at the raw syzbot email and the email by Wanpeng Li. Yes,
> > the syzbot email did bring in a reasonable set of people just based on
> > the oops (I think it did "get_maintainer" on kvm_ioapic_scan_entry()).
> > But Wanpeng ended up sending it to the *particular* people who were
> > directly responsible.
> >
> > > 2. syzbot reports are not worse than average human reports, frequently better.
> >
> > No, they really aren't.
> >
> > They are better in a *technical* sense, but they are also very much
> > obviously automated, which makes the target people take them much less
> > seriously.
> >
> > When you see lots of syzbot emails, and there are lots of more or less
> > random recipients that may or may not be correct, what's the natural
> > reaction to that?
> >
> > Look up "bystander effect".
> >
> > > 3. Bisection is useful, but not important in most cases.
> >
> > No.
> >
> > Exactly because of the problem syzbot has. It's too scatter-shot.
> > People clearly ignore it, because people feel it's not _their_ issue.
> >
> > The advantage of bisection is that it makes the problem much more
> > specific. Right now, you'll find that many developers ignore syzbot
> > simply because it's not worth their time to chase down whether it's
> > even their problem.
> >
> > See what I'm saying?
> >
> > It's the whole "data vs information" issue. Particularly when cc'ing
> > maintainers, who get hundreds of emails a day, you need to convince
> > them that this email is _relevant_.
>
> I see what you are saying and I agree that bisection results will make
> reports better in some cases. But I mean a more general problem.
>
> Say you reported a bug, and it so happened that you missed the single
> right person in CC for whatever reason. That can happen, right?
> With the current process it is a coin flip whether your report gets
> routed to the right person or lost. And it's not that you personally
> care a lot about this particular bug; you just happened to notice it
> and wanted to be a good Samaritan. So you will not keep track of it on
> a Post-it note on your monitor and won't ping later. But
> the bug can be bad and either cause security problems later, or reach
> release and break things in the field and then require 1000x more work
> to port the fix to all downstream forks.
>
> Or, we heavily rely on end users for testing. End users are not kernel
> developers and can't generally be expected to do pre-triage and proper
> routing. Losing these valuable reports is bad because only a small
> fraction of users report anything to projects, and it can also erode
> user trust: if you see that your reports are not acted on, you don't
> report next time.
>
> Even if we take syzbot, it won't be able to bisect all the time for
> multiple reasons:
> - some bugs don't have reproducers (but are still very real and sometimes
> manageable to fix)
> - the kernel is sometimes build/boot broken for prolonged periods
> - some old bugs are bisected to the introduction of the debugging tool
> that detects the bug
> - some crashes can be too flaky for reliable bisection
> - some reproducers won't work on older kernels, yet the bug is there
> - ...
> So it will be nice to have bisection results when they are
> available, but it does not feel like it should be the only guarantee
> of a bug report not being lost.
>
> Moreover, you can see in the examples I referenced above that they
> were delivered to the right people, but then still lost because there
> is nothing in the kernel development process that would prevent losses.
>
> Moreover, relying on a small set of private emails generally creates
> problems wrt bus-factor and vacations. It would be useful if anybody
> could see what the open bugs are for the rdma_cm subsystem at any point
> in time.

This is quite indicative:

Serious issues affecting all filesystems:

Kernel quality control, or the lack thereof
https://lwn.net/Articles/774114/

Comment on ycombinator:
https://news.ycombinator.com/item?id=18844612

I filed bugs for some of the mentioned copy_file_range() issues
more than two years ago:
- https://bugzilla.kernel.org/show_bug.cgi?id=135461
- https://bugzilla.kernel.org/show_bug.cgi?id=135451
No response...