Re: [syzbot] INFO: rcu detected stall in newstat

From: Dmitry Vyukov
Date: Mon Nov 29 2021 - 15:30:09 EST


On Mon, 29 Nov 2021 at 15:59, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>
> On Mon, Nov 29, 2021 at 03:15:16PM +0100, Dmitry Vyukov wrote:
>
> > Right, I missed the "preempt leak: 00000100 -> 00000101" warning. And
> > before that there is also "WARNING: inconsistent lock state" warning.
> > This reminds me of the issues we had with RCU/LOCKDEP before when an
> > RCU warning disabled LOCKDEP tracking, as the result LOCKDEP missed
> > part of events (e.g. tracked lock, but missed subsequent unlock) and
> > due to races/ordering issues it mis-reported them as nonsensical
> > reports.
>
> You're talking about how debug_locks_off() is a hot-racy-mess? That only
> matters if you're triggering stuff concurrently which *mostly* doesn't
> happen.
>
> I'm also not quite sure how to fix that without globally serializing
> everything, which would be super unhappy.

Yes, I think it was debug_locks_off().
But it's not about triggering two different, real bugs concurrently.
It's about producing assorted unexplainable false positives
concurrently with debug_locks_off().
If false positives appear after the first real report (based on the
first "WARNING:" line), then it's not a problem for syzkaller (it will
just take the first one, though they may still confuse kernel
developers).
However, in the previous case the false positives happened to appear
_before_ the first real report, which confused syzkaller into
reporting these assorted false positives as new bugs.
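
To make the "take the first one" behaviour concrete, here is a rough
user-space sketch in C. It is not syzkaller's actual parser;
first_report_line() is a hypothetical name, and matching on a single
marker string is a simplifying assumption.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: return a pointer to the start of the first line
 * in `log` that contains `marker`, or NULL if no line does.
 * On success, *len_out is the line length without the trailing newline.
 */
const char *first_report_line(const char *log, const char *marker,
                              size_t *len_out)
{
    const char *hit = strstr(log, marker);
    if (!hit)
        return NULL;

    /* Walk back to the beginning of the line containing the match. */
    const char *start = hit;
    while (start > log && start[-1] != '\n')
        start--;

    /* The line ends at the next newline, or at the end of the log. */
    const char *end = strchr(hit, '\n');
    *len_out = end ? (size_t)(end - start) : strlen(start);
    return start;
}
```

With a picker like this, anything printed after the first match is
ignored, while a bogus report printed before the real one becomes the
reported title.
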
In this case the second and third (potentially false) reports appeared
after the real one, so it's not a problem for syzkaller
parsing/reporting. We just need to learn to ignore them. However, if
debug_locks_off() already flipped some global atomic, couldn't the
report printing function check that atomic and just stop producing any
new reports?
But keep in mind that the falseness of these "inconsistent lock state"
and "preempt leak" reports in the log is just my hypothesis at this
point.
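
For illustration only, the "check that atomic and stop producing new
reports" idea could look roughly like the user-space C11 sketch below.
This is not kernel code and makes no claim about what lockdep does
today; reports_enabled, reports_off() and emit_report() are
hypothetical names standing in for the global flag, debug_locks_off()
and a report printing function.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical global flag: 1 while reporting is enabled, 0 afterwards. */
static atomic_int reports_enabled = 1;

/*
 * Stand-in for debug_locks_off(): atomically clear the flag and return
 * true only for the single caller that observed the 1 -> 0 transition.
 */
bool reports_off(void)
{
    return atomic_exchange(&reports_enabled, 0) == 1;
}

/*
 * Hypothetical report printer: print only if this caller is the one
 * that turned reporting off. Returns true if the report was printed.
 */
bool emit_report(const char *title)
{
    if (!reports_off())
        return false;   /* someone already reported; stay silent */
    fprintf(stderr, "WARNING: %s\n", title);
    return true;
}
```

Because the exchange is atomic, at most one concurrent caller prints,
and every later report is dropped at the entry to the printing
function.
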