Re: BUG: KCSAN: data-race in tick_nohz_next_event / tick_nohz_stop_tick

From: Marco Elver
Date: Mon Dec 07 2020 - 07:24:26 EST


On Sun, 6 Dec 2020 at 00:47, Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
> On Sat, Dec 05 2020 at 19:18, Thomas Gleixner wrote:
> > On Fri, Dec 04 2020 at 20:53, Marco Elver wrote:
> > It might be useful to find the actual variable, data member or whatever
> > which is involved in the various reports and if there is a match then
> > the reports could be aggregated. The 3 patterns here are not even the
> > complete possible picture.
> >
> > So if you sum them up: 58 + 148 + 205 instances then their weight
> > becomes more significant as well.
>
> I just looked into the moderation queue and picked stuff which I'm
> familiar with from the subject line.

We managed to push (almost) everything that was still in private
moderation to public moderation, so now there's even more to look at:
https://syzkaller.appspot.com/upstream?manager=ci2-upstream-kcsan-gce
:-)

> There are quite some reports which have a different trigger scenario,
> but are all related to the same issue.
>
> https://syzkaller.appspot.com/bug?id=f5a5ed5b2b6c3e92bc1a9dadc934c44ee3ba4ec5
> https://syzkaller.appspot.com/bug?id=36fc4ad4cac8b8fc8a40713f38818488faa9e9f4
>
> are just variations of the same problem timer_base->running_timer being
> set to NULL without holding the base lock. Safe, but insanely hard to
> explain why :)
>
> Next:
>
> https://syzkaller.appspot.com/bug?id=e613fc2458de1c8a544738baf46286a99e8e7460
> https://syzkaller.appspot.com/bug?id=55bc81ed3b2f620f64fa6209000f40ace4469bc0
> https://syzkaller.appspot.com/bug?id=972894de81731fc8f62b8220e7cd5153d3e0d383
> .....
>
> That's just the ones which caught my eye and all are related to
> task->flags usage. There are tons more judging from the subject
> lines.
>
> So you really want to look at them as classes of problems and not as
> individual scenarios.

Regarding auto-dedup: as you suggest, it'd make this straightforward
if we had the variable name -- it turns out that's not so trivial. I
think we need compiler support for that, or is there some existing
infrastructure that can just tell us the canonical variable name if it
points into a struct or global? For globals it's fine, but for
arbitrary pointers that point into structs, I don't see how we could
do it without compiler support e.g. mapping PC->variable name (we need
to map instructions back to the variable names they access).
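
To make the idea concrete, here is a minimal sketch of how dedup could work if such a mapping existed. Everything in it is an assumption for illustration: the PC_TO_VARIABLE table stands in for hypothetical compiler-emitted data, and the addresses, report titles, and instance counts are invented rather than taken from syzbot.

```python
from collections import defaultdict

# Hypothetical table a compiler could emit: instruction address -> name of
# the variable or struct member that instruction accesses. The addresses
# are invented; no such table exists today as far as this sketch assumes.
PC_TO_VARIABLE = {
    0x1000: "timer_base.running_timer",
    0x1040: "timer_base.running_timer",
    0x2000: "task_struct.flags",
    0x2080: "task_struct.flags",
}


def variable_for(report):
    """Return the accessed variable for a report, or None if unknown."""
    for pc in report["pcs"]:
        name = PC_TO_VARIABLE.get(pc)
        if name is not None:
            return name
    return None


def aggregate(reports):
    """Group report titles by accessed variable and sum instance counts."""
    groups = defaultdict(lambda: {"titles": [], "instances": 0})
    for report in reports:
        key = variable_for(report) or "<unknown>"
        groups[key]["titles"].append(report["title"])
        groups[key]["instances"] += report["instances"]
    return dict(groups)


# Invented example reports: each has a title, the PCs of the two racing
# accesses, and an instance count.
reports = [
    {"title": "data-race in a / b", "pcs": (0x1000, 0x1040), "instances": 3},
    {"title": "data-race in c / d", "pcs": (0x1040, 0x1000), "instances": 5},
    {"title": "data-race in e / f", "pcs": (0x2000, 0x2080), "instances": 2},
    {"title": "data-race in g / h", "pcs": (0x9000, 0x9040), "instances": 1},
]

for variable, group in aggregate(reports).items():
    print(variable, group["instances"], group["titles"])
```

The grouping itself is trivial; the hard part is producing PC_TO_VARIABLE, which is exactly the piece that seems to need compiler support. Reports whose PCs are not in the table fall into an "<unknown>" bucket instead of being merged incorrectly.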

Any precedent for this? [+Cc linux-toolchains@xxxxxxxxxxxxxxx]

Thanks,
-- Marco