Re: [patch 0/3] tick: Annotate and document the intentionaly racy tick_do_timer_cpu

From: Marco Elver
Date: Mon Dec 07 2020 - 06:06:27 EST


On Sun, 6 Dec 2020 at 22:21, Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
> There have been several reports about KCSAN complaints vs. the racy access
> to tick_do_timer_cpu. The syzbot moderation queue has three different
> patterns all related to this. There are a few more...
>
> As I know that this is intentional and safe, I did not pay much attention
> to it, but Marco actually made me feel bad a few days ago as he explained
> that these intentional races generate too much noise to get to the
> dangerous ones.

My strategy so far was to inspect random data races and decide which
ones might be more interesting and send those, but I haven't had time
to chase data races the past few months. Thus, getting rid of the
intentional boring ones will definitely scale better -- relying on a
human to do filtering really is suboptimal. :-)

> There was an earlier attempt to just silence KCSAN by slapping READ/WRITE
> once all over the place without even the faintiest attempt of reasoning,
> which is definitely the wrong thing to do.
>
> The bad thing about tick_do_timer_cpu is that its only barely documented
> why it is safe and works at all, which makes it extremly hard for someone
> not really familiar with the code to come up with reasoning.
>
> So Marco made me fast forward that item in my todo list and I have to admit
> that it would have been damned helpful if that Gleixner dude would have
> added proper comments in the first place. Would have spared a lot of brain
> twisting. :)
>
> Staring at all usage sites unearthed a few silly things which are cleaned
> up upfront. The actual annotation uses data_race() with proper comments as
> READ/WRITE_ONCE() does not really buy anything under the assumption that
> the compiler does not play silly buggers and tears the 32bit stores/loads
> into byte wise ones. But even that would cause just potentially shorter
> idle sleeps in the worst case and not a complete malfunction.

Ack -- thanks for marking the accesses!

Thanks,
-- Marco