Re: [PATCH 4/6] timers/nohz: Add a comment about broken iowait counter update race

From: Peter Zijlstra
Date: Fri Feb 10 2023 - 09:40:38 EST


On Fri, Feb 10, 2023 at 03:09:15PM +0100, Frederic Weisbecker wrote:
> The per-cpu iowait task counter is incremented locally upon sleeping.
> But since the task can be woken up on (and by) another CPU, the counter
> may then be decremented remotely. This is the source of a race between
> the readers and the writer of the idle/iowait sleeptime.
>
> The following scenario shows an example where a /proc/stat reader
> observes a pending sleep time as IO, whereas that same pending sleep
> time later gets accounted as non-IO.
>
> CPU 0                       CPU 1                        CPU 2
> -----                       -----                        ------
> //io_schedule() TASK A
> current->in_iowait = 1
> rq(0)->nr_iowait++
> //switch to idle
>                             // READ /proc/stat
>                             // See nr_iowait_cpu(0) == 1
>                             return ts->iowait_sleeptime +
>                                    ktime_sub(ktime_get(), ts->idle_entrytime)
>
>                                                          //try_to_wake_up(TASK A)
>                                                          rq(0)->nr_iowait--
> //idle exit
> // See nr_iowait_cpu(0) == 0
> ts->idle_sleeptime += ktime_sub(ktime_get(), ts->idle_entrytime)
>
> As a result, subsequent reads of /proc/stat may show the iowait time
> going backward.
>
> Unfortunately, this is hard to fix. Just add a comment documenting
> that condition.

It is far worse than that: the whole concept of per-cpu iowait is
absurd. Also see the comment near nr_iowait().
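
For illustration only, the interleaving quoted above can be replayed in a
small single-threaded user-space model. This is a rough sketch, not kernel
code: struct model_ts, read_iowait(), idle_exit(), run_scenario() and the
timestamps 0/10/15/20/25 are all made up for the example, and the field
names merely mirror the ones used in the scenario.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the per-CPU tick state; not the kernel struct. */
struct model_ts {
	uint64_t idle_sleeptime;   /* accumulated non-IO idle time */
	uint64_t iowait_sleeptime; /* accumulated iowait idle time */
	uint64_t idle_entrytime;   /* time CPU 0 last entered idle */
};

/* Models rq(0)->nr_iowait; may be decremented by a remote CPU. */
static int nr_iowait;

/* Reader side: the iowait time a /proc/stat reader would see at 'now'. */
static uint64_t read_iowait(const struct model_ts *ts, uint64_t now,
			    int cpu_idle)
{
	uint64_t t = ts->iowait_sleeptime;

	assert(now >= ts->idle_entrytime);
	/* The pending sleep is counted as iowait if nr_iowait is non-zero. */
	if (cpu_idle && nr_iowait > 0)
		t += now - ts->idle_entrytime;
	return t;
}

/* Writer side: CPU 0 leaves idle and accounts the elapsed sleep time. */
static void idle_exit(struct model_ts *ts, uint64_t now)
{
	uint64_t delta = now - ts->idle_entrytime;

	assert(now >= ts->idle_entrytime);
	if (nr_iowait > 0)
		ts->iowait_sleeptime += delta;
	else
		ts->idle_sleeptime += delta;
}

/* Replays the scenario; returns 1 if the second read went backward. */
int run_scenario(uint64_t *first, uint64_t *second)
{
	struct model_ts ts = { 0, 0, 0 };

	/* t=0, CPU 0: task A does io_schedule(), then CPU 0 goes idle. */
	nr_iowait = 1;
	ts.idle_entrytime = 0;

	/* t=10, CPU 1: reader sees nr_iowait == 1, counts pending iowait. */
	*first = read_iowait(&ts, 10, 1);

	/* t=15, CPU 2: wakes task A, decrementing CPU 0's counter remotely. */
	nr_iowait--;

	/* t=20, CPU 0: idle exit sees nr_iowait == 0, accounts plain idle. */
	idle_exit(&ts, 20);

	/* t=25, CPU 1: second read; CPU 0 is no longer idle. */
	*second = read_iowait(&ts, 25, 0);

	return *second < *first;
}
```

With these made-up timestamps the first read reports 10 and the second
reports 0, because the whole 20 units of sleep end up in idle_sleeptime.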