Re: [PATCH 4/6] timers/nohz: Add a comment about broken iowait counter update race

From: Frederic Weisbecker
Date: Fri Feb 10 2023 - 11:10:12 EST


On Fri, Feb 10, 2023 at 03:39:43PM +0100, Peter Zijlstra wrote:
> On Fri, Feb 10, 2023 at 03:09:15PM +0100, Frederic Weisbecker wrote:
> > The per-cpu iowait task counter is incremented locally upon sleeping.
> > But since the task can be woken to (and by) another CPU, the counter may
> > then be decremented remotely. This is the source of a race involving
> > readers VS writer of idle/iowait sleeptime.
> >
> > The following scenario shows an example where a /proc/stat reader
> > observes a pending sleep time as IO whereas that pending sleep time
> > later eventually gets accounted as non-IO.
> >
> >   CPU 0                     CPU 1                     CPU 2
> >   -----                     -----                     ------
> >   //io_schedule() TASK A
> >   current->in_iowait = 1
> >   rq(0)->nr_iowait++
> >   //switch to idle
> >                             // READ /proc/stat
> >                             // See nr_iowait_cpu(0) == 1
> >                             return ts->iowait_sleeptime +
> >                                    ktime_sub(ktime_get(), ts->idle_entrytime)
> >
> >                                                       //try_to_wake_up(TASK A)
> >                                                       rq(0)->nr_iowait--
> >   //idle exit
> >   // See nr_iowait_cpu(0) == 0
> >   ts->idle_sleeptime += ktime_sub(ktime_get(), ts->idle_entrytime)
> >
> > As a result subsequent reads on /proc/stat may expose backward progress.
> >
> > This is unfortunately hardly fixable. Just add a comment about that
> > condition.
>
> It is far worse than that, the whole concept of per-cpu iowait is
> absurd. Also see the comment near nr_iowait().

Alas I know :-(
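
For anyone following along, the interleaving in the quoted changelog can be
replayed deterministically in plain userspace C. This is only a sketch under
loud assumptions: struct cpu_state, read_iowait(), idle_exit() and
replay_race() are hypothetical names made up for illustration, not kernel
code, and the timestamps are arbitrary integers rather than ktime_t. The
three CPUs are replayed in program order instead of running concurrently.

```c
#include <stdint.h>

/* Hypothetical stand-in for CPU 0's tick state plus its rq->nr_iowait. */
struct cpu_state {
	uint64_t idle_sleeptime;   /* accounted non-IO idle time */
	uint64_t iowait_sleeptime; /* accounted IO-wait idle time */
	uint64_t idle_entrytime;   /* when the current idle period began */
	int idle_active;           /* nonzero while the CPU is idle */
	int nr_iowait;             /* tasks that went to sleep on IO here */
};

/* Reader side: add the pending idle period if it currently looks like IO. */
static uint64_t read_iowait(const struct cpu_state *c, uint64_t now)
{
	if (c->idle_active && c->nr_iowait > 0)
		return c->iowait_sleeptime + (now - c->idle_entrytime);
	return c->iowait_sleeptime;
}

/* Writer side: on idle exit, fold the whole period into one bucket only. */
static void idle_exit(struct cpu_state *c, uint64_t now)
{
	uint64_t delta = now - c->idle_entrytime;

	if (c->nr_iowait > 0)
		c->iowait_sleeptime += delta;
	else
		c->idle_sleeptime += delta;
	c->idle_active = 0;
}

/* Replay the scenario; returns 1 if the second read went backward. */
int replay_race(uint64_t *first, uint64_t *second)
{
	struct cpu_state cpu0 = {0};

	/* CPU 0: io_schedule(), then switch to idle at t=100 */
	cpu0.nr_iowait++;
	cpu0.idle_entrytime = 100;
	cpu0.idle_active = 1;

	/* CPU 1: reader samples at t=150, sees nr_iowait == 1 */
	*first = read_iowait(&cpu0, 150);

	/* CPU 2: try_to_wake_up(TASK A) decrements CPU 0's counter */
	cpu0.nr_iowait--;

	/* CPU 0: idle exit at t=200, sees nr_iowait == 0 */
	idle_exit(&cpu0, 200);

	/* CPU 1: reader samples again at t=250 */
	*second = read_iowait(&cpu0, 250);

	return *second < *first;
}
```

In this replay the first read reports the pending period as iowait, while
the idle exit later files the entire period under idle_sleeptime, so the
second iowait read is smaller than the first. Serializing the reader
against the idle-exit path alone would not help in this model, since the
bucket is still chosen by whatever the remotely modified counter happens
to hold at exit time.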