Re: [RFC][PATCH 1/7] sched: Fix smp_call_function_single_async() usage for ILB

From: Vincent Guittot
Date: Wed May 27 2020 - 08:07:57 EST


On Wed, 27 May 2020 at 13:28, Frederic Weisbecker <frederic@xxxxxxxxxx> wrote:
>
> On Wed, May 27, 2020 at 12:23:23PM +0200, Vincent Guittot wrote:
> > > -static void nohz_csd_func(void *info)
> > > -{
> > > - struct rq *rq = info;
> > > + flags = atomic_fetch_andnot(NOHZ_KICK_MASK, nohz_flags(cpu));
> >
> > Why can't this be done in nohz_idle_balance() instead ?
> >
> > You are not using flags in nohz_csd_func(), and SCHED_SOFTIRQ, which
> > calls nohz_idle_balance(), happens after nohz_csd_func(), doesn't it?
> >
> > In this case, you don't have to use the intermediate variable
> > this_rq->nohz_idle_balance
>
> That's in fact to fix the original issue. The softirq was clearing
> the nohz_flags but the softirq could be issued from two sources:
> the tick and the IPI. And the tick source softirq could then clear
> the flags set from the IPI sender before the IPI itself, resulting
> in races such as described there: https://lore.kernel.org/lkml/20200521004035.GA15455@lenoir/

Ah yes, even if the CPU is idle, the tick can fire and its softirq can
clear the flags before the IPI has run.
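
For the archive, a rough userspace sketch of the ordering discussed above,
using C11 atomics rather than the kernel's atomic_t API. The names
(fake_rq, kick, ipi_handler, softirq, demo) and the flag values are
hypothetical and only illustrate the idea: the IPI handler fetches and
clears the kick flags in one atomic step and hands them to the softirq
through a private field, so a softirq raised by the tick has nothing to
clear.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical flag values, for illustration only. */
#define KICK_BALANCE 0x1u
#define KICK_STATS   0x2u
#define KICK_MASK    (KICK_BALANCE | KICK_STATS)

/* Stand-in for the per-CPU runqueue fields involved (an assumption). */
struct fake_rq {
	atomic_uint nohz_flags;     /* set by the CPU that sends the kick */
	unsigned int idle_balance;  /* private copy consumed by the softirq */
};

/* Sender side: publish the request before the IPI is sent. */
void kick(struct fake_rq *rq, unsigned int flags)
{
	atomic_fetch_or(&rq->nohz_flags, flags);
}

/* IPI handler: fetch and clear the kick flags in one atomic step. */
void ipi_handler(struct fake_rq *rq)
{
	unsigned int old = atomic_fetch_and(&rq->nohz_flags, ~KICK_MASK);

	rq->idle_balance = old & KICK_MASK;
}

/* Softirq: act only on the private copy, never on nohz_flags. */
unsigned int softirq(struct fake_rq *rq)
{
	unsigned int flags = rq->idle_balance;

	rq->idle_balance = 0;
	return flags;
}

/* Replays the problematic ordering: the tick softirq runs before the IPI. */
void demo(void)
{
	struct fake_rq rq;

	atomic_init(&rq.nohz_flags, 0);
	rq.idle_balance = 0;

	kick(&rq, KICK_BALANCE);

	/* Tick-sourced softirq fires first: nothing to do, flags untouched. */
	assert(softirq(&rq) == 0);
	assert(atomic_load(&rq.nohz_flags) == KICK_BALANCE);

	/* The IPI arrives, takes the flags, and its softirq sees them. */
	ipi_handler(&rq);
	assert(atomic_load(&rq.nohz_flags) == 0);
	assert(softirq(&rq) == KICK_BALANCE);
}
```

In this sketch the only writer that clears nohz_flags is the IPI handler,
which is why the intermediate field is needed at all.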

Reviewed-by: Vincent Guittot <vincent.guittot@xxxxxxxxxx>

>
> Thanks.