Re: [RFC] Add BPF_SYNCHRONIZE bpf(2) command

From: Paul E. McKenney
Date: Tue Jul 10 2018 - 14:49:47 EST


On Tue, Jul 10, 2018 at 10:29:57AM -0700, Joel Fernandes wrote:
> On Tue, Jul 10, 2018 at 10:12:29AM -0700, Paul E. McKenney wrote:
> [..]
> > > > > The other question I have is about the whole "nohz-full doesn't work" thing.
> > > > > I didn't fully understand why. RCU is already tracking the state of nohz-full
> > > > > CPUs because the rcu dynticks code in (kernel/rcu/tree.c) monitors
> > > > > transitions to and from usermode even if the timer tick is turned off. So why
> > > > > would it not work?
> > > >
> > > > In the nohz_full case, there is no need for sys_membarrier()'s call to
> > > > synchronize_sched() to interact directly with the nohz_full CPU. It
> > > > can instead look at the target CPU's dyntick-idle state, and that state
> > > > would potentially have been set in the dim distant past, thus having
> > > > no effect on the target CPU's current execution.
> > >
> > > In nohz-idle case though, there's nothing to promote the barrier() to
> > > smp_mb() if you were to purely look at the dynticks-idle state on the
> > > nohz-full CPU executing in user mode?
> > >
> > > So then it makes sense to me now that nohz-full needs something to IPI that
> > > CPU in order to enforce the needed memory barrier, and pure synchronize_sched()
> > > wouldn't work. That makes me think the expedited versions of
> > > synchronize_sched() should be able to do the job, but I could be off on a
> > > different track..
> >
> > The problem is that the expedited versions also check the dyntick-idle
> > state and don't touch idle (or nohz_full usermode) CPUs. This is by
> > design for the battery-powered embedded use case. ;-)
>
> Oh ok! ;)
>
> I guess there's also a MEMBARRIER_CMD_GLOBAL_EXPEDITED which seems to IPI
> CPUs (I'm guessing regardless of dynticks state) and execute smp_mb() within
> the IPI, so userspace can fall back to using that in case MEMBARRIER_CMD_GLOBAL
> returns -EINVAL.

Yes, and this avoids IPIing idle CPUs via the ->mm checks. But it will
IPI nohz_full CPUs in that same process, as it must for correctness.
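
[Editorial sketch, not part of the original message.] The userspace fallback
discussed above might look roughly like the following. This is a minimal
sketch under stated assumptions: the helper names membarrier(),
register_for_global_expedited() and global_barrier() are hypothetical, the
uapi header is assumed to be new enough (v4.16 or later) to define the
MEMBARRIER_CMD_GLOBAL* constants, and the expedited command only covers
processes that have previously registered with
MEMBARRIER_CMD_REGISTER_GLOBAL_EXPEDITED.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/membarrier.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical thin wrapper; glibc does not export membarrier() itself. */
static int membarrier(int cmd, unsigned int flags)
{
	return (int)syscall(__NR_membarrier, cmd, flags);
}

/*
 * Hypothetical helper: each process that wants to be covered by
 * MEMBARRIER_CMD_GLOBAL_EXPEDITED is expected to call this once.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int register_for_global_expedited(void)
{
	return membarrier(MEMBARRIER_CMD_REGISTER_GLOBAL_EXPEDITED, 0);
}

/*
 * Hypothetical helper: try the non-expedited global barrier first and
 * fall back to the expedited variant only if the kernel rejects the
 * first command with EINVAL (the case discussed in this thread).
 * Returns 0 on success, -1 with errno set on failure.
 */
static int global_barrier(void)
{
	if (membarrier(MEMBARRIER_CMD_GLOBAL, 0) == 0)
		return 0;
	if (errno != EINVAL)
		return -1;	/* e.g. ENOSYS: no membarrier() at all */
	return membarrier(MEMBARRIER_CMD_GLOBAL_EXPEDITED, 0);
}
```

A caller would typically invoke register_for_global_expedited() once at
startup in every cooperating process, then call global_barrier() wherever the
heavyweight side of the barrier pairing is needed. Error handling beyond the
EINVAL case is left to the caller.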

Thanx, Paul