Re: [PATCH tip/core/rcu 4/5] sys_membarrier: Add expedited option

From: Paul E. McKenney
Date: Wed Jul 26 2017 - 11:41:21 EST


On Wed, Jul 26, 2017 at 09:41:28AM +0200, Peter Zijlstra wrote:
> On Tue, Jul 25, 2017 at 04:59:36PM -0700, Paul E. McKenney wrote:
> > On Tue, Jul 25, 2017 at 11:55:10PM +0200, Peter Zijlstra wrote:
>
> > > People always do crazy stuff, but what surprised me is that such a patch
> > > got merged in urcu even though it's known broken for a number of
> > > architectures.
> >
> > It did not get merged into urcu. It is instead used directly by a
> > number of people for a number of concurrent algorithms.
>
> Yah, Mathieu also already pointed that out. It seems I really cannot
> deal with github well -- that website always terminally confuses me.
>
> > > > But it would not be hard for userspace code to force IPIs by repeatedly
> > > > awakening higher-priority threads that sleep immediately after being
> > > > awakened, right?
> > >
> > > RT tasks are not readily available to !root, and the user might have
> > > been constrained to a subset of available CPUs.
> >
> > So non-idle non-nohz CPUs never get IPIed for wakeups of SCHED_OTHER
> > threads?
>
> Sure, but SCHED_OTHER auto-throttles: if there's anything else to run,
> you get to wait. So you can't generate an IPI storm with it. Also,
> again, we can be limited to a subset of CPUs.

OK, what is its auto-throttle policy? One round of IPIs per jiffy or
some such?

Does this auto-throttling also apply if the user is running a CPU-bound
SCHED_BATCH or SCHED_IDLE task on each CPU, and periodically waking up
one of a large group of SCHED_OTHER tasks, where the SCHED_OTHER tasks
immediately sleep upon being awakened?
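
For concreteness, here is a minimal sketch of the sort of workload I
have in mind. It is illustrative only: the thread count and delay are
arbitrary, and whether a given wakeup actually IPIs the target CPU
depends on scheduler internals -- which is exactly the question.

#include <pthread.h>
#include <unistd.h>

#define NR_SLEEPERS 64	/* arbitrary "large group" of SCHED_OTHER tasks */

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;

/* Each sleeper goes right back to sleep after every wakeup. */
static void *sleeper(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&lock);
	for (;;)
		pthread_cond_wait(&cond, &lock);
	return NULL;	/* not reached */
}

int main(void)
{
	pthread_t tid;
	int i;

	for (i = 0; i < NR_SLEEPERS; i++)
		pthread_create(&tid, NULL, sleeper, NULL);
	sleep(1);	/* let the sleepers block */
	for (;;) {
		pthread_cond_signal(&cond);	/* wake one sleeper... */
		usleep(100);			/* ...and let it block again */
	}
	return 0;
}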

> > > My thinking was that if we observe '!= mm', that CPU will have to do a
> > > context switch in order to make it true. That context switch will
> > > provide the ordering we're after, so all is well.
> > >
> > > Quite possible there's a hole in it, but since I'm running on fumes,
> > > someone needs to spell it out for me :-)
> >
> > This would be the https://marc.info/?l=linux-kernel&m=126349766324224&w=2
> > URL below.
> >
> > Which might or might not still be applicable.
>
> I think we actually have those two smp_mb()'s around the rq->curr
> assignment.
>
> We have smp_mb__before_spinlock(), which per the argument here:
>
> https://lkml.kernel.org/r/20170607162013.755917928@xxxxxxxxxxxxx
>
> is actually a full MB, irrespective of that weird smp_wmb() definition
> we have now. And we have switch_mm() on the other side.

OK, and the rq->curr assignment is in common code, correct? Does this
allow the IPI-only-requesting-process approach to live entirely within
common code?
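
Assuming so, the shape I have in mind is something like the following.
This is purely an illustrative sketch, not actual kernel code: cpu_rq()
is scheduler-internal (which bears on the "entirely within common code"
question), and the unlocked sample of rq->curr is exactly the step
whose safety is at issue.

static void ipi_mb(void *info)
{
	smp_mb();	/* execute a full barrier on the IPIed CPU */
}

static void membarrier_expedited(struct mm_struct *mm)
{
	int cpu;
	cpumask_var_t tmpmask;

	if (!zalloc_cpumask_var(&tmpmask, GFP_KERNEL))
		return;		/* error handling elided */

	get_online_cpus();
	for_each_online_cpu(cpu) {
		struct task_struct *p = READ_ONCE(cpu_rq(cpu)->curr);

		/*
		 * A CPU observed running some other mm must context
		 * switch before running one of our threads, and full
		 * barriers around its rq->curr assignment would supply
		 * the ordering, so such CPUs can be skipped.
		 */
		if (p && p->mm == mm)
			__cpumask_set_cpu(cpu, tmpmask);
	}
	smp_call_function_many(tmpmask, ipi_mb, NULL, 1);
	put_online_cpus();
	free_cpumask_var(tmpmask);
}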

The 2010 email thread ended up with sys_membarrier() acquiring the
runqueue lock for each CPU, because doing otherwise meant adding code
to the scheduler fastpath. Don't we still need to do this?

https://marc.info/?l=linux-kernel&m=126341138408407&w=2
https://marc.info/?l=linux-kernel&m=126349766324224&w=2
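
For reference, in that approach the unlocked CPU scan in the sketch
above instead becomes something like this (again illustrative only):

	for_each_online_cpu(cpu) {
		struct rq *rq = cpu_rq(cpu);
		struct task_struct *p;

		/*
		 * Holding the runqueue lock keeps the rq->curr sample
		 * from racing with an in-flight context switch or task
		 * exit, at the cost of touching every rq lock in the
		 * system from the sys_membarrier() slowpath.
		 */
		raw_spin_lock_irq(&rq->lock);
		p = rq->curr;
		if (p->mm == mm)
			__cpumask_set_cpu(cpu, tmpmask);
		raw_spin_unlock_irq(&rq->lock);
	}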

> > > > I was intending to base this on the last few versions of a 2010 patch,
> > > > but maybe things have changed:
> > > >
> > > > https://marc.info/?l=linux-kernel&m=126358017229620&w=2
> > > > https://marc.info/?l=linux-kernel&m=126436996014016&w=2
> > > > https://marc.info/?l=linux-kernel&m=126601479802978&w=2
> > > > https://marc.info/?l=linux-kernel&m=126970692903302&w=2
> > > >
> > > > Discussion here:
> > > >
> > > > https://marc.info/?l=linux-kernel&m=126349766324224&w=2
> > > >
> > > > The discussion led to acquiring the runqueue locks, as there was
> > > > otherwise a need to add code to the scheduler fastpaths.
> > >
> > > TL;DR.. that's far too much to trawl through.
> >
> > So we re-derive it from first principles instead? ;-)
>
> Yep, that's what I usually do anyway; who knows what kind of crazy our
> younger selves were up to ;-)

In my experience, it ends up being a type of crazy worth ignoring only
if I don't ignore it. ;-)

Thanx, Paul