Re: [PATCH tip/core/rcu 11/26] rcu: Exit RCU extended QS on user preemption

From: Frederic Weisbecker
Date: Mon Sep 10 2012 - 16:26:31 EST


On Thu, Sep 06, 2012 at 07:13:11PM +0200, Peter Zijlstra wrote:
> On Thu, 2012-09-06 at 19:02 +0200, Peter Zijlstra wrote:
> > On Thu, 2012-08-30 at 14:05 -0700, Paul E. McKenney wrote:
> > > From: Frederic Weisbecker <fweisbec@xxxxxxxxx>
> > >
> > > When an exception or irq is about to resume userspace, if
> > > the task needs to be rescheduled, the arch low-level code
> > > calls schedule() directly.
> > >
> > > At that time we may be in an extended quiescent state from RCU's
> > > POV: the exception is no longer protected inside
> > > rcu_user_exit() - rcu_user_enter() and the irq has already called
> > > rcu_irq_exit().
> > >
> > > Create a new API schedule_user() that calls schedule() inside
> > > rcu_user_exit()-rcu_user_enter() in order to protect it. Archs
> > > will need to rely on it now to implement user preemption safely.
> >
> > > ---
> > > kernel/sched/core.c | 7 +++++++
> > > 1 files changed, 7 insertions(+), 0 deletions(-)
> > >
> > > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > > index 0bd599b..e841dfc 100644
> > > --- a/kernel/sched/core.c
> > > +++ b/kernel/sched/core.c
> > > @@ -3463,6 +3463,13 @@ asmlinkage void __sched schedule(void)
> > > }
> > > EXPORT_SYMBOL(schedule);
> > >
> > > +asmlinkage void __sched schedule_user(void)
> > > +{
> > > + rcu_user_exit();
> > > + schedule();
> > > + rcu_user_enter();
> > > +}
> >
> >
> > OK, so colour me unconvinced... why are we doing this?
> >
> > Typically when we call schedule nr_running != 1 (we need current to be
> > running and a possible target to switch to).
> >
> > So I'd prefer to simply have schedule() disable all this adaptive tick
> > nonsense and leave it at that.
>
> In fact, the only way to get here is through ttwu(), which would have
> done the nr_running increment and should have disabled all this adaptive
> stuff.
>
> So again... why?

OK, indeed, if the ttwu happened locally or even remotely through an IPI, the
tick engine would restart the tick and exit that RCU extended quiescent state
before we reach that place.
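To make the ttwu() argument concrete, here is a minimal userspace model in C. It assumes a single CPU, and every name in it (struct cpu_model, nohz_kick(), model_ttwu()) is hypothetical, not a kernel API. It only tries to capture the ordering: the enqueue raises nr_running above 1, and the assumed nohz reaction restarts the tick and leaves the RCU extended QS before any schedule() on the return path.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical single-CPU model; none of these are kernel structures. */
struct cpu_model {
	int nr_running;     /* tasks on this CPU's runqueue */
	bool tick_running;  /* periodic tick enabled? */
	bool rcu_user_eqs;  /* in the RCU user extended quiescent state? */
	bool need_resched;  /* models TIF_NEED_RESCHED */
};

/* Assumed nohz reaction: once a second task is runnable, restart the
 * tick and leave the RCU extended QS. */
static void nohz_kick(struct cpu_model *c)
{
	if (c->nr_running > 1) {
		c->tick_running = true;
		c->rcu_user_eqs = false;
	}
}

/* Simplified stand-in for ttwu() targeting this CPU, whether it runs
 * locally or through the remote-wakeup IPI. */
static void model_ttwu(struct cpu_model *c)
{
	c->nr_running++;
	c->need_resched = true;
	nohz_kick(c);
}
```

Starting from a lone task in the extended QS, one model_ttwu() call leaves nr_running at 2 with the tick running and RCU watching, which is the situation Peter describes.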

I just want to be sure I'm covering every case. Some places call
set_need_resched() even when no task is waiting for the CPU (RCU is one
example). In that case the nohz engine won't exit the RCU user mode before
schedule().
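The set_need_resched() case can be modelled the same way. In this hypothetical sketch, plain globals stand in for per-CPU state and none of the model_* functions are kernel code. A resched request that enqueues nothing leaves nr_running at 1, so nothing exits the extended QS, and only a schedule_user()-style wrapper keeps schedule() from running while RCU is not watching.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model state, not kernel variables. */
static int nr_running = 1;        /* only the current task */
static bool rcu_user_eqs = true;  /* RCU is not watching this CPU */
static bool need_resched;         /* models TIF_NEED_RESCHED */
static int rcu_violations;        /* schedule() ran with RCU not watching */

/* Models set_need_resched(): it only sets the flag. Nothing is enqueued,
 * so nr_running stays at 1 and the nohz engine has no reason to act. */
static void model_set_need_resched(void) { need_resched = true; }

/* Models schedule(), which may use RCU internally. */
static void model_schedule(void)
{
	if (rcu_user_eqs)
		rcu_violations++;
	need_resched = false;
}

static void model_rcu_user_exit(void)  { rcu_user_eqs = false; }
static void model_rcu_user_enter(void) { rcu_user_eqs = true; }

/* Same structure as schedule_user() in the patch. */
static void model_schedule_user(void)
{
	model_rcu_user_exit();
	model_schedule();
	model_rcu_user_enter();
}
```

Calling model_schedule() directly after the flag is set counts one violation; going through model_schedule_user() counts none and leaves the CPU back in the extended QS.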

Another example: a CPU does a remote wakeup, so it sets TIF_NEED_RESCHED on
the target and sends the IPI. The target CPU could see TIF_NEED_RESCHED
before resuming userspace and call schedule_user() -> schedule() while the
IPI has not yet arrived. In this case we need that rcu_user_exit() before
schedule() makes any use of RCU.
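A sketch of that race, again as a hypothetical userspace model: struct target_cpu, remote_wakeup() and return_to_user() are illustrative names only, and the IPI is reduced to a pending flag that is deliberately not handled before the return-to-user check.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the wakeup target; not kernel code. */
struct target_cpu {
	bool need_resched;  /* models TIF_NEED_RESCHED, set remotely */
	bool ipi_pending;   /* resched IPI sent but not yet handled */
	bool rcu_user_eqs;  /* RCU user extended QS */
	bool bad_schedule;  /* schedule() ran while RCU was not watching */
};

/* Remote waker: sets the flag, then sends the IPI, which is still in flight. */
static void remote_wakeup(struct target_cpu *t)
{
	t->need_resched = true;
	t->ipi_pending = true;
}

static void model_schedule(struct target_cpu *t)
{
	if (t->rcu_user_eqs)
		t->bad_schedule = true;
	t->need_resched = false;
}

/* Return-to-user path: the flag is polled here, possibly before the IPI
 * handler (which would exit the EQS) has run. */
static void return_to_user(struct target_cpu *t, bool wrap_with_rcu_exit)
{
	while (t->need_resched) {
		if (wrap_with_rcu_exit) {
			t->rcu_user_eqs = false;  /* rcu_user_exit() */
			model_schedule(t);
			t->rcu_user_eqs = true;   /* rcu_user_enter() */
		} else {
			model_schedule(t);
		}
	}
}
```

With wrap_with_rcu_exit false, the model records a schedule() under the extended QS while the IPI is still pending; with it true, the same interleaving is harmless.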

Also, when the task returns from schedule(), if it's alone on the runqueue,
the rcu_user_enter() that follows can help stop the tick and re-enter our
RCU extended quiescent state.
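The effect on the way back out can be sketched too. The tick-stop rule below (stop the tick when the task is alone on the runqueue) is an assumption about the adaptive-tickless side, not something this patch implements, and struct cpu_state and the model_* functions are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model; the tick-stop policy is an assumption. */
struct cpu_state {
	int nr_running;
	bool tick_running;
	bool rcu_user_eqs;
};

/* rcu_user_enter() always enters the EQS here; stopping the tick is
 * assumed to be allowed only when current is alone on the runqueue. */
static void model_rcu_user_enter(struct cpu_state *c)
{
	c->rcu_user_eqs = true;
	if (c->nr_running == 1)
		c->tick_running = false;
}

/* Models schedule(): other tasks run and go to sleep, so current may
 * come back alone on the runqueue. */
static void model_schedule(struct cpu_state *c, int tasks_that_slept)
{
	c->nr_running -= tasks_that_slept;
}

static void model_schedule_user(struct cpu_state *c, int tasks_that_slept)
{
	c->rcu_user_eqs = false;  /* rcu_user_exit() */
	model_schedule(c, tasks_that_slept);
	model_rcu_user_enter(c);
}
```

If the other task sleeps during schedule(), current resumes alone and the trailing enter stops the tick; if more tasks remain runnable, the tick keeps running.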

Also, I'm treating this RCU user extended quiescent state as a standalone
feature for now. Admittedly, its only user is the adaptive tickless work,
but handling the RCU part independently makes it easier to merge,
piecewise.

When the adaptive tickless work arrives, these rcu_user...() APIs will be
replaced by user_enter() / user_exit() hooks, all driven by the nohz
subsystem. At that point the rcu_user_enter() / rcu_user_exit() calls won't
be unconditional anymore.
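One possible shape for those nohz-driven hooks, as a hypothetical sketch only: the user_enter()/user_exit() names come from the paragraph above, but their bodies here, the nohz_active flag and the model_ prefix are invented for illustration. The point is that the RCU transitions become conditional on the CPU actually running in adaptive-tickless mode.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; nohz_active models whatever per-CPU state
 * the adaptive-tickless subsystem ends up using. */
static bool nohz_active;   /* is this CPU in adaptive-tickless mode? */
static bool rcu_user_eqs;  /* RCU user extended QS */
static int rcu_calls;      /* counts RCU user enter/exit transitions */

static void model_rcu_user_exit(void)  { rcu_user_eqs = false; rcu_calls++; }
static void model_rcu_user_enter(void) { rcu_user_eqs = true;  rcu_calls++; }

/* Nohz-driven hooks: only touch RCU when adaptive tickless is in use. */
static void model_user_exit(void)
{
	if (nohz_active)
		model_rcu_user_exit();
}

static void model_user_enter(void)
{
	if (nohz_active)
		model_rcu_user_enter();
}

static void model_schedule(void) { /* context switch elided */ }

static void model_schedule_user(void)
{
	model_user_exit();
	model_schedule();
	model_user_enter();
}
```

On a CPU without adaptive tickless, model_schedule_user() never touches RCU state; once nohz_active is set, it performs exactly one exit and one enter.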
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/