Re: [PATCH 6/6] kvm,rcu,nohz: use RCU extended quiescent state when running KVM guest

From: Paul E. McKenney
Date: Tue Feb 10 2015 - 16:17:21 EST


On Tue, Feb 10, 2015 at 01:00:35PM -0800, Andy Lutomirski wrote:
> On Tue, Feb 10, 2015 at 12:42 PM, Paul E. McKenney
> <paulmck@xxxxxxxxxxxxxxxxxx> wrote:
> > On Tue, Feb 10, 2015 at 12:19:28PM -0800, Andy Lutomirski wrote:
> >> On Tue, Feb 10, 2015 at 12:14 PM, Paul E. McKenney
> >> <paulmck@xxxxxxxxxxxxxxxxxx> wrote:
> >> > On Tue, Feb 10, 2015 at 11:59:09AM -0800, Andy Lutomirski wrote:
> >> >> On 02/10/2015 06:41 AM, riel@xxxxxxxxxx wrote:
> >> >> >From: Rik van Riel <riel@xxxxxxxxxx>
> >> >> >
> >> >> >The host kernel is not doing anything while the CPU is executing
> >> >> >a KVM guest VCPU, so it can be marked as being in an extended
> >> >> >quiescent state, identical to that used when running user space
> >> >> >code.
> >> >> >
> >> >> >The only exception to that rule is when the host handles an
> >> >> >interrupt, which is already handled by the irq code, which
> >> >> >calls rcu_irq_enter and rcu_irq_exit.
> >> >> >
> >> >> >The guest_enter and guest_exit functions already switch vtime
> >> >> >accounting independent of context tracking. Leave those calls
> >> >> >where they are, instead of moving them into the context tracking
> >> >> >code.
> >> >> >
> >> >> >Signed-off-by: Rik van Riel <riel@xxxxxxxxxx>
> >> >> >---
> >> >> > include/linux/context_tracking.h | 6 ++++++
> >> >> > include/linux/context_tracking_state.h | 1 +
> >> >> > include/linux/kvm_host.h | 3 ++-
> >> >> > 3 files changed, 9 insertions(+), 1 deletion(-)
> >> >> >
> >> >> >diff --git a/include/linux/context_tracking.h b/include/linux/context_tracking.h
> >> >> >index 954253283709..b65fd1420e53 100644
> >> >> >--- a/include/linux/context_tracking.h
> >> >> >+++ b/include/linux/context_tracking.h
> >> >> >@@ -80,10 +80,16 @@ static inline void guest_enter(void)
> >> >> > vtime_guest_enter(current);
> >> >> > else
> >> >> > current->flags |= PF_VCPU;
> >> >> >+
> >> >> >+ if (context_tracking_is_enabled())
> >> >> >+ context_tracking_enter(IN_GUEST);
> >> >>
> >> >> Why the if statement?
> >> >>
> >> >> Also, have you checked how much this hurts guest lightweight
> >> >> entry/exit latency? Context tracking is shockingly expensive for
> >> >> reasons I don't fully understand, but hopefully most of it is the
> >> >> vtime stuff. (Context tracking is *so* expensive that I almost
> >> >> think we should set the performance taint flag if we enable it,
> >> >> assuming that flag ended up getting merged. Also, we should make
> >> >> context tracking faster.)
> >> >
> >> > It turns out that context_tracking_is_enabled() is a static inline
> >> > that uses a static_key, so the overhead should be minimal on platforms
> >> > having a full implementation of static keys.
> >>
> >> Shouldn't we just fold that into context_tracking_xyz_enter?
> >
> > If I am not getting too confused, Rik did that initially, but it caused
> > some pain for the ARM guys. I don't see a performance downside, at
> > least not for a modern compiler that does a decent job of inlining.
>
> It's more of a tidiness issue to me than a performance issue.

I feel that the current patch does a good job of optimizing global tidiness.

> >> Also, why does the vtime stuff depend on RCU extended quiescent
> >> states? To me, they seem mostly orthogonal other than the fact that
> >> they hook into the same places.
> >
> > I might be missing your point, but...
> >
> > If there are no scheduling-clock interrupts, then the CPU needs to be
> > in an extended quiescent state, otherwise you will get RCU CPU stall
> > warnings and eventually OOM. Similarly, if there are no scheduling-clock
> > interupts, then you need to compute the vtime stuff based on start times
> > and deltas instead of relying on a scheduling-clock interrupt that never
> > comes. So it isn't that the vtime and RCU stuff are directly related,
> > but rather that they both must take evasive action if there are to be
> > no scheduling-clock interrupts for an extended time period.
>
> I'm probably missing something, but isn't vtime also used for accurate
> CPU time stats?

Right.

In my previous email, I only talked about what happens if there is no
scheduling-clock interrupt, and your question is instead about what
can happen if the scheduling-clock interrupt is enabled. The accurate
CPU time stats are optional if you leave the scheduling clock on during
userspace execution, but become mandatory in the nohz_full case where
the scheduling clock is disabled across userspace execution. So the
accurate CPU time stats are mandatory in the same situation where you
have an RCU extended quiescent state.

Thanx, Paul

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/