Re: [PATCH 4/4] kvm,rcu: use RCU extended quiescent state when running KVM guest

From: Paul E. McKenney
Date: Thu Feb 05 2015 - 13:56:39 EST


On Thu, Feb 05, 2015 at 01:09:19PM -0500, Rik van Riel wrote:
> On 02/05/2015 12:50 PM, Paul E. McKenney wrote:
> > On Thu, Feb 05, 2015 at 11:52:37AM -0500, Rik van Riel wrote:
> >> On 02/05/2015 11:44 AM, Christian Borntraeger wrote:
> >>> Am 05.02.2015 um 17:35 schrieb riel@xxxxxxxxxx:
> >>>> From: Rik van Riel <riel@xxxxxxxxxx>
> >>>>
> >>>> The host kernel is not doing anything while the CPU is executing
> >>>> a KVM guest VCPU, so it can be marked as being in an extended
> >>>> quiescent state, identical to that used when running user space
> >>>> code.
> >>>>
> >>>> The only exception to that rule is when the host handles an
> >>>> interrupt, which is already handled by the irq code, which
> >>>> calls rcu_irq_enter and rcu_irq_exit.
> >>>>
> >>>> The guest_enter and guest_exit functions already switch vtime
> >>>> accounting independent of context tracking, so leave those calls
> >>>> where they are, instead of moving them into the context tracking
> >>>> code.
> >>>>
> >>>> Signed-off-by: Rik van Riel <riel@xxxxxxxxxx>
> >>>> ---
> >>>> include/linux/context_tracking.h | 8 +++++++-
> >>>> include/linux/context_tracking_state.h | 1 +
> >>>> 2 files changed, 8 insertions(+), 1 deletion(-)
> >>>>
> >>>> diff --git a/include/linux/context_tracking.h b/include/linux/context_tracking.h
> >>>> index bd9f000fc98d..a5d3bb44b897 100644
> >>>> --- a/include/linux/context_tracking.h
> >>>> +++ b/include/linux/context_tracking.h
> >>>> @@ -43,7 +43,7 @@ static inline enum ctx_state exception_enter(void)
> >>>> static inline void exception_exit(enum ctx_state prev_ctx)
> >>>> {
> >>>> if (context_tracking_is_enabled()) {
> >>>> - if (prev_ctx == IN_USER)
> >>>> + if (prev_ctx == IN_USER || prev_ctx == IN_GUEST)
> >>>> context_tracking_user_enter(prev_ctx);
> >>>> }
> >>>> }
> >>>> @@ -78,6 +78,9 @@ static inline void guest_enter(void)
> >>>> vtime_guest_enter(current);
> >>>> else
> >>>> current->flags |= PF_VCPU;
> >>>> +
> >>>> + if (context_tracking_is_enabled())
> >>>> + context_tracking_user_enter(IN_GUEST);
> >>>> }
> >>>
> >>>
> >>> Couldnt we make
> >>> rcu_virt_note_context_switch(smp_processor_id());
> >>> conditional in include/linux/kvm_host.h (kvm_guest_enter)
> >>>
> >>> e.g. something like
> >>> if (!context_tracking_is_enabled())
> >>> rcu_virt_note_context_switch(smp_processor_id());
> >>
> >> Possibly. I considered the same, but I do not know whether
> >> or not just rcu_user_enter / rcu_user_exit is enough.
> >>
> >> I could certainly try it out and see whether anything
> >> explodes, but I am not convinced that is careful enough
> >> when it comes to handling RCU code...
> >>
> >> Paul? :)
> >
> > That can fail for some odd combinations of Kconfig and boot parameters.
> > As I understand it, you instead want the following:
> >
> > if (!tick_nohz_full_cpu(smp_processor_id()))
> > rcu_virt_note_context_switch(smp_processor_id());
> >
> > Frederic might know of a better approach.
>
> Interesting, in what cases would that happen?

My concern, perhaps misplaced, is that context_tracking_is_enabled(),
but that the current CPU is not a nohz_full= CPU. In that case, the
context-tracking code would be within its rights to not tell RCU about
the transition to the guest, and thus RCU would be unaware that the
CPU was in a quiescent state.

In most workloads, you would expect the CPU to get interrupted or
preempted or something at some point, which would eventually inform
RCU, but too much delay would result in the infamous RCU CPU stall
warning message. So let's code a bit more defensively.

> If context_tracking_is_enabled() we end up eventually
> calling into rcu_user_enter & rcu_user_exit.
>
> If it is not enabled, we call rcu_virt_note_context_switch.
>
> In what cases would we need to call both?
>
> I don't see exit-to-userspace do the things that
> rcu_virt_note_context_switch does, and do not understand
> why userspace is fine with that...

The real danger is doing neither.

On tick_nohz_full_cpu() CPUs, the exit-to-userspace code should invoke
rcu_user_enter(), which sets some per-CPU state telling RCU to ignore
that CPU, since it cannot possibly do host RCU read-side critical sections
while running a guest.

In contrast, a non-tick_nohz_full_cpu() CPU doesn't let RCU
know that it is executing in a guest or in userspace. So the
rcu_virt_note_context_switch() does the notification in that case.

Thanx, Paul

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/