Re: [PATCH 4/4] kvm,rcu: use RCU extended quiescent state when running KVM guest

From: Rik van Riel
Date: Thu Feb 05 2015 - 13:59:28 EST


On 02/05/2015 01:56 PM, Paul E. McKenney wrote:
> On Thu, Feb 05, 2015 at 01:09:19PM -0500, Rik van Riel wrote:
>> On 02/05/2015 12:50 PM, Paul E. McKenney wrote:
>>> On Thu, Feb 05, 2015 at 11:52:37AM -0500, Rik van Riel wrote:
>>>> On 02/05/2015 11:44 AM, Christian Borntraeger wrote:
>>>>> Am 05.02.2015 um 17:35 schrieb riel@xxxxxxxxxx:
>>>>>> From: Rik van Riel <riel@xxxxxxxxxx>
>>>>>>
>>>>>> The host kernel is not doing anything while the CPU is executing
>>>>>> a KVM guest VCPU, so it can be marked as being in an extended
>>>>>> quiescent state, identical to that used when running user space
>>>>>> code.
>>>>>>
>>>>>> The only exception to that rule is when the host handles an
>>>>>> interrupt, which is already handled by the irq code, which
>>>>>> calls rcu_irq_enter and rcu_irq_exit.
>>>>>>
>>>>>> The guest_enter and guest_exit functions already switch vtime
>>>>>> accounting independent of context tracking, so leave those calls
>>>>>> where they are, instead of moving them into the context tracking
>>>>>> code.
>>>>>>
>>>>>> Signed-off-by: Rik van Riel <riel@xxxxxxxxxx>
>>>>>> ---
>>>>>> include/linux/context_tracking.h | 8 +++++++-
>>>>>> include/linux/context_tracking_state.h | 1 +
>>>>>> 2 files changed, 8 insertions(+), 1 deletion(-)
>>>>>>
>>>>>> diff --git a/include/linux/context_tracking.h b/include/linux/context_tracking.h
>>>>>> index bd9f000fc98d..a5d3bb44b897 100644
>>>>>> --- a/include/linux/context_tracking.h
>>>>>> +++ b/include/linux/context_tracking.h
>>>>>> @@ -43,7 +43,7 @@ static inline enum ctx_state exception_enter(void)
>>>>>> static inline void exception_exit(enum ctx_state prev_ctx)
>>>>>> {
>>>>>> if (context_tracking_is_enabled()) {
>>>>>> - if (prev_ctx == IN_USER)
>>>>>> + if (prev_ctx == IN_USER || prev_ctx == IN_GUEST)
>>>>>> context_tracking_user_enter(prev_ctx);
>>>>>> }
>>>>>> }
>>>>>> @@ -78,6 +78,9 @@ static inline void guest_enter(void)
>>>>>> vtime_guest_enter(current);
>>>>>> else
>>>>>> current->flags |= PF_VCPU;
>>>>>> +
>>>>>> + if (context_tracking_is_enabled())
>>>>>> + context_tracking_user_enter(IN_GUEST);
>>>>>> }
>>>>>
>>>>>
>>>>> Couldnt we make
>>>>> rcu_virt_note_context_switch(smp_processor_id());
>>>>> conditional in include/linux/kvm_host.h (kvm_guest_enter)
>>>>>
>>>>> e.g. something like
>>>>> if (!context_tracking_is_enabled())
>>>>> rcu_virt_note_context_switch(smp_processor_id());
>>>>
>>>> Possibly. I considered the same, but I do not know whether
>>>> or not just rcu_user_enter / rcu_user_exit is enough.
>>>>
>>>> I could certainly try it out and see whether anything
>>>> explodes, but I am not convinced that is careful enough
>>>> when it comes to handling RCU code...
>>>>
>>>> Paul? :)
>>>
>>> That can fail for some odd combinations of Kconfig and boot parameters.
>>> As I understand it, you instead want the following:
>>>
>>> if (!tick_nohz_full_cpu(smp_processor_id()))
>>> rcu_virt_note_context_switch(smp_processor_id());
>>>
>>> Frederic might know of a better approach.
>>
>> Interesting, in what cases would that happen?
>
> My concern, perhaps misplaced, is that context_tracking_is_enabled(),
> but that the current CPU is not a nohz_full= CPU. In that case, the
> context-tracking code would be within its rights to not tell RCU about
> the transition to the guest, and thus RCU would be unaware that the
> CPU was in a quiescent state.
>
> In most workloads, you would expect the CPU to get interrupted or
> preempted or something at some point, which would eventually inform
> RCU, but too much delay would result in the infamous RCU CPU stall
> warning message. So let's code a bit more defensively.
>
>> If context_tracking_is_enabled() we end up eventually
>> calling into rcu_user_enter & rcu_user_exit.
>>
>> If it is not enabled, we call rcu_virt_note_context_switch.
>>
>> In what cases would we need to call both?
>>
>> I don't see exit-to-userspace do the things that
>> rcu_virt_note_context_switch does, and do not understand
>> why userspace is fine with that...
>
> The real danger is doing neither.
>
> On tick_nohz_full_cpu() CPUs, the exit-to-userspace code should invoke
> rcu_user_enter(), which sets some per-CPU state telling RCU to ignore
> that CPU, since it cannot possibly do host RCU read-side critical sections
> while running a guest.

Ahhh, I understand now. Thank you for your
explanation, Paul.

I will make sure your suggestion is in version 2 of
this series.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/