Re: [PATCH v2 12/15] kvm: i8254: Check LAPIC EOI pending when injecting irq on SVM AVIC

From: Suthikulpanit, Suravee
Date: Fri Sep 13 2019 - 10:50:13 EST


Alex,

On 8/27/19 4:10 AM, Alexander Graf wrote:
>
> On 26.08.19 22:46, Suthikulpanit, Suravee wrote:
>> Alex,
>>
>> On 8/19/2019 5:42 AM, Alexander Graf wrote:
>>>
>>>
>>> OnÂ15.08.19Â18:25,ÂSuthikulpanit,ÂSuraveeÂwrote:
>>>> ACKÂnotifiersÂdon'tÂworkÂwithÂAMDÂSVMÂw/ÂAVICÂwhenÂtheÂPITÂinterrupt
>>>> isÂdeliveredÂasÂedge-triggeredÂfixedÂinterruptÂsinceÂAMDÂprocessors
>>>> cannotÂexitÂonÂEOIÂforÂtheseÂinterrupts.
>>>>
>>>> AddÂcodeÂtoÂcheckÂLAPICÂpendingÂEOIÂbeforeÂinjectingÂanyÂpendingÂPIT
>>>> interruptÂonÂAMDÂSVMÂwhenÂAVICÂisÂactivated.
>>>>
>>>> Signed-off-by:ÂSuraveeÂSuthikulpanit <suravee.suthikulpanit@xxxxxxx>
>>>> ---
>>>> ÂÂÂarch/x86/kvm/i8254.cÂ|Â31Â+++++++++++++++++++++++++------
>>>> ÂÂÂ1ÂfileÂchanged,Â25Âinsertions(+),Â6Âdeletions(-)
>>>>
>>>> diffÂ--gitÂa/arch/x86/kvm/i8254.cÂb/arch/x86/kvm/i8254.c
>>>> indexÂ4a6dc54..31c4a9bÂ100644
>>>> ---Âa/arch/x86/kvm/i8254.c
>>>> +++Âb/arch/x86/kvm/i8254.c
>>>> @@Â-34,10Â+34,12Â@@
>>>> ÂÂÂ#includeÂ<linux/kvm_host.h>
>>>> ÂÂÂ#includeÂ<linux/slab.h>
>>>> +#includeÂ<asm/virtext.h>
>>>> ÂÂÂ#includeÂ"ioapic.h"
>>>> ÂÂÂ#includeÂ"irq.h"
>>>> ÂÂÂ#includeÂ"i8254.h"
>>>> +#includeÂ"lapic.h"
>>>> ÂÂÂ#includeÂ"x86.h"
>>>> ÂÂÂ#ifndefÂCONFIG_X86_64
>>>> @@Â-236,6Â+238,12Â@@ÂstaticÂvoidÂdestroy_pit_timer(structÂkvm_pitÂ*pit)
>>>> ÂÂÂÂÂÂÂkthread_flush_work(&pit->expired);
>>>> ÂÂÂ}
>>>> +staticÂinlineÂvoidÂkvm_pit_reset_reinject(structÂkvm_pitÂ*pit)
>>>> +{
>>>> +ÂÂÂÂatomic_set(&pit->pit_state.pending,Â0);
>>>> +ÂÂÂÂatomic_set(&pit->pit_state.irq_ack,Â1);
>>>> +}
>>>> +
>>>> ÂÂÂstaticÂvoidÂpit_do_work(structÂkthread_workÂ*work)
>>>> ÂÂÂ{
>>>> ÂÂÂÂÂÂÂstructÂkvm_pitÂ*pitÂ=Âcontainer_of(work,ÂstructÂkvm_pit,Âexpired);
>>>>
>>>> @@Â-244,6Â+252,23Â@@ÂstaticÂvoidÂpit_do_work(structÂkthread_workÂ*work)
>>>> ÂÂÂÂÂÂÂintÂi;
>>>> ÂÂÂÂÂÂÂstructÂkvm_kpit_stateÂ*psÂ=Â&pit->pit_state;
>>>> +ÂÂÂÂ/*
>>>> +ÂÂÂÂÂ*ÂSince,ÂAMDÂSVMÂAVICÂacceleratesÂwriteÂaccessÂtoÂAPICÂEOI
>>>> +ÂÂÂÂÂ*ÂregisterÂforÂedge-triggerÂinterrupts.ÂPITÂwillÂnotÂbeÂable
>>>> +ÂÂÂÂÂ*ÂtoÂreceiveÂtheÂIRQÂACKÂnotifierÂandÂwillÂalwaysÂbeÂzero.
>>>> +ÂÂÂÂÂ*ÂTherefore,ÂweÂcheckÂifÂanyÂLAPICÂEOIÂpendingÂforÂvectorÂ0
>>>> +ÂÂÂÂÂ*ÂandÂresetÂirq_ackÂifÂnoÂpending.
>>>> +ÂÂÂÂÂ*/
>>>> +ÂÂÂÂifÂ(cpu_has_svm(NULL)Â&&Âkvm->arch.apicv_stateÂ==ÂAPICV_ACTIVATED)Â{
>>>>
>>>> +ÂÂÂÂÂÂÂÂintÂeoiÂ=Â0;
>>>> +
>>>> +ÂÂÂÂÂÂÂÂkvm_for_each_vcpu(i,Âvcpu,Âkvm)
>>>> +ÂÂÂÂÂÂÂÂÂÂÂÂifÂ(kvm_apic_pending_eoi(vcpu,Â0))
>>>> +ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂeoi++;
>>>> +ÂÂÂÂÂÂÂÂifÂ(!eoi)
>>>> +ÂÂÂÂÂÂÂÂÂÂÂÂkvm_pit_reset_reinject(pit);
>>>
>>> InÂwhichÂcaseÂwouldÂeoiÂbeÂ!=Â0ÂwhenÂAPIC-VÂisÂactive?
>>
>> That would be the case when guest has not processed and/or still
>> processing the interrupt.
>> Once the guest writes to APIC EOI register for edge-triggered
>> interrupt for vector 0,
>> and the AVIC hardware accelerated the access by clearing the highest
>> priority ISR bit,
>> then the eoi should be zero.
>
> Thinking about this a bit more, you're basically saying the irq ack
> notifier never triggers because we don't see the EOI register write, but
> we can determine the state asynchronously.

Yes, we should be able to determine this in lazy fashion only when we
need to inject new interrupts.

> The irqfd code also uses the ack notifier for level irq reinjection.
> Will that break as well?

IIUC, in case of irqfd, the notifier is only used for the case of
resampling irqfds, which are special variety of irqfds used to emulate
level triggered interrupts (see include/linux/kvm_irqfd.h). The AVIC
workaround is only needed for handling EOI for edge-trigger interrupts.

> Wouldn't it make more sense to try to either maintain the ack notifier
> API or remove it completely if we can't find a way to make it work with
> APIC-V?

My understanding is that the ack notifier is needed for KVM to support
KVM_REINJECT_CONTROL ioctl (mentioned here
(https://lkml.org/lkml/2019/2/6/133).

> So what if we detect that an IRQ vector we're injecting for has an irq
> notifier? If it does, we set up / start:
>
> Â * an hrtimer that polls for EOI on that vector
> Â * a flag so that every vcpu on exit checks for EOI on that vector
> Â * a direct call from pit_do_work to check on it as well
>
> Each of them would go through a single code path that then calls the
> ack_notifier.
>
> That way we should be able to just maintain the old API and not get into
> unpleasant surprises that only manifest on a tiny faction of systems,
> right?

Let me send out v3 that will consolidate this into a single code path.

Suravee