Re: [PATCH v3 07/28] KVM: x86: Inhibit APIC memslot if x2APIC and AVIC are enabled
From: Maxim Levitsky
Date: Wed Sep 28 2022 - 02:21:32 EST
On Mon, 2022-09-26 at 17:00 +0000, Sean Christopherson wrote:
> On Fri, Sep 23, 2022, Maxim Levitsky wrote:
> > On Tue, 2022-09-20 at 23:31 +0000, Sean Christopherson wrote:
> > > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > > index 2c96c43c313a..6475c882b359 100644
> > > --- a/arch/x86/include/asm/kvm_host.h
> > > +++ b/arch/x86/include/asm/kvm_host.h
> > > @@ -1132,6 +1132,17 @@ enum kvm_apicv_inhibit {
> > > * AVIC is disabled because SEV doesn't support it.
> > > */
> > > APICV_INHIBIT_REASON_SEV,
> > > +
> > > + /*
> > > + * Due to sharing page tables across vCPUs, the xAPIC memslot must be
> > > + * deleted if any vCPU has x2APIC enabled as SVM doesn't provide fully
> > > + * independent controls for AVIC vs. x2AVIC, and also because SVM
> > > + * supports a "hybrid" AVIC mode for CPUs that support AVIC but not
> > > + * x2AVIC. Note, this isn't a "full" inhibit and is tracked separately.
> > > + * AVIC can still be activated, but KVM must not create SPTEs for the
> > > + * APIC base. For simplicity, this is sticky.
> > > + */
> > > + APICV_INHIBIT_REASON_X2APIC,
> >
> > Hi Sean!
> >
> > So assuming that I won't object to making it SVM specific (I still think
> > that VMX should also inhibit this memslot because this is closer to x86 spec,
> > but if you really want it this way, I won't fight over it):
>
> Heh, I don't necessarily "want" it this way, it's more that I don't see a compelling
> reason to change KVM's behavior and risk silently causing a performance regression.
> If KVM didn't already have the "APIC base may have RAM semantics" quirk, and/or if
> this were the initial APICv implementation and thus no possible users, then I would
> probably also vote to give APICv the same treatment.
>
> > I somewhat don't like this inhibit, because now it is used just to say
> > 'I am AVIC'.
> >
> > What do you think if you just move the code that removes the memslot to SVM,
> > to avic_set_virtual_apic_mode?
>
> Suffers the same SRCU issue (see below) :-/
>
> Given the SRCU problem, I'd prefer to keep the management of the memslot in common
> code, even though I agree it's a bit silly. And KVM_REQ_UNBLOCK is a perfect fit
> for dealing with the SRCU issue, i.e. handling this in AVIC code would require
> another hook on top of spreading the memslot management across x86 and SVM code.

OK, I am not going to argue about this. But what about at least not using an inhibit
bit for this, and using something else instead, such as a boolean, or maybe really
adding an 'I am AVIC' bit, or rather something like a vcpu->arch.apicv_type enum?

Or we can make the SVM code just call a common function: put this logic in a helper
and call it from avic_set_virtual_apic_mode?

void kvm_disable_apicv_memslot(struct kvm_vcpu *vcpu)
{
	if (!vcpu->kvm->arch.apic_access_memslot_inhibited) {
		vcpu->kvm->arch.apic_access_memslot_inhibited = true;
		kvm_make_request(KVM_REQ_UNBLOCK, vcpu);
	}
}
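
To make the intended flow concrete, here is a minimal userspace sketch of the "set a sticky flag now, free the memslot later from the run loop" pattern. It is only a model, not kernel code: every name in it (model_vm, model_vcpu, MODEL_REQ_UNBLOCK, model_handle_requests) is hypothetical, and locking is left out. One assumption worth calling out: the sketch raises the request unconditionally, as the comment in the quoted lapic.c hunk asks for, whereas the helper above raises it only on the first transition of the flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for KVM_REQ_UNBLOCK. */
#define MODEL_REQ_UNBLOCK (1u << 0)

/* Hypothetical per-VM state, modelled on the two flags in the patch. */
struct model_vm {
	bool memslot_enabled;
	bool memslot_inhibited;	/* sticky: never cleared once set */
	int nr_frees;		/* how many times the memslot was freed */
};

struct model_vcpu {
	struct model_vm *vm;
	unsigned int requests;	/* pending request bits for this vCPU */
};

/* Models a context in which updating memslots is not safe. */
static void model_disable_apicv_memslot(struct model_vcpu *vcpu)
{
	vcpu->vm->memslot_inhibited = true;
	/* Assumption: always raise the request, even if the flag was set. */
	vcpu->requests |= MODEL_REQ_UNBLOCK;
}

/* Models the run loop, where the memslot can actually be freed. */
static void model_handle_requests(struct model_vcpu *vcpu)
{
	if (!(vcpu->requests & MODEL_REQ_UNBLOCK))
		return;

	vcpu->requests &= ~MODEL_REQ_UNBLOCK;

	if (vcpu->vm->memslot_inhibited && vcpu->vm->memslot_enabled) {
		vcpu->vm->memslot_enabled = false;
		vcpu->vm->nr_frees++;
	}
}
```

In this model, two vCPUs can both call model_disable_apicv_memslot(); whichever one reaches model_handle_requests() first frees the memslot, and the other one finds nothing left to do.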
>
> > > @@ -1169,10 +1180,11 @@ struct kvm_arch {
> > > struct kvm_apic_map __rcu *apic_map;
> > > atomic_t apic_map_dirty;
> > >
> > > - /* Protects apic_access_memslot_enabled and apicv_inhibit_reasons */
> > > - struct rw_semaphore apicv_update_lock;
> > > -
> > > bool apic_access_memslot_enabled;
> > > + bool apic_access_memslot_inhibited;
> >
> > So the apic_access_memslot_enabled currently tracks if the memslot is enabled.
> > As I see later in the patch when you free the memslot, you set it to false,
> > which means that if a vCPU is created after that (it can happen in theory),
> > the memslot will be created again :(
> >
> > I say we need 'enabled', and 'allocated' booleans instead. Inhibit will set
> > enabled to false, and then on next vcpu run, that will free the memslot.
> >
> > when enabled == false, the code needs to be changed to not allocate it again.
>
> This should be handled already. apic_access_memslot_enabled is toggled from
> true=>false if and only if apic_access_memslot_inhibited is set, and the "enabled"
> flag is protected by slots_lock. Thus, newly created vCPUs are guaranteed to
> either see apic_access_memslot_enabled==true or apic_access_memslot_inhibited==true.
>
> int kvm_alloc_apic_access_page(struct kvm *kvm)
> {
> 	struct page *page;
> 	void __user *hva;
> 	int ret = 0;
>
> 	mutex_lock(&kvm->slots_lock);
> 	if (kvm->arch.apic_access_memslot_enabled ||
> 	    kvm->arch.apic_access_memslot_inhibited) <=== prevents reallocation
> 		goto out;
>
> out:
> 	mutex_unlock(&kvm->slots_lock);
> 	return ret;
> }

Ah, you added this in a previous patch which I hadn't seen; makes sense.
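
For reference, the guarantee described above can be modelled in a few lines of userspace C. This is a sketch under stated assumptions, not the kernel implementation: the struct and function names (guard_model_kvm, guard_model_alloc, guard_model_inhibit_and_free) are hypothetical, the model is single-threaded, and the place where slots_lock would be held is only marked by comments.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two flags in struct kvm_arch. */
struct guard_model_kvm {
	bool memslot_enabled;
	bool memslot_inhibited;	/* sticky: never cleared once set */
	int nr_allocs;		/* how many times the memslot was created */
};

/* Models the check in kvm_alloc_apic_access_page(). */
static void guard_model_alloc(struct guard_model_kvm *kvm)
{
	/* slots_lock would be taken here */
	if (!kvm->memslot_enabled && !kvm->memslot_inhibited) {
		kvm->memslot_enabled = true;
		kvm->nr_allocs++;
	}
	/* slots_lock would be released here */
}

/* Models setting the inhibit flag and then freeing the memslot. */
static void guard_model_inhibit_and_free(struct guard_model_kvm *kvm)
{
	kvm->memslot_inhibited = true;

	/* slots_lock would be taken here */
	if (kvm->memslot_enabled)
		kvm->memslot_enabled = false;
	/* slots_lock would be released here */
}
```

Once guard_model_inhibit_and_free() has run, a later guard_model_alloc() (e.g. for a newly created vCPU) is a no-op, because the inhibited flag stays set.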
>
> That could be made more obvious by adding a WARN in kvm_free_apic_access_page(), i.e.

Yep, a WARN_ON_ONCE, unless it is in a hot path, is almost always a good idea, so let's add it.
>
> void kvm_free_apic_access_page(struct kvm *kvm)
> {
> 	WARN_ON_ONCE(!kvm->arch.apic_access_memslot_inhibited);
>
> 	mutex_lock(&kvm->slots_lock);
>
> 	if (kvm->arch.apic_access_memslot_enabled) {
> 		__x86_set_memory_region(kvm, APIC_ACCESS_PAGE_PRIVATE_MEMSLOT, 0, 0);
> 		kvm->arch.apic_access_memslot_enabled = false;
> 	}
>
> 	mutex_unlock(&kvm->slots_lock);
> }
>
> > > +
> > > + /* Protects apicv_inhibit_reasons */
> > > + struct rw_semaphore apicv_update_lock;
> > > unsigned long apicv_inhibit_reasons;
> > >
> > > gpa_t wall_clock;
> > > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> > > index 99994d2470a2..70f00eda75b2 100644
> > > --- a/arch/x86/kvm/lapic.c
> > > +++ b/arch/x86/kvm/lapic.c
> > > @@ -2394,9 +2394,26 @@ void kvm_lapic_set_base(struct kvm_vcpu *vcpu, u64 value)
> > > }
> > > }
> > >
> > > - if (((old_value ^ value) & X2APIC_ENABLE) && (value & X2APIC_ENABLE))
> > > + if (((old_value ^ value) & X2APIC_ENABLE) && (value & X2APIC_ENABLE)) {
> > > kvm_apic_set_x2apic_id(apic, vcpu->vcpu_id);
> > >
> > > + /*
> > > + * Mark the APIC memslot as inhibited if x2APIC is enabled and
> > > + * the x2APIC inhibit is required. The actual deletion of the
> > > + * memslot is handled by vcpu_run() as SRCU may or may not be
> > > + * held at this time, i.e. updating memslots isn't safe. Don't
> > > + * check apic_access_memslot_inhibited, this vCPU needs to
> > > + * ensure the memslot is deleted before re-entering the guest,
> > > + * i.e. needs to make the request even if the inhibit flag was
> > > + * already set by a different vCPU.
> > > + */
> > > + if (vcpu->kvm->arch.apic_access_memslot_enabled &&
> > > + static_call(kvm_x86_check_apicv_inhibit_reasons)(APICV_INHIBIT_REASON_X2APIC)) {
> > > + vcpu->kvm->arch.apic_access_memslot_inhibited = true;
> > > + kvm_make_request(KVM_REQ_UNBLOCK, vcpu);
> >
> > You are about to remove the KVM_REQ_UNBLOCK in other patch series.
>
> No, KVM_REQ_UNHALT is being removed. KVM_REQ_UNBLOCK needs to stay, although it
> has a rather weird name, e.g. KVM_REQ_WORK would probably be better.

Roger that!
And I guess let's rename it while we are at it.
>
> > How about just raising KVM_REQ_APICV_UPDATE on current vCPU
> > and having a special case in kvm_vcpu_update_apicv of
> >
> > if (apic_access_memslot_enabled == false && apic_access_memslot_allocated == true) {
> > drop srcu lock
>
> This was my initial thought as well, but the issue is that SRCU may or may not be
> held, and so the unlock+lock would need to be conditional. That's technically a
> solvable problem, as it's possible to detect if SRCU is held, but I really don't
> want to rely on kvm_vcpu.srcu_depth for anything other than proving that KVM doesn't
> screw up SRCU.

Why though? AFAIK KVM_REQ_APICV_UPDATE is only handled in vcpu_enter_guest,
which drops the SRCU lock a few lines afterwards. Therefore kvm_vcpu_update_apicv
is always called with the lock held, which means that it can drop the lock for
the duration of the slot update.

The original issue we had was that we tried to drop the SRCU lock in
kvm_set_apicv_inhibit, which indeed is called from various places,
with or without the lock held.

Moving the memslot disable code to kvm_vcpu_update_apicv would actually have solved that,
but it was not possible because kvm_vcpu_update_apicv is called simultaneously on all vCPUs,
which created various races, including toggling the memslot twice.

So if possible, please take another look at using KVM_REQ_APICV_UPDATE instead of KVM_REQ_UNBLOCK.
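
The argument above can be sketched as a small userspace model. It assumes, as claimed above, that the update handler is only ever reached from one place that holds the lock exactly once; under that assumption the unlock/lock pair does not need to be conditional. All names (srcu_model_vcpu, srcu_model_update_apicv and so on) are hypothetical, the SRCU read lock is reduced to a depth counter, and the multi-vCPU races mentioned above are not modelled at all.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical single-vCPU model; the SRCU read lock is a depth counter. */
struct srcu_model_vcpu {
	int srcu_depth;
	bool update_requested;	/* stand-in for a pending APICv update request */
	bool memslot_enabled;
	bool memslot_inhibited;
	int depth_during_free;	/* lock depth observed while freeing the slot */
};

static void srcu_model_lock(struct srcu_model_vcpu *v)
{
	v->srcu_depth++;
}

static void srcu_model_unlock(struct srcu_model_vcpu *v)
{
	assert(v->srcu_depth > 0);
	v->srcu_depth--;
}

/* Models the update handler: the caller is assumed to hold the lock. */
static void srcu_model_update_apicv(struct srcu_model_vcpu *v)
{
	assert(v->srcu_depth == 1);

	if (!v->memslot_inhibited || !v->memslot_enabled)
		return;

	/* Unconditional drop and re-take around the memslot update. */
	srcu_model_unlock(v);
	v->depth_during_free = v->srcu_depth;
	v->memslot_enabled = false;
	srcu_model_lock(v);
}

/* Models guest entry: called with the lock held, returns with it held. */
static void srcu_model_enter_guest(struct srcu_model_vcpu *v)
{
	if (v->update_requested) {
		v->update_requested = false;
		srcu_model_update_apicv(v);
	}

	srcu_model_unlock(v);	/* the guest would run here */
	srcu_model_lock(v);
}
```

If the handler could also be reached without the lock held, the assert at the top of srcu_model_update_apicv() would fire, which is exactly the situation that made the drop in kvm_set_apicv_inhibit problematic.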
Best regards,
Maxim Levitsky
>
> > free the memslot
> > take srcu lock
> > }