Re: [PATCH 4/4] KVM: x86: lapic: don't allow to set non default apic id when not using x2apic api

From: Maxim Levitsky
Date: Wed Mar 02 2022 - 06:50:49 EST


On Tue, 2022-03-01 at 19:56 +0200, Maxim Levitsky wrote:
> On Tue, 2022-03-01 at 17:46 +0000, Sean Christopherson wrote:
> > On Tue, Mar 01, 2022, Maxim Levitsky wrote:
> > > > > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> > > > > index 80a2020c4db40..8d35f56c64020 100644
> > > > > --- a/arch/x86/kvm/lapic.c
> > > > > +++ b/arch/x86/kvm/lapic.c
> > > > > @@ -2618,15 +2618,14 @@ static int kvm_apic_state_fixup(struct kvm_vcpu *vcpu,
> > > > > u32 *ldr = (u32 *)(s->regs + APIC_LDR);
> > > > > u64 icr;
> > > > >
> > > > > - if (vcpu->kvm->arch.x2apic_format) {
> > > > > - if (*id != vcpu->vcpu_id)
> > > > > - return -EINVAL;
> > > > > - } else {
> > > > > - if (set)
> > > > > - *id >>= 24;
> > > > > - else
> > > > > - *id <<= 24;
> > > > > - }
> > > > > + if (!vcpu->kvm->arch.x2apic_format && set)
> > > > > + *id >>= 24;
> > > > > +
> > > > > + if (*id != vcpu->vcpu_id)
> > > > > + return -EINVAL;
> > > >
> > > > This breaks backwards compability, userspace will start failing where it previously
> > > > succeeded. It doesn't even require a malicious userspace, e.g. if userspace creates
> > > > a vCPU with a vcpu_id > 255 vCPUs, the shift will truncate and cause failure. It'll
> > > > obviously do weird things, but this code is 5.5 years old, I don't think it's worth
> > > > trying to close a loophole that doesn't harm KVM.
> > > >
> > > > If we provide a way for userspace to opt into disallowiong APICID != vcpu_id, we
> > > > can address this further upstream, e.g. reject vcpu_id > 255 without x2apic_format.
> > >
> > > This check is only when CPU is in x2apic mode. In this mode the X86 specs demands that
> > > apic_id == vcpu_id.
> >
> > AMD's APM explicitly allows changing the x2APIC ID by writing the xAPIC ID and then
> > switching to x2APIC mode.
> This patch was supposed to go together with patch that fully forbids changing apic id.
>
> > > When old API is used, APIC IDs are encoded in xapic format, even when vCPU is in x2apic
> > > mode, meaning that apic id can't be more that 255 even in x2apic mode.
> >
> > Even on Intel CPUs, if userspace creates a vCPU with vcpu_id = 256, then the xAPIC ID
> > will be (256 << 24) == 0. If userspace does get+set, then KVM will see 0 != 256 and
> > reject the set with return -EINVAL.
>
> First of all, I am saying again - with old API (!x2apic_format) - the APIC ID was limited
> to 8 bits - literaly only bits 24-31 of the value was used.
>
> With old API - literaly, the APIC_ID register was supposed to have x2apic format
> regardless of if vCPU is in x2apic mode or not, and the above code cares to translate
> it to the real apic id register from and to.
>
> Thus there is really no point of letting old API change apic id for x2apic mode,
> it is literaly a loophole that should be plugged.

Since I probably didn't explain myself correctly let me try to explain myself again:


First of all let me explain the x86 spec:

APIC id is number, usually relatively small number - its initial value is taken from
vcpu_id, when vCPU is created.
Initial APIC id is exposed in CPUID, as well.

When APIC is in xapic mode, then mmio apic_base + 0x20, contains apic id in bits 24-31.
plus apic id 0xff is reserved for broadcast, thus max of 255 cpus.

When APIC is in x2apic mode, mmio access is disabled, and instead MSR 0x802 contains
the local apic id, and it is the whole 32 bit number.


We have 2 ioctls, KVM_GET_LAPIC/KVM_SET_LAPIC which set/get local apic state.
They use array of apic registers which is basically snapshot of xapic mmio,
and it is not defined fully if these are x2apic or xapic registers.

Because of this, initially the APIC_ID register in this snapshot would
contain apic id in bits 24-31.

To make it possible to support apic id > 256, new KVM cap was added KVM_CAP_X2APIC_API
It has nothing to do with x2apic strictly speaking and can be used regardless of
x2apic mode.

All it does was to make the value at 0x20 offset in that mmio state dump be full 32 bit
value of the apic id.


When APIC state is loading while APIC is in *x2apic* mode it does enforce that
value in this 0x20 offset is initial apic id if KVM_CAP_X2APIC_API.

I think that it is fair to also enforce this when KVM_CAP_X2APIC_API is not used,
especially if we make apic id read-only.

I hope that this explains my point better, sorry if I was not able
to explain it well before.

Best regards,
Maxim Levitsky


>
> Best regards,
> Maxim Levitsky
>
> > And as above, userspace that hasn't opted into the x2apic_format could also legitimately
> > change the x2APIC ID.
> >
> > I 100% agree this is a mess and probably can't work without major shenanigans, but on
> > the other hand this isn't harming anything, so why risk breaking userspace, even if the
> > risk is infinitesimally small?
> >
> > I'm all for locking down the APIC ID with a userspace opt-in control, I just don't
> > think it's worth trying to retroactively plug the various holes in KVM.
> >