Re: [PATCH 4/4] KVM: x86: lapic: don't allow to set non default apic id when not using x2apic api

From: Sean Christopherson
Date: Thu Mar 03 2022 - 14:38:32 EST


On Thu, Mar 03, 2022, Maxim Levitsky wrote:
> On Thu, 2022-03-03 at 16:51 +0000, Sean Christopherson wrote:
> > On Wed, Mar 02, 2022, Maxim Levitsky wrote:
> > > When APIC state is loading while APIC is in *x2apic* mode it does enforce that
> > > value in this 0x20 offset is initial apic id if KVM_CAP_X2APIC_API.
> > >
> > > I think that it is fair to also enforce this when KVM_CAP_X2APIC_API is not used,
> > > especially if we make apic id read-only.
> >
> > I don't disagree in principle. But, (a) this loophole as existing for nearly 6
> > years, (b) closing the loophole could break userspace, (c) false positive are
> > possible due to truncation, and (d) KVM gains nothing meaningful by closing the
> > loophole.
> >
> > (d) changes when we add a knob to make xAPIC ID read-only, but we can simply
> > require userspace to enable KVM_CAP_X2APIC_API (or force it). That approach
> > avoids (c) by eliminating truncation, and avoids (b) by virtue of being opt-in.
> >
>
> (a) - doesn't matter.

Yes, it absolutely matters. If KVM_CAP_X2APIC_API was added in the 5.17 cycle
then I would not be objecting because there is zero chance of breaking userspace.

> (b) - if userspace wants to have non default apic id with x2apic mode,
> which (*)can't even really be set from the guest - this is ridiculous.
>
> (*) Yes I know that in *theory* user can change apic id in xapic mode
> and then switch to x2apic mode - but I really doubt that KVM
> would even honor this - there are already places which assume
> that this is not the case. In fact it would be nice to audit KVM
> on what happens when userspace does this, there might be a nice
> CVE somewhere....
>
> (c) - without KVM_CAP_X2APIC_API, literally just call to KVM_GET_LAPIC/KVM_SET_LAPIC
> will truncate x2apic id if > 255 regardless of my patch - literally this cap
> was added to avoid this.
> What we should do is to avoid creating cpu with vcpu_id > 256 when this cap is not set…

Yes, but _rejecting_ that behavior is a change in KVM's ABI. That's why (a) matters.

> (d) - doesn't matter - again we are talking about x2apic mode in which apic id is read only.

It does matter that changes that potentially break userspace provide value to KVM.
Not theoretical, make us feel warm and fuzzy value, but actual value to KVM or
userspace. KVM is no worse off for this loophole. For userspace, at best this
will break a flawed but functional setup.

Concretely, the below example of using x2APIC with an ID > 255 works because KVM
calculates its APIC maps using what KVM thinks is the x2APIC ID.

With your proposed change, KVM_SET_LAPIC will fail and we've broken a functional,
if sketchy, setup. Is there likely to be such a real-world setup that doesn't
barf on the inconsistent x2APIC ID? Probably not, but I don't see any reason to
find out.

==== Test Assertion Failure ====
lib/kvm_util.c:1953: ret == 0
pid=837 tid=837 errno=22 - Invalid argument
1 0x0000000000403c0e: vcpu_ioctl at kvm_util.c:1952
2 0x0000000000401548: main at smm_test.c:151
3 0x00007ff76518ebf6: ?? ??:0
4 0x0000000000401829: _start at ??:?
vcpu ioctl 1140895375 failed, rc: -1 errno: 22 (Invalid argument)


diff --git a/tools/testing/selftests/kvm/x86_64/smm_test.c b/tools/testing/selftests/kvm/x86_64/smm_test.c
index a626d40fdb48..cce44d99c919 100644
--- a/tools/testing/selftests/kvm/x86_64/smm_test.c
+++ b/tools/testing/selftests/kvm/x86_64/smm_test.c
@@ -19,7 +19,7 @@
#include "vmx.h"
#include "svm_util.h"

-#define VCPU_ID 1
+#define VCPU_ID 256

#define PAGE_SIZE 4096

@@ -56,7 +56,7 @@ static inline void sync_with_host(uint64_t phase)
static void self_smi(void)
{
x2apic_write_reg(APIC_ICR,
- APIC_DEST_SELF | APIC_INT_ASSERT | APIC_DM_SMI);
+ ((uint64_t)VCPU_ID << 32) | APIC_INT_ASSERT | APIC_DM_SMI);
}

static void l2_guest_code(void)
@@ -72,14 +72,10 @@ static void guest_code(void *arg)
{
#define L2_GUEST_STACK_SIZE 64
unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE];
- uint64_t apicbase = rdmsr(MSR_IA32_APICBASE);
struct svm_test_data *svm = arg;
struct vmx_pages *vmx_pages = arg;

sync_with_host(1);
-
- wrmsr(MSR_IA32_APICBASE, apicbase | X2APIC_ENABLE);
-
sync_with_host(2);

self_smi();
@@ -139,12 +135,21 @@ int main(int argc, char *argv[])
struct kvm_run *run;
struct kvm_x86_state *state;
int stage, stage_reported;
+ struct kvm_lapic_state lapic;

/* Create VM */
vm = vm_create_default(VCPU_ID, 0, guest_code);

run = vcpu_state(vm, VCPU_ID);

+ vcpu_set_msr(vm, VCPU_ID, MSR_IA32_APICBASE,
+ vcpu_get_msr(vm, VCPU_ID, MSR_IA32_APICBASE) | X2APIC_ENABLE);
+
+ vcpu_ioctl(vm, VCPU_ID, KVM_GET_LAPIC, &lapic);
+ TEST_ASSERT(*((u32 *)&lapic.regs[0x20]) == 0,
+ "x2APIC ID == %u", *((u32 *)&lapic.regs[0x20]));
+ vcpu_ioctl(vm, VCPU_ID, KVM_SET_LAPIC, &lapic);
+
vm_userspace_mem_region_add(vm, VM_MEM_SRC_ANONYMOUS, SMRAM_GPA,
SMRAM_MEMSLOT, SMRAM_PAGES, 0);
TEST_ASSERT(vm_phy_pages_alloc(vm, SMRAM_PAGES, SMRAM_GPA, SMRAM_MEMSLOT)