Re: [PATCH v5 5/8] KVM: x86: Support interrupt dispatch in x2APIC mode with APIC-write VM exit

From: Sean Christopherson
Date: Tue Jan 18 2022 - 13:17:08 EST


On Sat, Jan 15, 2022, Zeng Guang wrote:
> > What about tweaking my prep patch from before to the below? That would yield:
> >
> > if (apic_x2apic_mode(apic)) {
> > if (WARN_ON_ONCE(offset != APIC_ICR))
> > return 1;
> >
> > kvm_lapic_msr_read(apic, offset, &val);
>
> I think it's problematic to use kvm_lapic_msr_read() in this case. It
> premises the high 32bit value already valid at APIC_ICR2, while in handling
> "nodecode" x2APIC writes we need get continuous 64bit data from offset 300H
> first and prepare emulation of APIC_ICR2 write.

Ah, I read this part of the spec:

All 64 bits of the ICR are written by using WRMSR to access the MSR with index 830H.
If ECX = 830H, WRMSR writes the 64-bit value in EDX:EAX to the ICR, causing the APIC
to send an IPI. If any of bits 13, 17:16, or 31:20 are set in EAX, WRMSR detects a
reserved-bit violation and causes a general-protection exception (#GP).

but not the part down below that explicit says

VICR refers the 64-bit field at offset 300H on the virtual-APIC page. When the
“virtualize x2APIC mode” VM-execution control is 1 (indicating virtualization of
x2APIC mode), this field is used to virtualize the entire ICR.

But that's indicative of an existing KVM problem. KVM's emulation of x2APIC is
broken. The SDM, in section 10.12.9 ICR Operation in x2APIC Mode, clearly states
that the ICR is extended to 64-bits. ICR2 does not exist in x2APIC mode, full stop.
KVM botched things by effectively aliasing ICR[63:32] to ICR2.

We can and should fix that issue before merging IPIv suport, that way we don't
further propagate KVM's incorrect behavior. KVM will need to manipulate the APIC
state in KVM_{G,S}ET_LAPIC so as not to "break" migration, "break" in quotes because
I highly doubt any kernel reads ICR[63:32] for anything but debug purposes. But
we'd need to do that anyways for IPIv, otherwise migration from an IPIv host to
a non-IPIv host would suffer the same migration bug.

I'll post a series this week, in theory we should be able to reduce the patch for
IPIv support to just having to only touching kvm_apic_write_nodecode().