Re: [PATCH 14/19] KVM: x86: Honor architectural behavior for aliased 8-bit APIC IDs

From: Maxim Levitsky
Date: Wed Aug 31 2022 - 09:37:44 EST


On Wed, 2022-08-31 at 00:35 +0000, Sean Christopherson wrote:
> Apply KVM's hotplug hack if and only if userspace has enabled 32-bit IDs
> for x2APIC. If 32-bit IDs are not enabled, disable the optimized map to
> honor x86 architectural behavior if multiple vCPUs shared a physical APIC
> ID. As called out in the changelog that added the hack, all CPUs whose
> (possibly truncated) APIC ID matches the target are supposed to receive
> the IPI.
>
> KVM intentionally differs from real hardware, because real hardware
> (Knights Landing) does just "x2apic_id & 0xff" to decide whether to
> accept the interrupt in xAPIC mode and it can deliver one interrupt to
> more than one physical destination, e.g. 0x123 to 0x123 and 0x23.
>
> Applying the hack even when x2APIC is not fully enabled means KVM doesn't
> correctly handle scenarios where the guest has aliased xAPIC IDs across
> multiple vCPUs, as only the vCPU with the lowest vCPU ID will receive any
> interrupts. It's extremely unlikely any real world guest aliase APIC IDs,
> or even modifies APIC IDs, but KVM's behavior is arbitrary, e.g. the
> lowest vCPU ID "wins" regardless of which vCPU is "aliasing" and which
> vCPU is "normal".
>
> Furthermore, the hack is _not_ guaranteed to work! The hack works if and
> only if the optimized APIC map is successfully allocated. If the map
> allocation fails (unlikely), KVM will fall back to its unoptimized
> behavior, which _does_ honor the architectural behavior.
>
> Pivot on 32-bit x2APIC IDs being enabled as that is required to take
> advantage of the hotplug hack (see kvm_apic_state_fixup()), i.e. won't
> break existing setups unless they are way, way off in the weeds.
>
> And an entry in KVM's errata to document the hack. Alternatively, KVM
> could provide an actual x2APIC quirk and document the hack that way, but
> there's unlikely to ever be a use case for disabling the quirk. Go the
> errata route to avoid having to validate a quirk no one cares about.
>
> Fixes: 5bd5db385b3e ("KVM: x86: allow hotplug of VCPU with APIC ID over 0xff")
> Signed-off-by: Sean Christopherson <seanjc@xxxxxxxxxx>
> ---
> Documentation/virt/kvm/x86/errata.rst | 11 ++++++
> arch/x86/kvm/lapic.c | 50 ++++++++++++++++++++++-----
> 2 files changed, 52 insertions(+), 9 deletions(-)
>
> diff --git a/Documentation/virt/kvm/x86/errata.rst b/Documentation/virt/kvm/x86/errata.rst
> index 410e0aa63493..49a05f24747b 100644
> --- a/Documentation/virt/kvm/x86/errata.rst
> +++ b/Documentation/virt/kvm/x86/errata.rst
> @@ -37,3 +37,14 @@ Nested virtualization features
> ------------------------------
>
> TBD
> +
> +x2APIC
> +------
> +When KVM_X2APIC_API_USE_32BIT_IDS is enabled, KVM activates a hack/quirk that
> +allows sending events to a single vCPU using its x2APIC ID even if the target
> +vCPU has legacy xAPIC enabled, e.g. to bring up hotplugged vCPUs via INIT-SIPI
> +on VMs with > 255 vCPUs. A side effect of the quirk is that, if multiple vCPUs
> +have the same physical APIC ID, KVM will deliver events targeting that APIC ID
> +only to the vCPU with the lowest vCPU ID. If KVM_X2APIC_API_USE_32BIT_IDS is
> +not enabled, KVM follows x86 architecture when processing interrupts (all vCPUs
> +matching the target APIC ID receive the interrupt).
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index d537b51295d6..c224b5c7cd92 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -260,10 +260,10 @@ void kvm_recalculate_apic_map(struct kvm *kvm)
> kvm_for_each_vcpu(i, vcpu, kvm) {
> struct kvm_lapic *apic = vcpu->arch.apic;
> struct kvm_lapic **cluster;
> + u32 x2apic_id, physical_id;
> u16 mask;
> u32 ldr;
> u8 xapic_id;
> - u32 x2apic_id;
>
> if (!kvm_apic_present(vcpu))
> continue;
> @@ -271,16 +271,48 @@ void kvm_recalculate_apic_map(struct kvm *kvm)
> xapic_id = kvm_xapic_id(apic);
> x2apic_id = kvm_x2apic_id(apic);
>
> - /* Hotplug hack: see kvm_apic_match_physical_addr(), ... */
> - if ((apic_x2apic_mode(apic) || x2apic_id > 0xff) &&
> - x2apic_id <= new->max_apic_id)
> - new->phys_map[x2apic_id] = apic;
> /*
> - * ... xAPIC ID of VCPUs with APIC ID > 0xff will wrap-around,
> - * prevent them from masking VCPUs with APIC ID <= 0xff.
> + * Apply KVM's hotplug hack if userspace has enable 32-bit APIC
> + * IDs. Allow sending events to vCPUs by their x2APIC ID even
> + * if the target vCPU is in legacy xAPIC mode, and silently
> + * ignore aliased xAPIC IDs (the x2APIC ID is truncated to 8
> + * bits, causing IDs > 0xff to wrap and collide).
> + *
> + * Honor the architectural (and KVM's non-optimized) behavior
> + * if userspace has not enabled 32-bit x2APIC IDs. Each APIC
> + * is supposed to process messages independently. If multiple
> + * vCPUs have the same effective APIC ID, e.g. due to the
> + * x2APIC wrap or because the guest manually modified its xAPIC
> + * IDs, events targeting that ID are supposed to be recognized
> + * by all vCPUs with said ID.
> */
> - if (!apic_x2apic_mode(apic) && !new->phys_map[xapic_id])
> - new->phys_map[xapic_id] = apic;
> + if (kvm->arch.x2apic_format) {
> + /* See also kvm_apic_match_physical_addr(). */
> + if ((apic_x2apic_mode(apic) || x2apic_id > 0xff) &&
> + x2apic_id <= new->max_apic_id)
> + new->phys_map[x2apic_id] = apic;
> +
> + if (!apic_x2apic_mode(apic) && !new->phys_map[xapic_id])
> + new->phys_map[xapic_id] = apic;
> + } else {
> + /*
> + * Disable the optimized map if the physical APIC ID is
> + * already mapped, i.e. is aliased to multiple vCPUs.
> + * The optimized map requires a strict 1:1 mapping
> + * between IDs and vCPUs.
> + */
> + if (apic_x2apic_mode(apic))
> + physical_id = x2apic_id;
> + else
> + physical_id = xapic_id;
> +
> + if (new->phys_map[physical_id]) {
> + kvfree(new);
> + new = NULL;
> + goto out;
Why not to use the same KVM_APIC_MODE_XAPIC_FLAT | KVM_APIC_MODE_XAPIC_CLUSTER
hack here?

Other than that looks correct but I don't know this area well enough to be 100% sure.

Best regards,
Maxim Levitsky


> + }
> + new->phys_map[physical_id] = apic;
> + }
>
> if (!kvm_apic_sw_enabled(apic))
> continue;