Re: [PATCH v1 03/11] KVM: x86: dynamic kvm_apic_map

From: Radim Krčmář
Date: Fri Jul 01 2016 - 10:39:07 EST


2016-07-01 16:03+0200, Paolo Bonzini:
> On 01/07/2016 14:44, Radim Krčmář wrote:
>> 2016-07-01 10:42+0200, Paolo Bonzini:
>>> On 01/07/2016 00:15, Andrew Honig wrote:
>>>>>> + /* kvm_apic_map_get_logical_dest() expects multiples of 16 */
>>>>>> + size = round_up(max_id + 1, 16);
>>>> Now that you're using the full range of apic_id values, could this
>>>> calculation overflow? Perhaps max_id could be u64?
>>>
>>> Good point, but I wonder if it's a good idea to let userspace allocate
>>> 32 GB of memory. :)
>>
>> Yes, both could happen. I'll change it to u64 to make it future proof.
>
> It's not necessary to change it to u64 if you put a limit, but you can
> add a WARN_ON(size == 0).

Hm, to save 4 bytes and avoid a WARN_ON, I'll change it to u32
max_apic_id instead of u32 size.
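
For illustration only, a minimal userspace sketch of the wrap-around and of
why storing the maximum ID sidesteps it. This is not the patch: ROUND_UP is a
stand-in written to behave like the kernel's round_up(), and map_size_u32()
and id_in_map() are hypothetical helpers.

```c
#include <stdint.h>

/* Stand-in for the kernel's round_up(); y must be a power of two. */
#define ROUND_UP(x, y) ((((x) - 1) | ((y) - 1)) + 1)

/*
 * Size kept as u32: max_id + 1 wraps to 0 for max_id == 0xffffffff,
 * so the rounded size is 0 as well.
 */
static uint32_t map_size_u32(uint32_t max_id)
{
	return ROUND_UP(max_id + 1, 16u);
}

/*
 * Keeping the maximum ID instead of the size: every u32 value is
 * representable and the bounds check needs no "+ 1".
 */
static int id_in_map(uint32_t max_apic_id, uint32_t id)
{
	return id <= max_apic_id;
}
```

With these definitions map_size_u32(UINT32_MAX) evaluates to 0, while
id_in_map(UINT32_MAX, UINT32_MAX) is still true.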

> Also if kvm_apic_map_get_logical_dest() expects multiples of 16, it
> should warn whenever the invariant is not respected.

The multiple-of-16 requirement was only there to optimize the fast path ...
kvm_apic_map_get_logical_dest() can handle arbitrary values, so I'll do
that instead of checking or assuming an alignment.

>>> Let's put a limit on the maximum supported APIC ID, and report it
>>> through KVM_CHECK_EXTENSION on the new KVM_CAP_X2APIC_API capability.
>>> If 767 is enough for Knights Landing, the allocation below fits in two
>>> pages. If you need to make it higher, please change the allocation to
>>> use kvm_kvzalloc and kvfree.
>>
>> We sort of have a capability for maximum APIC ID, KVM_MAX_VCPU_ID,
>> because VCPU ID is initial APIC ID and x2APIC ID should always be the
>> initial APIC ID.
>
> Should it?

Yes, x2APIC ID cannot be changed in hardware and is initialized to the
initial APIC ID.
Letting LAPIC_SET change x2APIC ID would allow scenarios where userspace
reuses old VMs instead of building new ones after reconfiguration.
I don't think it's a sensible use case and it is currently broken,
because we don't exit to userspace when changing APIC mode, so KVM would
just set APIC ID to VCPU ID on any transition and userspace couldn't
amend it.

> According to QEMU if you have e.g. 3 cores per socket one
> socket take 4 APIC IDs. For Knights Landing the "worst" prime factor in
> 288 is 3^2 so you need APIC IDs up to 288 * (4/3)^2 = 512.

The topology can result in sparse APIC IDs, and the APIC ID is initialized
from VCPU ID, so userspace has to pick VCPU IDs accordingly.
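
A rough sketch of where the sparseness comes from, assuming a layout in
which each topology level gets a power-of-two sized bit field (package |
core | thread). field_width() and apic_id() are hypothetical names for
illustration, not QEMU or KVM functions.

```c
#include <stdint.h>

/* Bits needed to encode the values 0..count-1 (0 when count <= 1). */
static unsigned int field_width(uint32_t count)
{
	unsigned int bits = 0;

	while (((uint64_t)1 << bits) < count)
		bits++;
	return bits;
}

/*
 * Assumed layout: package | core | thread, every field rounded up to a
 * power of two.  Meant for small counts; the shifts are not checked
 * against overflowing 32 bits.
 */
static uint32_t apic_id(uint32_t pkg, uint32_t core, uint32_t thread,
			uint32_t cores_per_pkg, uint32_t threads_per_core)
{
	unsigned int tw = field_width(threads_per_core);
	unsigned int cw = field_width(cores_per_pkg);

	return (pkg << (cw + tw)) | (core << tw) | thread;
}
```

With 3 cores per package and 1 thread per core, package 0 uses IDs 0-2 and
package 1 starts at apic_id(1, 0, 0, 3, 1) == 4, so ID 3 is never used and
userspace would have to skip it when choosing VCPU IDs.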