Re: [PATCH v3 15/20] kvm: arm/arm64: Allow tuning the physical address size for VM

From: Dave Martin
Date: Wed Jul 11 2018 - 06:38:29 EST


On Wed, Jul 11, 2018 at 10:05:50AM +0100, Suzuki K Poulose wrote:
> On 10/07/18 18:03, Dave Martin wrote:
> >On Tue, Jul 10, 2018 at 05:38:39PM +0100, Suzuki K Poulose wrote:
> >>On 09/07/18 14:37, Dave Martin wrote:
> >>>On Mon, Jul 09, 2018 at 01:29:42PM +0100, Marc Zyngier wrote:
> >>>>On 09/07/18 12:23, Dave Martin wrote:
> >
> >[...]
> >
> >>>>>Wedging arguments into a few bits in the type argument feels awkward,
> >>>>>and may be regretted later if we run out of bits, or something can't be
> >>>>>represented in the chosen encoding.
> >>>>
> >>>>I think that's a pretty convincing argument for a "better" CREATE_VM,
> >>>>one that would have a clearly defined, structured (and potentially
> >>>>extensible) argument.
> >>>>
> >>>>I've quickly hacked the following:
> >>>>
> >>>>diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> >>>>index b6270a3b38e9..3e76214034c2 100644
> >>>>--- a/include/uapi/linux/kvm.h
> >>>>+++ b/include/uapi/linux/kvm.h
> >>>>@@ -735,6 +735,20 @@ struct kvm_ppc_resize_hpt {
> >>>> __u32 pad;
> >>>> };
> >>>>
> >>>>+struct kvm_create_vm2 {
> >>>>+ __u64 version; /* Or maybe not */
> >>>>+ union {
> >>>>+ struct {
> >>>>+#define KVM_ARM_SVE_CAPABLE (1 << 0)
> >>>>+#define KVM_ARM_SELECT_IPA (1 << 1)
> >>>>+ __u64 capabilities;
> >>>>+ __u16 sve_vlen;
> >>>>+ __u8 ipa_size;
> >>>>+ } arm64;
> >>>>+ __u64 dummy[15];
> >>>>+ };
> >>>>+};
> >>>>+
> >>>> #define KVMIO 0xAE
> >>>>
> >>>> /* machine type bits, to be used as argument to KVM_CREATE_VM */
> >>>>
> >>>>Other architectures could fill in their own bits if they need to.
> >>>>
> >>>>Thoughts?
> >>>
> >>>This kind of thing should work, but it may still get messy when we
> >>>add additional fields.
> >>
> >>
> >>Marc, Dave,
> >>
> >>I like Dave's approach. Some comments below.
> >>
> >>>
> >>>If we want this to work cross-arch, would it make sense to go
> >>>for a more generic approach, say
> >>>
> >>>struct kvm_create_vm_attr_any {
> >>> __u32 type;
> >>>};
> >>>
> >>>#define KVM_CREATE_VM_ATTR_ARCH_CAPABILITIES 1
> >>>struct kvm_create_vm_attr_arch_capabilities {
> >>> __u32 type;
> >>> __u16 size; /* support future expansion of capabilities[] */
> >>> __u16 reserved;
> >>> __u64 capabilities[1];
> >>>};
> >>
> >>We also need to advertise which attributes are supported by the host,
> >>so that the user can tune the available ones. That would make a bit mask
> >>like the above trickier, unless we return the supported values back
> >>in the argument ptr for the "probe" call. And this scheme in general
> >>can be useful for passing back a non-boolean result specific to the
> >>attribute, without having a per-attribute ioctl (e.g., maximum limit
> >>for IPA).
> >
> >Maybe, but this could quickly become bloated. (My approach already
> >feels a bit bloated...)
> >
> >I'm not sure that arbitrarily complex negotiation will really be
> >needed, but userspace might want to change its mind if setting a
> >particular property fails.
> >
> >An alternative might be to have a bunch of per-VM ioctls to configure
> >different things, like x86 has. There's at least precedent for that.
> >For arm, we currently only have a few. That allows for easy extension,
> >at the cost of adding ioctls.
>
> As you know, one of the major problems with the per-VM ioctls is
> the ordering of different operations and tracking to make sure that
> the userspace follows the expected order. E.g., the first approach for
> IPA series was based on this and it made things complex enough to drop
> it.

I'm aware of that, but if we are adding a new KVM_CREATE_VM, we could
perhaps give it different semantics: i.e., we create a half-created VM
that only accepts configuration ioctls and a "finish creation" ioctl
that finalises everything before you're allowed to create devices,
vcpus etc.
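
To make the ordering concrete, here is a rough userspace model of that
two-phase idea. It is only a sketch: none of these names exist in KVM,
the functions stand in for hypothetical ioctls, and the 32..48 bit IPA
range and the 40-bit default are assumptions chosen for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical model only: nothing here is existing KVM ABI. */
enum vm_state { VM_CONFIGURING, VM_FINALISED };

#define MODEL_MIN_IPA		32	/* assumed lower bound */
#define MODEL_MAX_IPA		48	/* assumed host limit */
#define MODEL_DEFAULT_IPA	40	/* assumed default */

struct vm_model {
	enum vm_state state;
	uint8_t ipa_size;	/* guest physical address bits */
	unsigned int nr_vcpus;
};

/* Stands in for the new "create" call: returns a half-created VM. */
static void vm_begin_create(struct vm_model *vm)
{
	vm->state = VM_CONFIGURING;
	vm->ipa_size = MODEL_DEFAULT_IPA;
	vm->nr_vcpus = 0;
}

/* Configuration call: only legal before finalisation. */
static int vm_set_ipa_size(struct vm_model *vm, uint8_t bits)
{
	if (vm->state != VM_CONFIGURING)
		return -EBUSY;
	if (bits < MODEL_MIN_IPA || bits > MODEL_MAX_IPA)
		return -EINVAL;	/* userspace may retry with another value */
	vm->ipa_size = bits;
	return 0;
}

/* "Finish creation": freezes the configuration. */
static int vm_finish_create(struct vm_model *vm)
{
	if (vm->state != VM_CONFIGURING)
		return -EBUSY;
	/* Setters validated every write, so this must already hold. */
	assert(vm->ipa_size >= MODEL_MIN_IPA && vm->ipa_size <= MODEL_MAX_IPA);
	vm->state = VM_FINALISED;
	return 0;
}

/* Devices, vcpus etc. are refused until the VM is finalised. */
static int vm_create_vcpu(struct vm_model *vm)
{
	if (vm->state != VM_FINALISED)
		return -EINVAL;
	vm->nr_vcpus++;
	return 0;
}
```

With a single state check per call, the kernel would not need to track
the relative order of the individual configuration calls, only whether
finalisation has happened yet.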

This is the sort of thing I was moving towards for SVE (but for
vcpus there).

I'm not saying we should drop the existing KVM_CREATE_VM2 ideas,
but that we should take a step back if it starts to accrue complexity.

> >
> >There may be some ioctls we can reuse, like KVM_ENABLE_CAP for per-
> >vm capability flags.
>
> Maybe we could switch to KVM_VM_CAPS and pass a list of capabilities
> to be enabled at creation time? The kvm_enable_cap can pass in additional
> arguments for each cap. That way we don't have to rely on a new set of
> attributes and probing becomes straightforward.

That's a possibility. I guess we'd need to understand how exactly x86
uses this.
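
For the sake of discussion, a capability list passed at creation time
might look roughly like the model below. This is a sketch under
assumptions, not a proposal for the actual ABI: the entry layout only
borrows the cap/flags/args[4] shape of struct kvm_enable_cap, and the
capability numbers, limits, default and function names are all invented.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical entry, shaped like struct kvm_enable_cap without padding. */
struct model_cap {
	uint32_t cap;
	uint32_t flags;
	uint64_t args[4];
};

/* Invented capability numbers, for illustration only. */
enum { MODEL_CAP_IPA_SIZE = 1, MODEL_CAP_SVE = 2 };

struct model_vm_cfg {
	uint8_t ipa_size;	/* bits; 40 assumed as the default */
	uint16_t sve_vlen;	/* bytes; 0 means SVE not enabled */
};

static int model_apply_cap(struct model_vm_cfg *cfg, const struct model_cap *c)
{
	if (c->flags)
		return -EINVAL;	/* no flags defined in this model */

	switch (c->cap) {
	case MODEL_CAP_IPA_SIZE:
		if (c->args[0] < 32 || c->args[0] > 48)	/* assumed limits */
			return -EINVAL;
		cfg->ipa_size = (uint8_t)c->args[0];
		return 0;
	case MODEL_CAP_SVE:
		/* SVE vector lengths are multiples of 16 bytes, 16..256. */
		if (c->args[0] < 16 || c->args[0] > 256 || c->args[0] % 16)
			return -EINVAL;
		cfg->sve_vlen = (uint16_t)c->args[0];
		return 0;
	default:
		return -EINVAL;	/* unknown capability */
	}
}

/*
 * Apply every entry to a scratch config and commit only if all succeed.
 * On failure, *failed reports which entry was rejected, so userspace can
 * change its mind about that one entry and retry.
 */
static int model_create_vm(struct model_vm_cfg *out,
			   const struct model_cap *caps, size_t n,
			   size_t *failed)
{
	struct model_vm_cfg cfg = { .ipa_size = 40, .sve_vlen = 0 };
	size_t i;

	for (i = 0; i < n; i++) {
		int ret = model_apply_cap(&cfg, &caps[i]);

		if (ret) {
			if (failed)
				*failed = i;
			return ret;
		}
	}
	*out = cfg;
	return 0;
}
```

Reporting the index of the rejected entry is one cheap way to cover the
"change its mind" case without a separate probe call per attribute.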

Cheers
---Dave