Re: [PATCH v10 10/16] KVM: x86: Introduce KVM_GET_SHARED_PAGES_LIST ioctl

From: Will Deacon
Date: Wed Mar 03 2021 - 14:37:34 EST


[+Marc]

On Tue, Mar 02, 2021 at 02:55:43PM +0000, Ashish Kalra wrote:
> On Fri, Feb 26, 2021 at 09:44:41AM -0800, Sean Christopherson wrote:
> > On Fri, Feb 26, 2021, Ashish Kalra wrote:
> > > On Thu, Feb 25, 2021 at 02:59:27PM -0800, Steve Rutherford wrote:
> > > > On Thu, Feb 25, 2021 at 12:20 PM Ashish Kalra <ashish.kalra@xxxxxxx> wrote:
> > > > Thanks for grabbing the data!
> > > >
> > > > I am fine with both paths. Sean has stated an explicit desire for
> > > > hypercall exiting, so I think that would be the current consensus.
> >
> > Yep, though it'd be good to get Paolo's input, too.
> >
> > > > If we want to do hypercall exiting, this should be in a follow-up
> > > > series where we implement something more generic, e.g. a hypercall
> > > > exiting bitmap or hypercall exit list. If we are taking the hypercall
> > > > exit route, we can drop the kvm side of the hypercall.
> >
> > I don't think this is a good candidate for arbitrary hypercall interception. Or
> > rather, I think hypercall interception should be an orthogonal implementation.
> >
> > The guest, including guest firmware, needs to be aware that the hypercall is
> > supported, and the ABI needs to be well-defined. Relying on userspace VMMs to
> > implement a common ABI is an unnecessary risk.
> >
> > We could make KVM's default behavior be a nop, i.e. have KVM enforce the ABI but
> > require further VMM intervention. But, I just don't see the point, it would
> > save only a few lines of code. It would also limit what KVM could do in the
> > future, e.g. if KVM wanted to do its own bookkeeping _and_ exit to userspace,
> > then mandatory interception would essentially make it impossible for KVM to do
> > bookkeeping while still honoring the interception request.
> >
> > However, I do think it would make sense to have the userspace exit be a generic
> > exit type. But hey, we already have the necessary ABI defined for that! It's
> > just not used anywhere.
> >
> > 	/* KVM_EXIT_HYPERCALL */
> > 	struct {
> > 		__u64 nr;
> > 		__u64 args[6];
> > 		__u64 ret;
> > 		__u32 longmode;
> > 		__u32 pad;
> > 	} hypercall;
> >
> >
> > > > Userspace could also handle the MSR using MSR filters (would need to
> > > > confirm that). Then userspace could also be in control of the cpuid bit.
> >
> > An MSR is not a great fit; it's x86 specific and limited to 64 bits of data.
> > The data limitation could be fudged by shoving data into non-standard GPRs, but
> > that will result in truly heinous guest code, and extensibility issues.
> >
> > The data limitation is a moot point, because the x86-only thing is a deal
> > breaker. arm64's pKVM work has a near-identical use case for a guest to share
> > memory with a host. I can't think of a clever way to avoid having to support
> > TDX's and SNP's hypervisor-agnostic variants, but we can at least not have
> > multiple KVM variants.
>
> Looking at arm64's pKVM work, I see that it is a recently introduced RFC
> patch set, probably relevant to the arm64 nVHE hypervisor
> mode/implementation. It potentially makes sense because it adds guest
> memory protection, as both host and guest kernels are running at the
> same privilege level?
>
> Though I do see that the pKVM stuff adds two hypercalls, specifically:
>
> pkvm_create_mappings() (I assume this is for setting shared memory
> regions between host and guest) &
> pkvm_create_private_mappings().
>
> And the use cases are quite similar to those of memory protection
> architectures, for example, use with virtio devices, guest DMA I/O, etc.

These hypercalls are both private to the host kernel communicating with
its hypervisor counterpart, so I don't think they're particularly
relevant here. As far as I can see, the more useful thing is to allow
the guest to communicate back to the host (and the VMM) that it has opened
up a memory window, perhaps for virtio rings or some other shared memory.

We hacked this up as a prototype in the past:

https://android-kvm.googlesource.com/linux/+/d12a9e2c12a52cf7140d40cd9fa092dc8a85fac9%5E%21/

but that's all arm64-specific and if we're solving the same problem as
you, then let's avoid arch-specific stuff if possible. The way in which
the guest discovers the interface will be arch-specific (we already have
a mechanism for that and some hypercalls are already allocated by specs
from Arm), but the interface back to the VMM and some (most?) of the host
handling could be shared.
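
For illustration only, here is a minimal sketch of what the shared
VMM-side handling could look like, assuming the guest's request reaches
userspace through an exit shaped like the KVM_EXIT_HYPERCALL struct
quoted above. The hypercall number, the argument layout (args[0] = guest
physical address, args[1] = page count) and every name below are
hypothetical; nothing here is an agreed ABI.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirror of the KVM_EXIT_HYPERCALL payload quoted above. */
struct hypercall_exit {
	uint64_t nr;
	uint64_t args[6];
	uint64_t ret;
	uint32_t longmode;
	uint32_t pad;
};

/* Hypothetical values: no such hypercall number or layout is defined. */
#define HC_MEM_SHARE_HYPOTHETICAL	0x1000u
#define SHARE_PAGE_SIZE			4096u
#define MAX_SHARED_RANGES		64

struct shared_range {
	uint64_t gpa;		/* guest physical address, page aligned */
	uint64_t npages;	/* number of pages the guest opened up */
};

struct share_tracker {
	struct shared_range ranges[MAX_SHARED_RANGES];
	size_t count;
};

/*
 * Arch-neutral part of the handling: validate the request, record the
 * range and fill in the return value for the guest. Returns false if
 * the exit is not the (hypothetical) share hypercall, so the caller
 * can pass it on to other handlers.
 */
static bool handle_share_exit(struct share_tracker *t,
			      struct hypercall_exit *hc)
{
	uint64_t gpa, npages;

	assert(t && hc);

	if (hc->nr != HC_MEM_SHARE_HYPOTHETICAL)
		return false;

	gpa = hc->args[0];
	npages = hc->args[1];

	/* Reject empty, unaligned, wrapping or untrackable requests. */
	if (npages == 0 || (gpa % SHARE_PAGE_SIZE) != 0 ||
	    npages > (UINT64_MAX - gpa) / SHARE_PAGE_SIZE ||
	    t->count == MAX_SHARED_RANGES) {
		hc->ret = (uint64_t)-1;
		return true;
	}

	t->ranges[t->count].gpa = gpa;
	t->ranges[t->count].npages = npages;
	t->count++;
	hc->ret = 0;
	return true;
}
```

Only the way the guest discovers and issues the call would differ per
architecture in a scheme like this; the bookkeeping above has no arch
dependency.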

> But isn't this patch set still RFC? I agree that it adds an
> infrastructure for standardised communication between the host and
> its guests for mutually controlled shared memory regions, and it
> surely adds some kind of portability between hypervisor
> implementations, but nothing is standardised yet, right?

No, and it seems that you're further ahead than us in terms of
implementation in this area. We're happy to review patches though, to
make sure we end up with something that works for us both.

Will