Re: [PATCH v2 2/4] mm: x86: Invoke hypercall when page encryption status is changed

From: Ashish Kalra
Date: Thu May 13 2021 - 02:57:20 EST


Hello Sean,

On Wed, May 12, 2021 at 03:51:10PM +0000, Sean Christopherson wrote:
> On Wed, May 12, 2021, Borislav Petkov wrote:
> > On Fri, Apr 23, 2021 at 03:58:43PM +0000, Ashish Kalra wrote:
> > > +static inline void notify_page_enc_status_changed(unsigned long pfn,
> > > + int npages, bool enc)
> > > +{
> > > + PVOP_VCALL3(mmu.notify_page_enc_status_changed, pfn, npages, enc);
> > > +}
> >
> > Now the question is whether something like that is needed for TDX, and,
> > if so, could it be shared by both.
>
> Yes, TDX needs this same hook, but "can't" reuse the hypercall verbatime. Ditto
> for SEV-SNP. I wanted to squish everything into a single common hypercall, but
> that didn't pan out.
>
> The problem is that both TDX and SNP define their own versions of this so that
> any guest kernel that complies with the TDX|SNP specification will run cleanly
> on a hypervisor that also complies with the spec. This KVM-specific hook doesn't
> meet those requires because non-Linux guest support will be sketchy at best, and
> non-KVM hypervisor support will be non-existent.
>
> The best we can do, short of refusing to support TDX or SNP, is to make this
> KVM-specific hypercall compatible with TDX and SNP so that the bulk of the
> control logic is identical. The mechanics of actually invoking the hypercall
> will differ, but if done right, everything else should be reusable without
> modification.
>
> I had an in-depth analysis of this, but it was all off-list. Pasted below.
>
> TDX uses GPRs to communicate with the host, so it can tunnel "legacy" hypercalls
> from time zero. SNP could technically do the same (with a revised GHCB spec),
> but it'd be butt ugly. And of course trying to go that route for either TDX or
> SNP would run into the problem of having to coordinate the ABI for the "legacy"
> hypercall across all guests and hosts. So yeah, trying to remove any of the
> three (KVM vs. SNP vs. TDX) interfaces is sadly just wishful thinking.
>
> That being said, I do think we can reuse the KVM specific hypercall for TDX and
> SNP. Both will still need a {TDX,SNP}-specific GCH{I,B} protocol so that cross-
> vendor compatibility is guaranteed, but that shouldn't preclude a guest that is
> KVM enlightened from switching to the KVM specific hypercall once it can do so.
> More thoughts later on.
>
> > I guess a common structure could be used along the lines of what is in the
> > GHCB spec today, but that seems like overkill for SEV/SEV-ES, which will
> > only ever really do a single page range at a time (through
> > set_memory_encrypted() and set_memory_decrypted()). The reason for the
> > expanded form for SEV-SNP is that the OS can (proactively) adjust multiple
> > page ranges in advance. Will TDX need to do something similar?
>
> Yes, TDX needs the exact same thing. All three (SEV, SNP, and TDX) have more or
> less the exact same hook in the guest (Linux, obviously) kernel.
>
> > If so, the only real common piece in KVM is a function to track what pages
> > are shared vs private, which would only require a simple interface.
>
> It's not just KVM, it's also the relevant code in the guest kernel(s) and other
> hypervisors. And the MMU side of KVM will likely be able to share code, e.g. to
> act on the page size hint.
>
> > So for SEV/SEV-ES, a simpler hypercall interface to specify a single page
> > range is really all that is needed, but it must be common across
> > hypervisors. I think that was one Sean's points originally, we don't want
> > one hypercall for KVM, one for Hyper-V, one for VMware, one for Xen, etc.
>
> For the KVM defined interface (required for SEV/SEV-ES), I think it makes sense
> to make it a superset of the SNP and TDX protocols so that it _can_ be used in
> lieu of the SNP/TDX specific protocol. I don't know for sure whether or not
> that will actually yield better code and/or performance, but it costs us almost
> nothing and at least gives us the option of further optimizing the Linux+KVM
> combination.
>
> It probably shouldn't be a strict superset, as in practice I don't think SNP
> approach of having individual entries when batching multiple pages will yield
> the best performance. E.g. the vast majority (maybe all?) of conversions for a
> Linux guest will be physically contiguous and will have the same preferred page
> size, at which point there will be less overhead if the guest specifies a
> massive range as opposed to having to santize and fill a large buffer.
>
> TL;DR: I think the KVM hypercall should be something like this, so that it can
> be used for SNP and TDX, and possibly for other purposes, e.g. for paravirt
> performance enhancements or something.
>
> 8. KVM_HC_MAP_GPA_RANGE
> -----------------------
> :Architecture: x86
> :Status: active
> :Purpose: Request KVM to map a GPA range with the specified attributes.
>
> a0: the guest physical address of the start page
> a1: the number of (4kb) pages (must be contiguous in GPA space)
> a2: attributes
>
> where 'attributes' could be something like:
>
> bits 3:0 - preferred page size encoding 0 = 4kb, 1 = 2mb, 2 = 1gb, etc...
> bit 4 - plaintext = 0, encrypted = 1
> bits 63:5 - reserved (must be zero)
>

Ok. Will modify page encryption status hypercall to be compatible with
the above defined interface.

Thanks,
Ashish