Re: [PATCH v2 2/4] mm: x86: Invoke hypercall when page encryption status is changed

From: Sean Christopherson
Date: Wed May 12 2021 - 13:41:49 EST


On Wed, May 12, 2021, Borislav Petkov wrote:
> On Fri, Apr 23, 2021 at 03:58:43PM +0000, Ashish Kalra wrote:
> > +static inline void notify_page_enc_status_changed(unsigned long pfn,
> > + int npages, bool enc)
> > +{
> > + PVOP_VCALL3(mmu.notify_page_enc_status_changed, pfn, npages, enc);
> > +}
>
> Now the question is whether something like that is needed for TDX, and,
> if so, could it be shared by both.

Yes, TDX needs this same hook, but "can't" reuse the hypercall verbatime. Ditto
for SEV-SNP. I wanted to squish everything into a single common hypercall, but
that didn't pan out.

The problem is that both TDX and SNP define their own versions of this so that
any guest kernel that complies with the TDX|SNP specification will run cleanly
on a hypervisor that also complies with the spec. This KVM-specific hook doesn't
meet those requires because non-Linux guest support will be sketchy at best, and
non-KVM hypervisor support will be non-existent.

The best we can do, short of refusing to support TDX or SNP, is to make this
KVM-specific hypercall compatible with TDX and SNP so that the bulk of the
control logic is identical. The mechanics of actually invoking the hypercall
will differ, but if done right, everything else should be reusable without
modification.

I had an in-depth analysis of this, but it was all off-list. Pasted below.

TDX uses GPRs to communicate with the host, so it can tunnel "legacy" hypercalls
from time zero. SNP could technically do the same (with a revised GHCB spec),
but it'd be butt ugly. And of course trying to go that route for either TDX or
SNP would run into the problem of having to coordinate the ABI for the "legacy"
hypercall across all guests and hosts. So yeah, trying to remove any of the
three (KVM vs. SNP vs. TDX) interfaces is sadly just wishful thinking.

That being said, I do think we can reuse the KVM specific hypercall for TDX and
SNP. Both will still need a {TDX,SNP}-specific GCH{I,B} protocol so that cross-
vendor compatibility is guaranteed, but that shouldn't preclude a guest that is
KVM enlightened from switching to the KVM specific hypercall once it can do so.
More thoughts later on.

> I guess a common structure could be used along the lines of what is in the
> GHCB spec today, but that seems like overkill for SEV/SEV-ES, which will
> only ever really do a single page range at a time (through
> set_memory_encrypted() and set_memory_decrypted()). The reason for the
> expanded form for SEV-SNP is that the OS can (proactively) adjust multiple
> page ranges in advance. Will TDX need to do something similar?

Yes, TDX needs the exact same thing. All three (SEV, SNP, and TDX) have more or
less the exact same hook in the guest (Linux, obviously) kernel.

> If so, the only real common piece in KVM is a function to track what pages
> are shared vs private, which would only require a simple interface.

It's not just KVM, it's also the relevant code in the guest kernel(s) and other
hypervisors. And the MMU side of KVM will likely be able to share code, e.g. to
act on the page size hint.

> So for SEV/SEV-ES, a simpler hypercall interface to specify a single page
> range is really all that is needed, but it must be common across
> hypervisors. I think that was one Sean's points originally, we don't want
> one hypercall for KVM, one for Hyper-V, one for VMware, one for Xen, etc.

For the KVM defined interface (required for SEV/SEV-ES), I think it makes sense
to make it a superset of the SNP and TDX protocols so that it _can_ be used in
lieu of the SNP/TDX specific protocol. I don't know for sure whether or not
that will actually yield better code and/or performance, but it costs us almost
nothing and at least gives us the option of further optimizing the Linux+KVM
combination.

It probably shouldn't be a strict superset, as in practice I don't think SNP
approach of having individual entries when batching multiple pages will yield
the best performance. E.g. the vast majority (maybe all?) of conversions for a
Linux guest will be physically contiguous and will have the same preferred page
size, at which point there will be less overhead if the guest specifies a
massive range as opposed to having to santize and fill a large buffer.

TL;DR: I think the KVM hypercall should be something like this, so that it can
be used for SNP and TDX, and possibly for other purposes, e.g. for paravirt
performance enhancements or something.

8. KVM_HC_MAP_GPA_RANGE
-----------------------
:Architecture: x86
:Status: active
:Purpose: Request KVM to map a GPA range with the specified attributes.

a0: the guest physical address of the start page
a1: the number of (4kb) pages (must be contiguous in GPA space)
a2: attributes

where 'attributes' could be something like:

bits 3:0 - preferred page size encoding 0 = 4kb, 1 = 2mb, 2 = 1gb, etc...
bit 4 - plaintext = 0, encrypted = 1
bits 63:5 - reserved (must be zero)