RE: [PATCH] x86/mm/cpa: Warn if set_memory_XXcrypted() fails

From: Michael Kelley (LINUX)
Date: Fri Oct 27 2023 - 12:37:38 EST


From: Edgecombe, Rick P <rick.p.edgecombe@xxxxxxxxx> Sent: Wednesday, October 25, 2023 6:41 PM
>
> On Thu, 2023-10-26 at 00:35 +0000, Michael Kelley (LINUX) wrote:
> > I think you mean "shared" as indicated by the guest page tables (vs. "shared"
> > as the state of the page from the host standpoint).  Some precision on
> > that distinction seems useful here and in follow-on patches to make callers'
> > error handling be correct.  As I understand it, the premise is that
> > if the guest is accessing a page as private, and the host/VMM has messed
> > around with the page private/shared status, the confidentiality of the
> > VM is protected.  The risk of leakage occurs when the guest is accessing
> > a page as shared, so kernel code must guard against putting memory
> > on the free list if the guest page tables are marked shared.
> >
>
> For TDX, the scenario of concern in the VMM error case is if the page
> is mapped as shared in the guest page tables *and* it is either also
> marked as shared in the EPT, or the VMM supports automatically
> converting it on access. In the attacker scenario, I think the problem
> is just that it is marked shared in the guest.

Agreed.

>
> I can clarify that it needs to be mapped shared in the guest for there
> to be a problem, but I don't see how it will help the patches to fix
> the callers. It seems like too many details for the callers to know
> about. For example, I think some architectures don't change the PTEs at
> all. The callers abstract shared and private at a higher level.
>

When a caller gets an error from set_memory_decrypted(), it will
take steps to try to get the memory back into a "good" state so
that it can put the memory back on the free list. If it can't get
the memory back into a good state, then it will leak the memory.
I was thinking about how the caller will make that determination.
Is it based on whether set_memory_encrypted() succeeds? I think
that works, as long as (for x86 at least) set_memory_encrypted()
ensures that the guest PTEs are all marked "private" before it
returns success.

So maybe my comment is really about the callers: understanding
what steps a caller should take to recover from an error, and the
possible outcomes of the attempted recovery.
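
The recovery decision described above could be sketched roughly as
follows. This is a userspace illustration only, not kernel code:
mock_set_memory_decrypted() and mock_set_memory_encrypted() are
hypothetical stand-ins that mimic just the 0-on-success return
convention, and try_share() and the enum names are made up for the
example. It assumes that a successful set_memory_encrypted() means all
guest PTEs for the range are marked private again.

```c
#include <stdbool.h>

/* Hypothetical knobs so the stand-ins below can simulate failures. */
int mock_decrypt_ret;
int mock_encrypt_ret;

/* Stand-in for set_memory_decrypted(): returns 0 on success. */
int mock_set_memory_decrypted(unsigned long addr, int numpages)
{
	(void)addr;
	(void)numpages;
	return mock_decrypt_ret;
}

/* Stand-in for set_memory_encrypted(): returns 0 on success. */
int mock_set_memory_encrypted(unsigned long addr, int numpages)
{
	(void)addr;
	(void)numpages;
	return mock_encrypt_ret;
}

/* Possible outcomes for a caller that wanted shared memory. */
enum buf_state {
	BUF_SHARED,		/* conversion worked, use the buffer */
	BUF_PRIVATE_FREEABLE,	/* recovery worked, safe to free */
	BUF_LEAKED,		/* state unknown, must never be freed */
};

enum buf_state try_share(unsigned long addr, int numpages)
{
	if (!mock_set_memory_decrypted(addr, numpages))
		return BUF_SHARED;

	/*
	 * The conversion failed, so some guest PTEs may be marked
	 * shared.  Try to get everything back to private.
	 */
	if (!mock_set_memory_encrypted(addr, numpages))
		return BUF_PRIVATE_FREEABLE;

	/* Guest PTEs may still be shared: leak rather than free. */
	return BUF_LEAKED;
}
```

The point of the three-way result is that only the middle outcome
allows the memory to go back on the free list.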

>
> > To me, this sentence doesn't fully characterize why panic_on_warn
> > would be used.  You describe one reason, which is a caller that fails to
> > properly handle an error and incorrectly puts memory with a "shared"
> > guest PTE on the free list.  But getting an error back also implies that
> > something unknown has gone wrong with the CoCo mechanism for
> > managing private vs. shared pages.  Security focused users would not
> > take the risk of continuing to operate with that kind of unknown
> > error in the core mechanism of a CoCo VM.
>
> Hmm, yea I could see that some users may want to take a hard line and
> terminate if anything looks strange. The counterpoint is that the VMM
> is actually returning a legal error here. It may be strange based on
> the details of when Hyper-V and QEMU/KVM would return this error, but
> not architecturally.
>

Agreed, it may be a legal error. But even with legal errors, the guest
doesn't know whether the VMM has left the page in a private or
shared state. If the guest fixes up its PTEs to access the memory
as private and puts the memory back on the free list, that could
be a time bomb that will blow up later. More paranoid guests
will prefer to take the panic when the error is first reported.

> >
> > > +vmm_fail:
> > > +       WARN_ONCE(1, "CPA VMM failure to convert memory (addr=%p, numpages=%d) to %s.\n",
> > > +                 (void *)addr, numpages, enc ? "private" : "shared");
> >
> > I'm not sure about outputting the "addr" value.  It could be
> > useful, but the %p format specifier hashes the value unless the
> > kernel is booted with "no_hash_pointers".  Should %px be used
> > so the address is output unmodified?
>
> Unfortunately, I don't think we can print the kernel virtual address
> because those are supposed to be hidden for security reasons. Ideally,
> I would prefer to print the PFN, but we won't have it here in the case
> of vmalloc's. I thought it might be useful to still have some address
> printed for debugging purposes.
>

I don't object to either approach. I was really just noting that
we won't see the actual kernel virtual address.

Michael