Re: [PATCH v5 3/3] x86/tdx: Add Quote generation support

From: Sathyanarayanan Kuppuswamy
Date: Mon May 16 2022 - 13:39:59 EST


Hi Dave,

On 5/10/22 3:42 AM, Kai Huang wrote:
On Tue, 2022-05-10 at 11:54 +1200, Kai Huang wrote:
On Mon, 2022-05-09 at 15:09 +0300, Kirill A. Shutemov wrote:
On Mon, May 09, 2022 at 03:37:22PM +1200, Kai Huang wrote:
On Sat, 2022-05-07 at 03:42 +0300, Kirill A. Shutemov wrote:
On Fri, May 06, 2022 at 12:11:03PM +1200, Kai Huang wrote:
Kirill, what's your opinion?

I said before that I think DMA API is the right tool here.

Speculation about the future of DMA in TDX is irrelevant here. If the
semantics change, we will need to re-evaluate all users. VirtIO uses the
DMA API, and it is conceptually the same use case: communicating with the host.

Virtio is designed to be used by device drivers, so it's fine for it to use
the DMA API, and real DMA can happen to the virtio DMA buffers. Attestation
makes no such assumption.

Whether the attestation driver uses struct device is an implementation
detail. I don't see what your point is.

No real DMA is involved in attestation.


So I don't see why the TD guest kernel cannot have a simple protocol to
vmap() a page (or a couple of pages) as shared on demand, like below:

struct page *page = alloc_page(GFP_KERNEL);

void *addr = vmap(&page, 1, VM_MAP, pgprot_decrypted(PAGE_KERNEL));

clflush_cache_range(page_address(page), PAGE_SIZE);

MapGPA(cc_mkdec(page_to_phys(page)), PAGE_SIZE);

And we can even avoid the above clflush_cache_range() if I understand
correctly.

Or did I miss something?

For completeness, cover the free path too. Are you going to open-code page
accept as well?

Call __tdx_module_call(TDX_ACCEPT_PAGE, ...) right after MapGPA() to convert
the page back to private. I don't see any problem there.


Private->Shared conversion is destructive. You have to split the SEPT and
flush the TLB. The backward conversion is even more costly.

I wouldn't call it destructive.

And as I suggested before, we can allocate a default-size buffer (i.e. 4
pages), which is large enough to cover all requests for now, during driver
initialization. This avoids IOCTL-time conversion. We should still have code
in the IOCTL to check the request buffer size: when it is larger than the
default, the old buffer should be freed and a larger one allocated. But for
now this code path will never be hit.

Btw, the above is based on the assumption that we don't support concurrent
IOCTLs. In this version Sathya changed the driver to support concurrent
IOCTLs, which was a surprise, as I thought we had agreed we don't need to
support this.

Hi Dave,

Sorry, I forgot to mention that GHCI 1.5 defines a generic TDVMCALL<Service>
for a TD to communicate with the VMM, another TD, or some service in the
host. This TDVMCALL can support many sub-commands. For now only sub-commands
for TD migration are defined, but we can have more.

For this, we cannot assume the size of the command buffer, and I don't see
why we wouldn't want to support concurrent TDVMCALLs. So it looks like, in
the long term, we will very likely need IOCTL-time buffer private/shared
conversion.




Let me summarize the discussion so far.

Problem: Allocate shared buffer without breaking the direct map.

Solution 1: Use alloc_pages*()/vmap()/set_memory_*crypted() APIs

Pros/Cons:

1. Uses a virtually mapped address for the shared/private conversion and
   hence does not touch the direct mapping.

2. The current version of the set_memory_*crypted() APIs modifies the
   aliased mappings, which include the direct mapping. So if we want to
   use the set_memory_*() APIs, we need a new variant that does not
   touch the direct mapping. An alternative is to open-code the page
   attribute conversion, cache flushing and MapGPA/page-acceptance
   logic in the attestation driver itself. But, IMO, this is not
   preferred, because it is not desirable to sprinkle mapping-conversion
   code in multiple places in the kernel. It is better to use a single
   API if possible.

3. This solution can break the SEPT entries on private->shared
   conversion, and the backward conversion is also costly. IMO, since
   attestation requests are not very frequent, we don't need to be
   overly concerned about the cost of these conversions.
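Put together, Solution 1 would look roughly like the following kernel-side sketch. This is illustrative only, not buildable as-is: error handling is mostly elided, and set_memory_decrypted() here stands in for the proposed new variant that would leave the direct map untouched:

```c
/* Solution 1 sketch: allocate pages, vmap() them, then convert the
 * vmap'd alias (not the direct mapping) to shared. */
struct page *pages[4];
void *buf;
int i;

for (i = 0; i < 4; i++)
	pages[i] = alloc_page(GFP_KERNEL_ACCOUNT);	/* error handling elided */

buf = vmap(pages, 4, VM_MAP, PAGE_KERNEL);
if (!buf)
	return -ENOMEM;

/* Would need the proposed variant that converts only this alias. */
if (set_memory_decrypted((unsigned long)buf, 4)) {
	vunmap(buf);
	return -EIO;
}
```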

Solution 2: Use DMA alloc APIs.

Pros/Cons:

1. Simpler to use. It taps into the SWIOTLB buffers and does not
   affect the direct map. Since we will be using already-converted
   memory, allocation/freeing will be cheaper compared to Solution 1.

2. There is a concern that it is not a long-term solution, since with
   the advent of TDX IO, not all DMA allocations will need to use the
   SWIOTLB model. But as per Kirill's comments, this is not a concern,
   and the force_dma_unencrypted() hook can be used to differentiate
   which devices need to use the TDX IO vs. SWIOTLB model.

3. Using the DMA APIs requires a valid bus device as an argument and
   hence requires converting this driver into a platform device
   driver. But, since this driver does not do real DMA, making the
   above changes just to use the DMA API is not recommended.
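For comparison, Solution 2 reduces to something like the kernel-side sketch below (also illustrative and not buildable as-is; "tdx-attest" is a hypothetical device name):

```c
/* Solution 2 sketch: register a platform device so the DMA API has a
 * struct device to work with; the buffer then comes from the
 * already-shared SWIOTLB pool. */
struct platform_device *pdev;
dma_addr_t handle;
void *buf;

pdev = platform_device_register_simple("tdx-attest", -1, NULL, 0);
if (IS_ERR(pdev))
	return PTR_ERR(pdev);

buf = dma_alloc_coherent(&pdev->dev, 4 * PAGE_SIZE, &handle, GFP_KERNEL);
if (!buf) {
	platform_device_unregister(pdev);
	return -ENOMEM;
}
```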

Since both solutions fix the problem (each with its own pros/cons), the
conclusion from Kai's and Kirill's comments is that there is no hard
preference, and the decision is left to you.

Since you have already commented that, irrespective of which model is
chosen, the commit log needs to talk about the solution and how it avoids
touching the direct map, I have posted the v6 version adopting Solution 1.

Please let me know if you agree with this direction or have comments
about the solution.

--
Sathyanarayanan Kuppuswamy
Linux Kernel Developer