Re: [PATCH] virt: Prevent AES-GCM IV reuse in SNP guest driver

From: Tom Lendacky
Date: Thu Oct 20 2022 - 10:02:18 EST


On 10/19/22 16:47, Peter Gonda wrote:
On Wed, Oct 19, 2022 at 2:58 PM Tom Lendacky <thomas.lendacky@xxxxxxx> wrote:
On 10/19/22 15:39, Peter Gonda wrote:
On Wed, Oct 19, 2022 at 1:56 PM Tom Lendacky <thomas.lendacky@xxxxxxx> wrote:
On 10/19/22 14:17, Dionna Amalie Glaze wrote:
On Wed, Oct 19, 2022 at 11:44 AM Tom Lendacky <thomas.lendacky@xxxxxxx> wrote:
On 10/19/22 12:40, Peter Gonda wrote:
On Wed, Oct 19, 2022 at 11:03 AM Tom Lendacky <thomas.lendacky@xxxxxxx> wrote:
On 10/19/22 10:03, Peter Gonda wrote:
The ASP and an SNP guest use a series of AES-GCM keys called VMPCKs to
communicate securely with each other. The IV to this scheme is a
sequence number that both the ASP and the guest track. Currently this
sequence number in a guest request must exactly match the sequence
number tracked by the ASP. This means that if the guest sees an error
from the host during a request it can only retry that exact request or
disable the VMPCK to prevent an IV reuse. AES-GCM cannot tolerate IV
reuse see:
https://csrc.nist.gov/csrc/media/projects/block-cipher-techniques/documents/bcm/comments/800-38-series-drafts/gcm/joux_comments.pdf

OK so the guest retires with the same request when it gets an
SNP_GUEST_REQ_INVALID_LEN error. It expands its internal buffer to

It would just use the pre-allocated snp_dev->certs_data buffer with npages
set to the full size of that buffer.

Actually we allocate that buffer with size SEV_FW_BLOB_MAX_SIZE. Maybe
we want to just allocate this buffer which we think is sufficient and
never increase the allocation?

I see the size of
https://developer.amd.com/wp-content/resources/ask_ark_milan.cert is
3200 bytes. Assuming the VCEK cert is the same size (which it should
be since this .cert is 2 certificates). 16K seems to leave enough room
even for some vendor certificates?

I think just using the 16K buffer (4 pages) as it is allocated today is ok. If we get a SNP_GUEST_REQ_INVALID_LEN error that is larger than 4 pages, then we won't ever be able to pull the certs given how the driver is coded today. In that case, disabling the VMPCK is in order.

A separate patch could be submitted later to improve this overall aspect of the certs buffer if needed.

Thanks,
Tom



hold the certificates. When it finally gets a successful request w/
certs. Do we want to return the attestation bits to userspace, but
leave out the certificate data. Or just error out the ioctl
completely?

We need to be able to return the attestation bits that came back with the
extra certs. So just error out of the ioctl with the length error and let
user-space retry with the recommended number of pages.

That sounded simpler to me. Will do.



I can do that in this series.

Thanks!






For the rate-limiting patch series [1], the rate-limiting will have to be
performed within the kernel, while the mutex is held, and then retry the
exact request again. Otherwise, that error will require disabling the
VMPCK. Either that, or the hypervisor must provide the rate limiting.

Thoughts?

[1] https://lore.kernel.org/lkml/20221013160040.2858732-1-dionnaglaze@xxxxxxxxxx/

Yes I think if the host rate limits the guest. The guest kernel should
retry the exact message. Which mutex are you referring too?

Or the host waits and then submits the request and the guest kernel
doesn't have to do anything. The mutex I'm referring to is the
snp_cmd_mutex that is taken in snp_guest_ioctl().

I think that either the host kernel or guest kernel waiting can lead
to unacceptable delays.
I would recommend that we add a zero argument ioctl to /dev/sev-guest
specifically for retrying the last request.

We can know what the last request is due to the sev_cmd_mutex serialization.
The driver will just keep a scratch buffer for this. Any other request
that comes in without resolving the retry will get an -EBUSY error
code.

And the first caller will have received an -EAGAIN in order to
differentiate between the two situations?


Calling the retry ioctl without a pending command will result in -EINVAL.

Let me know what you think.

I think that sounds reasonable, but there are some catches. You will need
to ensure that the caller that is supposed to retry does actually retry
and that a caller that does retry is the same caller that was told to retry.

Whats the issue with the guest driver taking some time?

This sounds complex because there may be many users of the driver. How
do multiple users coordinate when they need to use the retry ioctl?


Thanks,
Tom


Thanks,
Tom