Re: [PATCH v2] nvme: Cache DMA descriptors to prevent corruption.

From: Julien Bachmann
Date: Tue Jan 24 2023 - 12:24:29 EST


On 12/02/20 07:31 PM, Tom Lendacky wrote:
> On 11/30/20 12:50 PM, Tom Roeder wrote:
>> On Fri, Nov 20, 2020 at 09:02:43AM +0100, Christoph Hellwig wrote:
>>> On Thu, Nov 19, 2020 at 05:27:37PM -0800, Tom Roeder wrote:
>>>> This patch changes the NVMe PCI implementation to cache host_mem_descs
>>>> in non-DMA memory instead of depending on descriptors stored in DMA
>>>> memory. This change is needed under the malicious-hypervisor threat
>>>> model assumed by the AMD SEV and Intel TDX architectures, which encrypt
>>>> guest memory to make it unreadable. Some versions of these architectures
>>>> also make it cryptographically hard to modify guest memory without
>>>> detection.
>>>
>>> I don't think this is a useful threat model, and I've not seen a
>>> discussion on lkml where we had any discussion on this kind of threat
>>> model either.
>>
>> Thanks for the feedback and apologies for the lack of context.
>>
>> I was under the impression that support for AMD SEV SNP will start showing
>> up in KVM soon, and my understanding of SNP is that it implies this threat
>> model for the guest. See the patchset for SEV-ES, which is the generation
>> before SNP:
>> https://lkml.org/lkml/2020/9/14/1168.
>> This doesn't get quite to the SNP threat model, but it starts to assume
>> more maliciousness on the part of the hypervisor.
>>
>> You can also see the talk from David Kaplan of AMD from the 2019 Linux
>> Security Summit for info about SNP:
>> https://www.youtube.com/watch?v=yr56SaJ_0QI.
>>
>>
>>>
>>> Before you start sending patches that regress optimizations in various
>>> drivers (and there will be lots with this model) we need to have a
>>> broader discussion first.
>>
>> I've added Tom Lendacky and David Kaplan from AMD on the thread now, since
>> I don't think I have enough context to say where this discussion should
>> take place or the degree to which they think it has or hasn't.
>>
>> Tom, David: can you please comment on this?
>
> Any discussion should certainly take place in the open on the mailing
> lists.
>
> Further information on SEV-SNP can be found on the SEV developer web page
> at https://developer.amd.com/sev.
>
> There is a white paper specific to SNP:
> https://www.amd.com/system/files/TechDocs/SEV-SNP-strengthening-vm-isolation-with-integrity-protection-and-more.pdf
>
> Also, volume 2 of the AMD APM provides further information on the various
> SEV features (sections 15.34 to 15.36):
> https://www.amd.com/system/files/TechDocs/24593.pdf
>
> It is a good idea to go through the various drivers and promote changes
> to provide protection from a malicious hypervisor, but, as Christoph
> states, it needs to be discussed in order to determine the best approach.

Following up on this thread, as Confidential Computing (CC) has gained
popularity over the last two years. The host-to-guest threat model for
CC is now better researched and more widely discussed (e.g. "Hardening
Linux guest kernel for CC" at the Linux Plumbers Conference 2022 [1]).

Has a more general discussion of this threat model taken place on
LKML since then? Cloud providers, chip makers and academic researchers
[2] have patched multiple drivers for host-to-guest vulnerabilities
following this research.

>>> And HMB support, which is for low-end consumer devices that are usually
>>> not directly assigned to VMs aren't a good starting point for this.
>>
>> I'm glad to hear that this case doesn't apply directly to cases we would
>> care about for assignment to guests. I'm not very familiar with this
>> codebase, unfortunately. Do the same kinds of issues apply for the kinds
>> of devices that would be assigned to guests?

I’m also not familiar with this codebase, but would it be possible for
a malicious hypervisor to present a crafted vendor_id or device_id so
that this code is reached during the kernel’s PCI probing?

Would the patch now be acceptable given the development of CC, or do
you see updates that should be made?
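
For readers new to the thread, the pattern in the patch description
(keep a private copy of the descriptors rather than reading them back
from DMA memory) can be sketched roughly as below. This is a user-space
illustration only, not the driver's code: the names mem_desc,
shared_descs, cached_descs and the descs_* helpers are hypothetical, and
a plain array stands in for memory the other side can rewrite.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NR_DESCS 2u

/* Hypothetical descriptor layout; not the real NVMe structure. */
struct mem_desc {
	uint64_t addr;	/* address handed to the device */
	uint32_t size;	/* length in bytes */
};

/* Stand-in for DMA memory that the device or host can rewrite. */
static struct mem_desc shared_descs[NR_DESCS];

/* Private copy kept in memory that only the guest writes. */
static struct mem_desc cached_descs[NR_DESCS];

/* Fill the private copy first, then publish it to the shared area. */
void descs_setup(void)
{
	for (uint32_t i = 0; i < NR_DESCS; i++) {
		cached_descs[i].addr = 0x1000u * (i + 1u);
		cached_descs[i].size = 4096u;
	}
	memcpy(shared_descs, cached_descs, sizeof(shared_descs));
}

/* Simulate the untrusted side overwriting a published descriptor. */
void descs_tamper(void)
{
	shared_descs[0].addr = 0xdeadbeefu;
	shared_descs[0].size = UINT32_MAX;
}

/* Teardown-style walk: sum the sizes that would be released. */
uint64_t descs_total_size(const struct mem_desc *descs, uint32_t n)
{
	uint64_t total = 0;

	assert(n <= NR_DESCS);
	for (uint32_t i = 0; i < n; i++)
		total += descs[i].size;
	return total;
}
```

After descs_setup(), walking either array gives the same total. Once
descs_tamper() has run, only the walk over cached_descs still reflects
what was originally allocated, which is the property the patch relies on.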

Let me know what you think and what would be the preferred next steps.

Best regards

[1] https://lpc.events/event/16/contributions/1328/
[2] Examples of research and patches
- https://arxiv.org/pdf/2109.10660.pdf
- https://lore.kernel.org/linux-hyperv/20201117105437.xbyjrs4m7garb2lj@liuwe-devbox-debian-v2/T/#t
- https://github.com/torvalds/linux/commit/5218e919c8d06279884aa0baf76778a6817d5b93