RE: [PATCH Part2 v6 35/49] KVM: SVM: Remove the long-lived GHCB host map

From: Kalra, Ashish
Date: Thu Jul 07 2022 - 16:31:13 EST


[AMD Official Use Only - General]

Hello Peter,

>> >There is a perf cost to this suggestion but it might make accessing
>> >the GHCB safer for KVM. Have you thought about just using
>> >kvm_read_guest() or copy_from_user() to fully copy out the GCHB into a KVM owned buffer, then copying it back before the VMRUN. That way the KVM doesn't need to guard against page_state_changes on the GHCBs, that could be a perf ?>>improvement in a follow up.
>>
>> Along with the performance costs you mentioned, the main concern here
>> will be the GHCB write-back path (copying it back) before VMRUN: this
>> will again hit the issue we have currently with
>> kvm_write_guest() / copy_to_user(), when we use it to sync the scratch
>> buffer back to GHCB. This can fail if guest RAM is mapped using huge-page(s) and RMP is 4K. Please refer to the patch/fix mentioned below, kvm_write_guest() potentially can fail before VMRUN in case of SNP :
>>
>> commit 94ed878c2669532ebae8eb9b4503f19aa33cd7aa
>> Author: Ashish Kalra <ashish.kalra@xxxxxxx>
>> Date: Mon Jun 6 22:28:01 2022 +0000
>>
>> KVM: SVM: Sync the GHCB scratch buffer using already mapped ghcb
>>
>> Using kvm_write_guest() to sync the GHCB scratch buffer can fail
>> due to host mapping being 2M, but RMP being 4K. The page fault handling
>> in do_user_addr_fault() fails to split the 2M page to handle RMP fault due
>> to it being called here in a non-preemptible context. Instead use
>> the already kernel mapped ghcb to sync the scratch buffer when the
>> scratch buffer is contained within the GHCB.

>Ah I didn't see that issue thanks for the pointer.

>The patch description says "When SEV-SNP is enabled the mapped GPA needs to be protected against a page state change." since if the guest were to convert the GHCB page to private when the host is using the GHCB the host could get an RMP violation right?

Right.

>That RMP violation would cause the host to crash unless we use some copy_to_user() type protections.

As such copy_to_user() will only swallow the RMP violation and return failure, so the host can retry the write.

> I don't see anything mechanism for this patch to add the page state change protection discussed. Can't another vCPU still convert the GHCB to private?

We do have the protections for GHCB getting mapped to private specifically, there are new post_{map|unmap}_gfn functions added to verify if it is safe to map
GHCB pages. There is a PSC spinlock added which protects again page state change for these mapped pages.
Below is the reference to this patch:
https://lore.kernel.org/lkml/cover.1655761627.git.ashish.kalra@xxxxxxx/T/#mafcaac7296eb9a92c0ea58730dbd3ca47a8e0756

But do note that there is protection only for GHCB pages and there is a need to add generic post_{map,unmap}_gfn() ops that can be used to verify
that it's safe to map a given guest page in the hypervisor. This is a TODO right now and probably this is something which UPM can address more cleanly.

>I was wrong about the importance of this though seanjc@ walked me through how UPM will solve this issue so no worries about this until the series is rebased on to UPM.

Thanks,
Ashish