Re: [PATCH RFC v7 44/64] KVM: SVM: Remove the long-lived GHCB host map

From: Michael Roth
Date: Fri Jan 20 2023 - 15:10:36 EST


On Wed, Jan 18, 2023 at 10:15:38AM -0800, Alper Gun wrote:
> On Wed, Jan 18, 2023 at 7:27 AM Jeremi Piotrowski
> <jpiotrowski@xxxxxxxxxxxxxxxxxxx> wrote:
> >
> > On Wed, Dec 14, 2022 at 01:40:36PM -0600, Michael Roth wrote:
> > > From: Brijesh Singh <brijesh.singh@xxxxxxx>
> > >
> > > On VMGEXIT, sev_handle_vmgexit() creates a host mapping for the GHCB GPA,
> > > and unmaps it just before VM-entry. This long-lived GHCB map is used by
> > > the VMGEXIT handler through accessors such as ghcb_{set_get}_xxx().
> > >
> > > A long-lived GHCB map can cause issue when SEV-SNP is enabled. When
> > > SEV-SNP is enabled the mapped GPA needs to be protected against a page
> > > state change.
> > >
> > > To eliminate the long-lived GHCB mapping, update the GHCB sync operations
> > > to explicitly map the GHCB before access and unmap it after access is
> > > complete. This requires that the setting of the GHCBs sw_exit_info_{1,2}
> > > fields be done during sev_es_sync_to_ghcb(), so create two new fields in
> > > the vcpu_svm struct to hold these values when required to be set outside
> > > of the GHCB mapping.
> > >
> > > Signed-off-by: Brijesh Singh <brijesh.singh@xxxxxxx>
> > > Signed-off-by: Ashish Kalra <ashish.kalra@xxxxxxx>
> > > [mdr: defer per_cpu() assignment and order it with barrier() to fix case
> > > where kvm_vcpu_map() causes reschedule on different CPU]
> > > Signed-off-by: Michael Roth <michael.roth@xxxxxxx>
> > > ---
> > > arch/x86/kvm/svm/sev.c | 131 ++++++++++++++++++++++++++---------------
> > > arch/x86/kvm/svm/svm.c | 18 +++---
> > > arch/x86/kvm/svm/svm.h | 24 +++++++-
> > > 3 files changed, 116 insertions(+), 57 deletions(-)
> > >
> > > diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> > > index d5c6e48055fb..6ac0cb6e3484 100644
> > > --- a/arch/x86/kvm/svm/sev.c
> > > +++ b/arch/x86/kvm/svm/sev.c
> > > @@ -2921,15 +2921,40 @@ void sev_free_vcpu(struct kvm_vcpu *vcpu)
> > > kvfree(svm->sev_es.ghcb_sa);
> > > }
> > >
> > > +static inline int svm_map_ghcb(struct vcpu_svm *svm, struct kvm_host_map *map)
> > > +{
> > > + struct vmcb_control_area *control = &svm->vmcb->control;
> > > + u64 gfn = gpa_to_gfn(control->ghcb_gpa);
> > > +
> > > + if (kvm_vcpu_map(&svm->vcpu, gfn, map)) {
> > > + /* Unable to map GHCB from guest */
> > > + pr_err("error mapping GHCB GFN [%#llx] from guest\n", gfn);
> > > + return -EFAULT;
> > > + }
> > > +
> > > + return 0;
> > > +}
> > > +
> > > +static inline void svm_unmap_ghcb(struct vcpu_svm *svm, struct kvm_host_map *map)
> > > +{
> > > + kvm_vcpu_unmap(&svm->vcpu, map, true);
> > > +}
> > > +
> > > static void dump_ghcb(struct vcpu_svm *svm)
> > > {
> > > - struct ghcb *ghcb = svm->sev_es.ghcb;
> > > + struct kvm_host_map map;
> > > unsigned int nbits;
> > > + struct ghcb *ghcb;
> > > +
> > > + if (svm_map_ghcb(svm, &map))
> > > + return;
> > > +
> > > + ghcb = map.hva;
> >
> > dump_ghcb() is called from sev_es_validate_vmgexit() with the ghcb already
> > mapped. How about passing 'struct kvm_host_map *' (or struct ghcb *) as a
> > param to avoid double mapping?
>
> This also causes a soft lockup, PSC spin lock is already acquired in
> sev_es_validate_vmgexit. dump_ghcb will try to acquire the same lock
> again. So a guest can send an invalid ghcb page and cause a host soft
> lockup.

We did notice that issue with v6, and had a fix similar to what Jeremi
suggested, but in this patchset the psc_lock spinlock has been replaced
in favor of taking a read_lock() on kvm->mmu_lock.

The logic there is that userspace drives the page state changes via
kvm_vm_ioctl_set_mem_attributes() now, and it does so while holding a
write_lock() on kvm->mmu_lock, so if we want to guard the GHCB page from
page state changes while the kernel is accessing it, it makes sense to
use the same lock rather than relying on some separate SNP-specific lock.

And because it's now a read_lock(), I don't think the deadlock issue is
present anymore in v7, but it probably does still make sense to avoid the
double-mapping as Jeremi suggested, so we'll plan to make that change for
v8.

Thanks,

Mike