Re: [PATCH 02/16] KVM: x86/mmu: Introduce a slot flag to zap only slot leafs on slot deletion
From: Sean Christopherson
Date: Wed May 15 2024 - 18:47:30 EST

On Wed, May 15, 2024, Rick P Edgecombe wrote:
> On Wed, 2024-05-15 at 13:05 -0700, Sean Christopherson wrote:
> > On Wed, May 15, 2024, Rick P Edgecombe wrote:
> > > So rather than try to optimize zapping more someday and hit similar
> > > issues, let userspace decide how it wants it to be done. I'm not sure of
> > > the actual performance tradeoffs here, to be clear.
> >
> > ...unless someone is able to root cause the VFIO regression, we don't have
> > the luxury of letting userspace give KVM a hint as to whether it might be
> > better to do a precise zap versus a nuke-and-pave.
>
> Pedantry... I think it's not a regression if something requires a new flag. It
> is still a bug though.

Heh, pedantry denied. I was speaking in the past tense about the VFIO failure,
which was a regression as I changed KVM behavior without adding a flag.

> The thing I worry about on the bug is whether it might have been due to a guest
> having access to a page it shouldn't have. In which case we can't give the user
> the opportunity to create it.
>
> I didn't gather there was any proof of this. Did you have any hunch either way?

I doubt the guest was able to access memory it shouldn't have been able to access.
But that's a moot point, as the bigger problem is that, because we have no idea
what's at fault, KVM can't make any guarantees about the safety of such a flag.
TDX is a special case where we don't have a better option (we do have other options,
they're just horrible). In other words, the choice is essentially to either:
 (a) cross our fingers and hope that the problem is limited to shared memory
     with QEMU+VFIO, i.e. doesn't affect TDX private memory.

or

 (b) don't merge TDX until the original regression is fully resolved.

FWIW, I would love to root cause and fix the failure, but I don't know how feasible
that is at this point.

> > And more importantly, it would be a _hint_, not the hard requirement that TDX
> > needs.
> >
> > > That said, a per-vm knob is easier for TDX purposes.
>
> If we don't want it to be a mandate from userspace, then we need to do some per-
> vm checking in TDX's case anyway. In which case we might as well go with the
> per-vm option for TDX.
>
> You had said up the thread, why not opt all non-normal VMs into the new
> behavior. It will work great for TDX. But why do SEV and others want this
> automatically?

Because I want flexibility in KVM, i.e. I want to take the opportunity to try and
break away from KVM's godawful ABI. It might be a pipe dream, as keying off the
VM type obviously has similar risks to giving userspace a memslot flag. The one
sliver of hope is that the VM types really are quite new (though less so for SEV
and SEV-ES), whereas a memslot flag would be easily applied to existing VMs.