Re: [PATCH v2 2/5] x86/kexec: do unconditional WBINVD in relocate_kernel()

From: Tom Lendacky
Date: Fri Mar 22 2024 - 10:51:07 EST


On 3/22/24 05:40, Kirill A. Shutemov wrote:
On Thu, Mar 21, 2024 at 04:02:11PM -0500, Tom Lendacky wrote:
On 3/20/24 18:10, Kirill A. Shutemov wrote:
On Thu, Mar 21, 2024 at 09:48:28AM +1300, Huang, Kai wrote:

Hi Tom,

I am not aware of kexec() support status for SEV-ES/SEV-SNP guests.
Does patch 1 break them?

SNP guests can kexec with some patches that are currently in process
around shared to private memory conversions. ES guests can only kexec
with a single vCPU. There was a recent patch series to add support for
multiple vCPUs.

Patch #1 doesn't break either ES or SNP because we still have an IDT and
traditional kernel addressing in place, so the #VC can be handled.

How about plain SEV guest?


Whereas patch #2 has switched to identity mapping and removed the IDT,
so a #VC causes a triple fault.

That makes sense. Thanks.

Hi Kirill,

Does TDX guest have similar behaviour -- that WBINVD in stop_this_cpu() can
be handled although it causes #VE, while WBINVD in relocate_kernel() will
just triple fault the guest?

No. We never handle WBINVD #VE. Guest cannot handle WBINVD itself and the
only option is to ask host to do this. We cannot guarantee host will do

Is the WBINVD performed or ignored in that case?

We crash the guest if it tries to use WBINVD. There's no legitimate reason
for it.

anything useful with the request. I guess it can be potential attack
vector if host strategically ignores WBINVD to induce bad guest behaviour.

With SNP, memory is coherent so there isn't a need for a WBINVD within a
guest and so issuing it should not be an issue whether the hypervisor
performs the operation or not. I don't know what can happen in the case
where, say, you have a non-coherent TDISP device attached or such, but that
would be very unusual/unlikely.

Looks like SNP is in the same position as TDX.

And it is not good from host PoV either. If it does WBINVD on every guest
request we get guest->host DoS attack possibility.

Yeah, that can happen today, regardless of the type of VM running.


Tom, I am curious, how do you deal with these problems?

If the WBINVD is being intercepted, then it will generate a #VC and we use
the GHCB protocol to communicate that back to the hypervisor to handle.

I would argue that forwarding it to hypervisor is worse than crashing. It
gives false sense of doing something. Hypervisor is outside TCB and
considered hostile.

Since the memory is coherent, it really doesn't matter what the hypervisor does in regards to WBINVD (ignore it or perform it). And the hypervisor can do anything it wants on any exit, regardless of this intercept.

Thanks,
Tom