Re: [PATCH 04/26] x86/traps: Add #VE support for TDX guest

From: Borislav Petkov
Date: Thu Dec 30 2021 - 13:02:24 EST


On Thu, Dec 30, 2021 at 06:41:27PM +0300, Kirill A. Shutemov wrote:
> The updated commit message is below. Let me know if something is unclear.
>
> ----------------------------8<-------------------------------------------
>
> Virtualization Exceptions (#VE) are delivered to TDX guests due to
> specific guest actions which may happen in either user space or the
> kernel:
>
> * Specific instructions (WBINVD, for example)
> * Specific MSR accesses
> * Specific CPUID leaf accesses
> * Access to unmapped pages (EPT violation)
>
> In the settings that Linux will run in, virtual exceptions are never

virtualization exceptions

> generated on accesses to normal, TD-private memory that has been
> accepted.
>
> Syscall entry code has a critical window where the kernel stack is not
> yet set up. Any exception in this window leads to hard to debug issues
> and can be exploited for privilege escalation. Exceptions in the NMI
> entry code also cause issues. IRET from the exception handle will

"Returning from the exception handler with IRET will ..."

> re-enable NMIs and nested NMI will corrupt the NMI stack.
>
> For these reasons, the kernel avoids #VEs during the syscall gap and
> the NMI entry code. Entry code paths do not access TD-shared memory,
> MMIO regions, use #VE triggering MSRs, instructions, or CPUID leaves
> that might generate #VE. VMM can remove memory from TD at any point,
> but access to unaccepted (or missing) private memory leads to VM
> termination, not to #VE.
>
> Similarly, to page faults and breakpoints, #VEs are allowed in NMI

"Similarly to" - no comma.

> handlers once the kernel is ready to deal with nested NMIs.
>
> During #VE delivery, all interrupts, including NMIs, are blocked until
> TDGETVEINFO is called. It prevents #VE nesting until the kernel reads
> the VE info.
>
> If a guest kernel action which would normally cause a #VE occurs in
> the interrupt-disabled region before TDGETVEINFO, a #DF (double fault
> exception) is delivered to the guest, which will result in an oops.

Everything up to here can go into a comment above the #VE handler.

> Add basic infrastructure to handle any #VE which occurs in the kernel
> or userspace. Later patches will add handling for specific #VE
> scenarios.
>
> For now, convert unhandled #VE's (everything, until later in this
> series) so that they appear just like a #GP by calling the
> ve_raise_fault() directly. The ve_raise_fault() function is similar
> to #GP handler and is responsible for sending SIGSEGV to userspace
> and CPU die and notifying debuggers and other die chain users.

Yap, better.

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette