Re: TDX #VE in SYSCALL gap (was: [RFD] x86: Curing the exception and syscall trainwreck in hardware)

From: Sean Christopherson
Date: Wed Aug 26 2020 - 15:16:56 EST


On Tue, Aug 25, 2020 at 10:28:53AM -0700, Andy Lutomirski wrote:
> On Tue, Aug 25, 2020 at 10:19 AM Sean Christopherson
> <sean.j.christopherson@xxxxxxxxx> wrote:
> > One thought would be to have the TDX module (thing that runs in SEAM and
> > sits between the VMM and the guest) provide a TDCALL (hypercall from guest
> > to TDX module) to the guest that would allow the guest to specify a very
> > limited number of GPAs that must never generate a #VE, e.g. go straight to
> > guest shutdown if a disallowed GPA would go pending. That seems doable
> > from a TDX perspective without incurring noticeable overhead (assuming the
> > list of GPAs is very small) and should be easy to to support in the guest,
> > e.g. make a TDCALL/hypercall or two during boot to protect the SYSCALL
> > page and its scratch data.
>
> I guess you could do that, but this is getting gross. The x86
> architecture has really gone off the rails here.

Does it suck less than using an IST? Honest question.

I will add my voice to the "fix SYSCALL" train, but the odds of that getting
a proper fix in time to intercept TDX are not good. On the other hand,
"fixing" the SYSCALL issue in the TDX module is much more feasible, but only
if we see real value in such an approach as opposed to just using an IST. I
personally like the idea of a TDX module solution as I think it would be
simpler for the kernel to implement/support, and would mean we wouldn't need
to roll back IST usage for #VE if the heavens should part and bestow upon us
a sane SYSCALL.