Re: Candidate Linux ABI for Intel AMX and hypothetical new related features

From: Sean Christopherson
Date: Mon Apr 12 2021 - 13:14:24 EST


On Sun, Apr 11, 2021, Len Brown wrote:
> On Fri, Apr 9, 2021 at 5:44 PM Andy Lutomirski <luto@xxxxxxxxxx> wrote:
> >
> > On Fri, Apr 9, 2021 at 1:53 PM Len Brown <lenb@xxxxxxxxxx> wrote:
> > >
> > > On Wed, Mar 31, 2021 at 6:45 PM Andy Lutomirski <luto@xxxxxxxxxx> wrote:
> > > >
> > > > On Wed, Mar 31, 2021 at 3:28 PM Len Brown <lenb@xxxxxxxxxx> wrote:
> > > > > We've also established that when running in a VMM, every update to
> > > > > XCR0 causes a VMEXIT.
> > > >
> > > > This is true, it sucks, and Intel could fix it going forward.
> > >
> > > What hardware fix do you suggest?
> > > If a guest is permitted to set XCR0 bits without notifying the VMM,
> > > what happens when it sets bits that the VMM doesn't know about?
> >
> > The VM could have a mask of allowed XCR0 bits that don't exit.
> >
> > TDX solved this problem *somehow* -- XSETBV doesn't (visibly?) exit on
> > TDX. Surely plain VMX could fix it too.
>
> There are two cases.
>
> 1. Hardware that exists today and in the foreseeable future.
>
> VM modification of XCR0 results in VMEXIT to VMM.
> The VMM sees bits set by the guest, and so it can accept what
> it supports, or send the VM a fault for non-support.
>
> Here it is not possible for the VM to change XCR0 without the VMM knowing.
>
> 2. Future Hardware that allows guests to write XCR0 w/o VMEXIT.
>
> Not sure I follow your proposal.
>
> Yes, the VM effectively has a mask of what is supported,
> because it can issue CPUID.
>
> The VMM virtualizes CPUID, and needs to know it must not
> expose to the VM any state features it doesn't support.
> Also, the VMM needs to audit XCR0 before it uses XSAVE,
> else the guest could attack or crash the VMM through
> buffer overrun.

The VMM already needs to context switch XCR0 and XSS, so this is a non-issue.

> Is this what you suggest?

Yar. In TDX, XSETBV exits, but only to the TDX module. I.e. TDX solves the
problem in software by letting the VMM tell the TDX module what features the
guest can set in XCR0/XSS via the XFAM (Extended Features Allowed Mask).

But that software "fix" can also be pushed into ucode, e.g. add an XFAM VMCS
field so that the guest can set any XCR0 bits that are '1' in VMCS.XFAM without
exiting.
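
To make the idea concrete, here is a rough sketch of the check such ucode would
perform. It is purely illustrative: no XFAM VMCS field exists today, and the
function name, the vmcs_xfam parameter and the local XFEATURE_* defines are all
made up for this example. Only the XCR0 bit positions are architectural.

```c
#include <stdbool.h>
#include <stdint.h>

/* XCR0 bit positions (architectural); the macro names are local to this sketch. */
#define XFEATURE_FP        (1ULL << 0)   /* x87 state */
#define XFEATURE_SSE       (1ULL << 1)
#define XFEATURE_YMM       (1ULL << 2)   /* AVX */
#define XFEATURE_XTILECFG  (1ULL << 17)  /* AMX TILECFG */
#define XFEATURE_XTILEDATA (1ULL << 18)  /* AMX TILEDATA */

/*
 * HYPOTHETICAL: models a not-yet-existing VMCS.XFAM field.
 *
 * Returns true if a guest XSETBV writing new_xcr0 could complete without a
 * VM-Exit, i.e. every bit the guest wants set is '1' in the allowed mask.
 * The usual architectural checks on the XCR0 value (e.g. bit 0 must be set)
 * are assumed to still be done separately and are not modelled here.
 */
static bool xsetbv_allowed_without_exit(uint64_t new_xcr0, uint64_t vmcs_xfam)
{
	return (new_xcr0 & ~vmcs_xfam) == 0;
}
```

With a mask of FP|SSE|YMM, a guest write of FP|SSE would not exit, while a write
that also sets the AMX bits would still trap to the VMM as it does today.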

Note, SGX has similar functionality in the form of XFRM (XSAVE-Feature Request
Mask). The enclave author can specify what features will be enabled in XCR0
when the enclave is running. Not that relevant, other than to reinforce that
this is a solvable problem.

> If yes, what do you suggest in the years between now and when
> that future hardware and VMM exist?

Burn some patch space? :-)