Re: [PATCH RFC 3/7] kvm: x86: XSAVE state and XFD MSRs context switch

From: Liu, Jing2
Date: Mon Feb 22 2021 - 03:38:37 EST



On 2/9/2021 2:12 AM, Paolo Bonzini wrote:
On 08/02/21 19:04, Sean Christopherson wrote:
That said, the case where we saw MSR autoload as faster involved EFER, and
we decided that it was due to TLB flushes (commit f6577a5fa15d, "x86, kvm,
vmx: Always use LOAD_IA32_EFER if available", 2014-11-12). Do you know if
RDMSR/WRMSR is always slower than MSR autoload?
RDMSR/WRMSR may be marginally slower, but only because the autoload stuff avoids
serializing the pipeline after every MSR.

That's probably adding up quickly...

The autoload paths are effectively
just wrappers around the WRMSR ucode, plus some extra VM-Enter specific checks,
as ucode needs to perform all the normal fault checks on the index and value.
On the flip side, if the load lists are dynamically constructed, I suspect the
code overhead of walking the lists negates any advantages of the load lists.

... but yeah this is not very encouraging.
Thanks for reviewing the patches.

Context switch time is a problem for XFD.  In a VM that uses AMX, most threads in the guest will have nonzero XFD but the vCPU thread itself will have zero XFD.  So as soon as one thread in the VM forces the vCPU thread to clear XFD, you pay a price on all vmexits and vmentries.


Spec says,
"If XSAVE, XSAVEC, XSAVEOPT, or XSAVES is saving the state component i,
the instruction does not generate #NM when XCR0[i] = IA32_XFD[i] = 1;
instead, it saves bit i of XSTATE_BV field of the XSAVE header as 0
(indicating that the state component is in its initialized state).
With the exception of XSAVE, no data is saved for the state
component (XSAVE saves the initial value of the state component..."
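
To make the quoted rule concrete, here is a rough C model of the
XSTATE_BV value the compacted/optimized save instructions would write.
This is only a sketch: the function and macro names are made up for
illustration (they are not kernel or KVM symbols), rfbm is assumed to be
already masked by XCR0, and details such as XSAVE preserving XSTATE_BV
bits outside RFBM are ignored.

```c
#include <stdint.h>

/* AMX XTILEDATA is state component 18; the macro name is illustrative. */
#define MODEL_XTILEDATA_BIT (1ULL << 18)

/*
 * Toy model of the XSTATE_BV written by XSAVEC/XSAVEOPT/XSAVES.
 *
 * rfbm:   requested-feature bitmap (assumed already limited to XCR0)
 * xinuse: bit i set when component i is NOT in its initial configuration
 * xfd:    current IA32_XFD value
 *
 * A component is recorded in XSTATE_BV only if it was requested, is in
 * use, and is not disabled via XFD. With XFD[i] = 1 the bit is saved as 0.
 */
static inline uint64_t model_saved_xstate_bv(uint64_t rfbm, uint64_t xinuse,
					     uint64_t xfd)
{
	return rfbm & xinuse & ~xfd;
}
```

With rfbm = xinuse = xfd = MODEL_XTILEDATA_BIT the model returns 0, i.e.
live tile data would be recorded as "initial", which is the loss we need
to avoid.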

Thus, the key point is not to lose non-initial AMX state across vmexit
and vmentry. If the AMX state is in its initialized state, it doesn't
matter. Otherwise, XFD[i] must not be armed (set to 1).

If we don't want to unconditionally set XFD=0 on every vmexit, it would
be useful to first detect whether the guest AMX state is initial or not.
How about using XINUSE here?
(Details are in SDM vol. 1, section 13.6, "PROCESSOR TRACKING OF
XSAVE-MANAGED STATE", and in the Operation part of the XRSTOR/XRSTORS
instructions in vol. 2.)
The main idea is that the processor tracks the status of each state
component in XINUSE, which shows whether the component is in use.
When XINUSE[i]=0, state component i is in its initial configuration.
Otherwise, kvm should take care of XFD on vmexit.
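
A minimal sketch of that check, written as pure functions so that it does
not depend on any kernel interface. The names are hypothetical. The
xinuse argument is assumed to come from XGETBV with ECX=1, which returns
XCR0 & XINUSE on processors that enumerate that capability in
CPUID.(EAX=0DH,ECX=1):EAX[2].

```c
#include <stdint.h>

/* AMX XTILEDATA is state component 18; the macro name is illustrative. */
#define SKETCH_XTILEDATA_BIT (1ULL << 18)

/*
 * XFD bits that are armed for components which are currently in use.
 * Saving guest state with any of these bits set would drop live data,
 * so they are the only bits that need to be cleared on vmexit.
 */
static inline uint64_t sketch_xfd_bits_to_disarm(uint64_t xinuse, uint64_t xfd)
{
	return xinuse & xfd;
}

/*
 * XFD value that is safe while guest state is being saved: keep every
 * armed bit whose component is still in its initial configuration.
 */
static inline uint64_t sketch_xfd_for_save(uint64_t xinuse, uint64_t xfd)
{
	return xfd & ~xinuse;
}
```

If sketch_xfd_bits_to_disarm() returns 0, the XFD MSR would not need to
be touched at all on that vmexit.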


However, running the host with _more_ bits set than necessary in XFD should not be a problem as long as the host doesn't use the AMX instructions.
Does "running the host" mean running in kvm? Why are more bits
(host_XFD|guest_XFD) needed? I'm trying to think of a case where
guest_XFD alone is not enough, e.g.:
In the guest, only bit i is needed, when the guest supports it and uses
the passed-through XFD[i] to detect dynamic usage;
In kvm, kvm doesn't use AMX instructions, and "system software should not
use XFD to implement a 'lazy restore' approach to management of the XTILEDATA
state component."
Outside kvm, the kernel ensures the correct XFD is set for threads when
scheduling.

Thanks,
Jing

So perhaps Jing can look into keeping XFD=0 for as little time as possible, and XFD=host_XFD|guest_XFD as much as possible.

Paolo