Re: [PATCH v7 0/6] Introduce CET supervisor state support

From: Chao Gao
Date: Fri May 16 2025 - 05:03:25 EST


On Fri, May 16, 2025 at 09:51:50AM +0200, Uros Bizjak wrote:
>On Mon, May 12, 2025 at 10:57 AM Chao Gao <chao.gao@xxxxxxxxx> wrote:
>>
>> Dear maintainers and reviewers,
>>
>> I kindly request your consideration for merging this series. Most of the
>> patches have received Reviewed-by/Acked-by tags.
>>
>> Thanks Chang, Rick, Xin, Sean and Dave for their help with this series.
>>
>> == Changelog ==
>> v6->v7:
>> - Collect reviews from Rick
>> - Tweak __fpstate_reset() to handle guest fpstate rather than adding a
>> guest-specific reset function (Sean & Dave)
>> - Fold xfd initialization into __fpstate_reset() (Sean)
>> - v6: https://lore.kernel.org/all/20250506093740.2864458-1-chao.gao@xxxxxxxxx/
>>
>> == Background ==
>>
>> CET defines two register states: CET user, which includes user-mode control
>> registers, and CET supervisor, which consists of shadow-stack pointers for
>> privilege levels 0-2.
>>
>> Current kernel disables shadow stacks in kernel mode, making the CET
>> supervisor state unused and eliminating the need for context switching.
>>
>> == Problem ==
>>
>> To virtualize CET for guests, KVM must accurately emulate hardware
>> behavior. A key challenge arises because there is no CPUID flag to indicate
>> that shadow stack is supported only in user mode. Therefore, KVM cannot
>> assume guests will not enable shadow stacks in kernel mode and must
>> preserve the CET supervisor state of vCPUs.
>>
>> == Solution ==
>>
>> An initial proposal to manually save and restore CET supervisor states
>> using raw RDMSR/WRMSR in KVM was rejected due to performance concerns and
>> its impact on KVM's ABI. Instead, leveraging the kernel's FPU
>> infrastructure for context switching was favored [1].
>
>Dear Chao,
>
>I wonder if the same approach can be used to optimize switching of
>Intel PT configuration context. There was a patch series [1] posted
>some time ago that showed substantial reduction of overhead when
>switching Intel PT configuration context on VM-Entry/Exit using
>XSAVES/XRSTORS instructions:

No. The guest-only infrastructure relies on the FPU core to switch state
at context-switch time, whereas Intel PT state has to be switched at a
different point, namely on VM entry/exit.

Switching Intel PT state on VM entry/exit is necessary only for the
"host-guest" mode, which is currently marked as BROKEN. Unless functional
issues are addressed first, there's no point in optimizing its state
switching.

If we ever reinstate support for the "host-guest" mode, I think Intel PT
state could probably be handled as an independent feature, similar to
LBR state, i.e., saved and restored explicitly by its user rather than by
the FPU core at context-switch time.
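
To make the distinction concrete, here is a rough, standalone sketch. It is
not kernel code: the toy_*() helpers are hypothetical names made up for this
mail, and treating Intel PT as an independent feature is only the idea
floated above, not something the series implements. The bit positions are
the XSAVE state-component numbers from the Intel SDM.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* XSAVE state-component bits (Intel SDM numbering). */
#define XFEATURE_MASK_PT         (UINT64_C(1) << 8)   /* Intel PT */
#define XFEATURE_MASK_CET_USER   (UINT64_C(1) << 11)  /* CET user state */
#define XFEATURE_MASK_CET_KERNEL (UINT64_C(1) << 12)  /* CET supervisor state */
#define XFEATURE_MASK_LBR        (UINT64_C(1) << 15)  /* Arch LBR */

/* The two masks below describe different switch points; they must not overlap. */
static_assert(((XFEATURE_MASK_CET_USER | XFEATURE_MASK_CET_KERNEL) &
               (XFEATURE_MASK_PT | XFEATURE_MASK_LBR)) == 0,
              "context-switched and independent features overlap");

/*
 * Hypothetical helper: components the FPU core would save/restore at
 * context-switch time. CET supervisor state is "guest-only", so it is
 * included only when the fpstate being switched belongs to a guest.
 */
static inline uint64_t toy_context_switch_mask(bool guest_fpstate)
{
    uint64_t mask = XFEATURE_MASK_CET_USER;

    if (guest_fpstate)
        mask |= XFEATURE_MASK_CET_KERNEL;

    return mask;
}

/*
 * Hypothetical helper: components that are never touched at
 * context-switch time and are instead saved/restored on demand by their
 * user. LBR works this way today; listing PT here is an assumption that
 * mirrors the suggestion in this mail.
 */
static inline uint64_t toy_independent_mask(void)
{
    return XFEATURE_MASK_LBR | XFEATURE_MASK_PT;
}
```

In this toy model, toy_context_switch_mask(true) yields CET user plus CET
supervisor, while Intel PT appears only in toy_independent_mask(), which a
VM entry/exit path would have to pass to XSAVES/XRSTORS itself.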