Re: [PATCH v8 0/6] Introduce CET supervisor state support
From: Chao Gao
Date: Tue Jun 03 2025 - 02:22:55 EST
On Mon, Jun 02, 2025 at 12:12:51PM -0700, Chang S. Bae wrote:
>== Preempt Case ==
>
>To illustrate how the XFD MSR state becomes incorrect in this scenario:
>
> task #1 (fpstate->xfd=0) task #2 (fpstate->xfd=0x80000)
> ======================== ==============================
> handle_signal()
> -> setup_rt_frame()
> -> get_siframe()
> -> copy_fpstate_to_sigframe()
> -> fpregs_unlock()
> ...
> ...
> switch_fpu_return()
> -> fpregs_restore_userregs()
> -> restore_fpregs_from_fpstate()
> -> xfd_write_state()
> ^ IA32_XFD_MSR = 0
> ...
> ...
> -> fpu__clear_user_states()
> -> fpregs_lock()
> -> restore_fpregs_from_init_fpstate()
> -> os_rstor()
> -> xfd_validate_state()
> ^ IA32_XFD_MSR != fpstate->xfd
> -> fpregs_mark_active()
> -> fpregs_unlock()
>
>Since fpu__clear_user_states() marks the FPU state as valid in the end, an
>XFD MSR sync-up was clearly missing.
Thanks for this analysis. It makes sense.
>
>== Return-to-Userspace Path ==
>
>Both tasks at that moment are on the return-to-userspace path, but at
>different points in IRQ state:
>
> * task #2 is inside handle_signal() and already re-enabled IRQs.
> * task #1 is after IRQ is disabled again when calling
> switch_fpu_return().
>
> local_irq_disable_exit_to_user()
> exit_to_user_mode_prepare()
> -> exit_to_user_mode_loop()
> -> local_irq_enable_exit_to_user()
> -> arch_do_signal_or_restart()
> -> handle_signal()
> -> local_irq_disable_exit_to_user()
> -> arch_exit_user_mode_prepare()
> -> arch_exit_work()
> -> switch_fpu_return()
>
>This implies that fpregs_lock()/fpregs_unlock() is necessary inside
>handle_signal() when XSAVE instructions are invoked.
>
>But, it should be okay for switch_fpu_return() to call
>fpregs_restore_userregs() without fpregs_lock().
>
>== XFD Sanity Checker ==
>
>The XFD sanity checker -- xfd_op_valid() -- correctly caught this issue in
>the test case. However, it may have a false negative when AMX usage was
>flipped between the two tasks.
>
>Despite that, I don't think extending its coverage is worthwhile, as it would
>complicate the logic. The current logic and documentation seem sound.
>
>== Fix Consideration ==
>
>I think the fix is straightforward: resynchronize the IA32_XFD MSR in
>fpu__clear_user_states().
This fix sounds good.
Btw, what do you think the impact of this issue might be?
Aside from the splat, task #2 could execute AMX instructions without
requesting permissions, but its AMX state would be discarded during the
next FPU switch, as RFBM[18] is cleared when executing XSAVES. And, in the
"flipped" scenario you mentioned, task #2 might receive an extra #NM, after
which its fpstate would be re-allocated (although the size won't increase
further).
So, for well-behaved tasks that never use AMX, there is no impact; tasks
that use AMX may receive extra #NM. There won't be any unexpected #PF,
memory corruption, or kernel panic.