Re: [PATCH v3 01/13] x86/fpu/xstate: Avoid getting xstate address of init_fpstate if fpstate contains the component

From: Chang S. Bae
Date: Fri Feb 24 2023 - 19:47:47 EST


On 2/24/2023 3:56 PM, Mingwei Zhang wrote:
> On Wed, Feb 22, 2023, Chang S. Bae wrote:
>
>>  /*
>> - * The ptrace buffer is in non-compacted XSAVE format. In
>> - * non-compacted format disabled features still occupy state space,
>> - * but there is no state to copy from in the compacted
>> - * init_fpstate. The gap tracking will zero these states.
>> + * Indicate which states to copy from fpstate. When not present in
>> + * fpstate, those extended states are either initialized or
>> + * disabled. They are also known to have an all zeros init state.
>> + * Thus, remove them from 'mask' to zero those features in the user
>> + * buffer instead of retrieving them from init_fpstate.
>>   */
>> - mask = fpstate->user_xfeatures;
>
> Do we need to change this line and the comments? I don't see any of
> these was relevant to this issue. The original code semantic is to
> traverse all user_xfeatures, if it is available in fpstate, copy it from
> there; otherwise, copy it from init_fpstate. We do not assume the
> component in init_fpstate (but not in fpstate) are all zeros, do we? If
> it is safe to assume that, then it might be ok. But at least in this
> patch, I want to keep the original semantics as is without the
> assumption.

Here it has [1]:

*
* XSAVE could be used, but that would require to reshuffle the
* data when XSAVEC/S is available because XSAVEC/S uses xstate
* compaction. But doing so is a pointless exercise because most
* components have an all zeros init state except for the legacy
* ones (FP and SSE). Those can be saved with FXSAVE into the
* legacy area. Adding new features requires to ensure that init
* state is all zeroes or if not to add the necessary handling
* here.
*/
fxsave(&init_fpstate.regs.fxsave);

Thus, init_fpstate holds all zeros for those extended states, so copying from init_fpstate has the same effect as the membuf_zero() done by the gap tracking. That leaves us with two ways of doing the same thing here.

So I think it is enough to copy from fpstate only the states that are present there, and let the gap tracking zero the rest of the userspace buffer for the features that are either disabled or in their init state.

Then we can drop the init_fpstate access from the copy loop, which is the source of the problem. So I think this line change is relevant, and it also simplifies the code.
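
To illustrate the argument, here is a rough userspace sketch. It is a toy model, not the kernel code: the three-feature layout, toy_copy() and toy_strategies_agree() are made up for this example, the source buffer reuses the destination offsets (so compaction is not modeled), and the init state is assumed to be all zeros, as the comment in [1] requires. It compares the "fall back to the init state" walk with the "narrow the mask and let the gap tracking zero the rest" walk.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy model only: three hypothetical features, with a gap at bytes 24..31. */
#define NR_FEATURES 3
#define BUF_SIZE    48

struct toy_feature {
	unsigned int offset;
	unsigned int size;
};

static const struct toy_feature toy_layout[NR_FEATURES] = {
	{ 0, 16 }, { 16, 8 }, { 32, 16 },
};

/*
 * Walk the features selected by 'mask'. Bytes skipped between two copied
 * features are zeroed (the "gap tracking"). A selected feature that is not
 * 'present' is read from init_state instead of fpstate.
 */
static void toy_copy(uint8_t *dst, const uint8_t *fpstate,
		     const uint8_t *init_state, uint64_t mask, uint64_t present)
{
	unsigned int cursor = 0;

	for (unsigned int i = 0; i < NR_FEATURES; i++) {
		const struct toy_feature *f = &toy_layout[i];
		uint64_t bit = 1ULL << i;

		if (!(mask & bit))
			continue;

		/* Zero everything that was skipped since the last copy. */
		memset(dst + cursor, 0, f->offset - cursor);

		const uint8_t *src = (present & bit) ? fpstate : init_state;

		assert(src != NULL);
		memcpy(dst + f->offset, src + f->offset, f->size);
		cursor = f->offset + f->size;
	}

	/* Zero the tail of the user buffer. */
	memset(dst + cursor, 0, BUF_SIZE - cursor);
}

/* Returns 1 when both strategies fill the user buffer identically. */
static int toy_strategies_agree(uint64_t user_xfeatures, uint64_t present)
{
	uint8_t fpstate[BUF_SIZE];
	uint8_t init_state[BUF_SIZE] = { 0 };	/* assumed all-zeros init state */
	uint8_t old_buf[BUF_SIZE], new_buf[BUF_SIZE];

	for (unsigned int i = 0; i < BUF_SIZE; i++)
		fpstate[i] = (uint8_t)(i + 1);	/* non-zero everywhere */

	/* Poison both destinations so that missing zeroing would show up. */
	memset(old_buf, 0xAA, BUF_SIZE);
	memset(new_buf, 0xAA, BUF_SIZE);

	/* Old semantics: walk all user features, fall back to the init state. */
	toy_copy(old_buf, fpstate, init_state, user_xfeatures, present);

	/* Proposed semantics: narrow the mask; the init state is never read. */
	toy_copy(new_buf, fpstate, NULL, user_xfeatures & present, present);

	return memcmp(old_buf, new_buf, BUF_SIZE) == 0;
}
```

In this model, toy_strategies_agree(0x7, 0x5) returns 1: the middle feature is absent from fpstate, and it ends up zeroed either way. The second call passes NULL for the init state, which is safe only because every feature left in the narrowed mask is present in fpstate.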

I guess I'm fine if you don't want to do this. In that case, let me follow up with something like this first. Something like yours could still serve as a fallback option if there are other good reasons for it.

Thanks,
Chang

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/x86/kernel/fpu/xstate.c#n386