Re: [RFC] x86: xsave/xrstor support, ucontext_t extensions

From: H. Peter Anvin
Date: Fri May 23 2008 - 13:05:57 EST

Roland McGrath wrote:
>> What I was doing in the RFC is: restore the state what ever that was
>> present and init the state that was not present in the stack frame.
>
> That is consistent in spirit with the existing treatment of FPU data.
> That is, if the sigcontext.fpstate pointer is NULL, the thread's FPU
> state is reset to default. (And despite what hpa said about being
> "supported", the facts in the code are that sigreturn just follows the
> sigcontext.fpstate pointer, whatever it is. On 32-bit, the pointer is
> NULL in the context saved when the thread had not used the FPU, so
> modifying the sigcontext to include FP state when it didn't before
> requires putting in some user-chosen pointer. There in fact may well be
> existing code that does user-level coroutine switching using sigreturn
> and relies on this, for all we know.)

Okay. Pretty much what it comes down to is that there is no ideal solution. Thus, we're trying to explore the potential tradeoffs. The scenario you describe above will crash horribly for a non-FXSAVE aware application running on an FXSAVE kernel.

Either way, it has been a long time since then, and new bad applications have obviously emerged, partially "assisted" by our propensity not to document, and by the deep gulf between our kernel and userspace developers.

Let's try another strawman on for size:

- It is clear it is desirable not just for the frame itself but for the fpstate to be self-describing.

- Thus, let's put a magic cookie in one of the reserved fields at the end of the FXSAVE region, and make sure it is long enough to be unlikely to pop up randomly; as well as another magic cookie outside the FXSAVE region.

- The signal delivery code will write the cookie (or zero, for !XSAVE) regardless of any crap ptrace might have written into it.

- We will ALSO set bit 0 in uc_flags for RT sigframes as an additional assurance.

- We will introduce at least a 32-bit field for future use, to be written unconditionally zero for now. We don't want to have to go through this particular torture yet again.

- The XSAVE state beyond the FXSAVE region needs to be self-describing. This may mean adding information not provided by the hardware. Furthermore, it must be possible for userspace to know the length of the frame, even if it doesn't understand its detailed contents.
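To make the strawman concrete, here is a rough sketch of what such a self-describing frame could look like. Everything named hyp_* or HYP_* is hypothetical, including the cookie values, the field order and the choice of byte offset 464 for the software record; the only hard facts assumed are that the FXSAVE image is 512 bytes and that its tail is left to software. The uc_flags bit is not modeled here. Treat this as an illustration of the idea, not as a proposed ABI.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FXSAVE_SIZE     512u         /* size of the FXSAVE image */
#define SW_AREA_OFFSET  464u         /* assumed slot in the tail of the image */
#define HYP_COOKIE1     0x58534156u  /* hypothetical value */
#define HYP_COOKIE2     0x58454e44u  /* hypothetical value */

/* Hypothetical software record kept in the tail of the FXSAVE image. */
struct hyp_sw_area {
    uint32_t cookie1;       /* HYP_COOKIE1 if extended state follows, else 0 */
    uint32_t total_size;    /* bytes from start of image through cookie2 */
    uint64_t feature_mask;  /* which extended-state components are present */
    uint32_t future;        /* for future use, written as zero for now */
};

_Static_assert(SW_AREA_OFFSET + sizeof(struct hyp_sw_area) <= FXSAVE_SIZE,
               "software record must fit inside the FXSAVE image");

/*
 * Signal-delivery side: always overwrite the software record, so that
 * whatever was in those bytes before (e.g. written via ptrace) is lost.
 * Returns the number of bytes used, or 0 if the buffer is too small.
 */
static size_t hyp_seal_frame(uint8_t *buf, size_t buf_len, size_t ext_len,
                             uint64_t features, int have_xsave)
{
    struct hyp_sw_area sw;
    uint32_t cookie2 = HYP_COOKIE2;
    size_t total = FXSAVE_SIZE;

    assert(buf != NULL);
    if (have_xsave)
        total += ext_len + sizeof(cookie2);
    if (buf_len < total)
        return 0;

    memset(&sw, 0, sizeof(sw));          /* the !XSAVE case stays all zero */
    if (have_xsave) {
        sw.cookie1 = HYP_COOKIE1;
        sw.total_size = (uint32_t)total;
        sw.feature_mask = features;
    }
    memcpy(buf + SW_AREA_OFFSET, &sw, sizeof(sw));
    if (have_xsave)
        memcpy(buf + total - sizeof(cookie2), &cookie2, sizeof(cookie2));
    return total;
}

/*
 * Consumer side: returns 1 and stores the extended-state length only if
 * both cookies and the recorded size are consistent; otherwise returns 0.
 */
static int hyp_frame_has_ext(const uint8_t *buf, size_t buf_len,
                             size_t *ext_len)
{
    struct hyp_sw_area sw;
    uint32_t cookie2;

    if (buf_len < FXSAVE_SIZE)
        return 0;
    memcpy(&sw, buf + SW_AREA_OFFSET, sizeof(sw));
    if (sw.cookie1 != HYP_COOKIE1)
        return 0;
    if (sw.total_size < FXSAVE_SIZE + sizeof(cookie2) ||
        sw.total_size > buf_len)
        return 0;
    memcpy(&cookie2, buf + sw.total_size - sizeof(cookie2), sizeof(cookie2));
    if (cookie2 != HYP_COOKIE2)
        return 0;
    *ext_len = sw.total_size - FXSAVE_SIZE - sizeof(cookie2);
    return 1;
}
```

Note that a consumer which does not understand feature_mask can still skip or copy the whole frame using total_size alone, which is the point of the last bullet.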

None of this is foolproof on older kernels -- there simply *IS* no option for older kernels that is 100% guaranteed, thanks to various assumptions made and design decisions taken over the years. There are a couple of failure scenarios here:

- XSAVE-aware application running on pre-XSAVE kernel:

Such an application will be aware that the XSAVE information may not
exist, but needs to know (with high probability) that it isn't
present. We have CPUID.OSXSAVE, the uc_flags bit, and the magic
cookie to help here. ptrace can introduce the magic cookie falsely
into the state, but ptrace can introduce all kinds of failures;
either way they would (probabilistically) not see the *second* cookie.

The fact that 64-bit kernels don't clear the unused fields is of
less concern, since 64-bit kernels get the uc_flags field.

- Pre-XSAVE application running on XSAVE kernel with XSAVE enabled:

Here we have the potential for all kinds of corrupt state, including
userspace trying to save away the state and load it later, not
knowing the proper size of it. Worse, some sick person might try to
save and restore state from different hosts, with potential for
all kinds of mayhem.

The saved state, if copied from the original, would contain the first
cookie but not the second cookie.

Again, the use of two cookies here adds some amount of assurance;
but that again amounts to probabilistic failure detection. However,
I personally don't see any way to avoid that scenario at all.
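The two scenarios above can be sketched from the point of view of an XSAVE-aware consumer. This is only an illustration under assumed details: the hyp_*/HYP_* names, the cookie values, and the offsets of the first cookie and the length field are all made up, and the CPUID.OSXSAVE check is left out so the example stays portable. It shows a state copied by a pre-XSAVE application (first cookie present, second cookie missing) being rejected rather than trusted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FXSAVE_SIZE         512u
#define COOKIE1_OFFSET      464u         /* assumed slot for the first cookie */
#define SIZE_OFFSET         468u         /* assumed slot for the frame length */
#define HYP_COOKIE1         0x58534156u  /* hypothetical value */
#define HYP_COOKIE2         0x58454e44u  /* hypothetical value */
#define HYP_UC_FLAG_XSTATE  0x1u         /* bit 0 of uc_flags, as proposed */

enum hyp_verdict {
    HYP_FX_ONLY,       /* no first cookie: plain FXSAVE state */
    HYP_XSTATE_OK,     /* flag, both cookies and length all agree */
    HYP_INCONSISTENT   /* first cookie present but something else is off */
};

/* Read a 32-bit value from a possibly unaligned position. */
static uint32_t hyp_load32(const uint8_t *p)
{
    uint32_t v;

    assert(p != NULL);
    memcpy(&v, p, sizeof(v));
    return v;
}

/*
 * Decide how much of a saved state can be believed. `avail` is how many
 * bytes the caller can actually read; `is_rt_frame` says whether
 * `uc_flags` is meaningful for this frame.
 */
static enum hyp_verdict hyp_classify(unsigned long uc_flags, int is_rt_frame,
                                     const uint8_t *frame, size_t avail)
{
    uint32_t total;

    if (frame == NULL || avail < FXSAVE_SIZE)
        return HYP_INCONSISTENT;
    if (hyp_load32(frame + COOKIE1_OFFSET) != HYP_COOKIE1)
        return HYP_FX_ONLY;

    /* First cookie seen: every other indicator has to agree with it. */
    if (is_rt_frame && !(uc_flags & HYP_UC_FLAG_XSTATE))
        return HYP_INCONSISTENT;
    total = hyp_load32(frame + SIZE_OFFSET);
    if (total < FXSAVE_SIZE + 4u || total > avail)
        return HYP_INCONSISTENT;
    if (hyp_load32(frame + total - 4u) != HYP_COOKIE2)
        return HYP_INCONSISTENT;
    return HYP_XSTATE_OK;
}
```

A caller would presumably treat HYP_INCONSISTENT the same as HYP_FX_ONLY and ignore anything past the first 512 bytes. This remains probabilistic: it catches truncated or stale copies, not a deliberately forged frame.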
