Re: [RFC] x86: xsave/xrstor support, ucontext_t extensions

From: Suresh Siddha
Date: Fri May 23 2008 - 13:50:29 EST

On Fri, May 23, 2008 at 09:57:32AM -0700, H. Peter Anvin wrote:
> Roland McGrath wrote:
> >>What I was doing in the RFC is: restore the state what ever that was
> >>present and init the state that was not present in the stack frame.
> >
> >That is consistent in spirit with the existing treatment of FPU data.
> >That is, if the sigcontext.fpstate pointer is NULL, the thread's FPU
> >state is reset to default. (And despite what hpa said about being
> >"supported", the facts in the code are that sigreturn just follows the
> >sigcontext.fpstate pointer, whatever it is. On 32-bit, the pointer is
> >NULL in the context saved when the thread had not used the FPU, so
> >modifying the sigcontext to include FP state when it didn't before
> >requires putting in some user-chosen pointer. There in fact may well be
> >existing code that does user-level coroutine switching using sigreturn
> >and relies on this, for all we know.)
> >
> Okay. Pretty much what it comes down to is that there is no ideal
> solution. Thus, we're trying to explore the potential tradeoffs. The
> scenario you describe above will crash horribly for a non-FXSAVE aware
> application running on an FXSAVE kernel.

yes, who ever added fxsave extensions missed this and we def want to handle
these kind of scenarios in xsave context.

> Either way, there has been a long time since, and new bad applications
> have obviously emerged, partially "assisted" by our propensity to not
> document, and the deep gulf between our kernel and userspace developers.
> Let's try another strawman on for size:
> - It is clear it is desirable not just for the frame itself but for the
> fpstate to be self-describing.


> - Thus, let's put a magic cookie in one of the reserved fields at the
> end of the FXSAVE region, and make sure it is long enough to be unlikely
> to pop up randomly; as well as another magic cookie outside the FXSAVE
> region.
> - The signal delivery code will write the cookie (or zero, for !XSAVE)
> regardless of any crap ptrace might have written into it.
> - We will ALSO set bit 0 in uc_flags for RT sigframes as an additional
> assurance.
> - We will introduce at least a 32-bit field for future use, to be
> written unconditionally zero for now. We don't want to have to go
> through this particular torture yet again.

This is exactly what I have been trying to say. CPU folks are going
to document that bytes[464..511] of fxsave frame are going to be
SW usable. We can use some of these to represent the cookie, size
of the xsave frame, state that is saved in the frame etc. And the
remaining unused bytes can be cleared to zero for future use.

you say:

> as well as another magic cookie outside the FXSAVE region.

Is this to avoid issues with memcpy, who end up copying only fxsave
info but not the extended state? We don't need another magic
cookie outside the fxsave region, we can include the
address of the _fpstate pointer along side of the first cookie
in the SW usable portion of the fxsave state.

> - The XSAVE state beyond the FXSAVE region needs to be self-describing.
> This may mean adding information not provided by the hardware.

In my RFC, I have added size of the xsave, state mask saved in xsave
frame. I will move this to the sw usable portion of fxsave. We can
add any thing more if desired.

> Furthermore, it must be possible for userspace to know the length of the
> frame, even if it doesn't understand its detailed contents.

With the above, we are just changing the _fpstate layout (which
includes the length). But if we need frame size in the
sigframe layout, we can add it (and it can be done outside the scope of xsave

> None of this is foolproof on older kernels -- there simply *IS* no
> option for older kernels that is 100% guaranteed, thanks to various
> assumptions made and design decisions taken over the years. There are a
> couple of failure scenarious here:
> - XSAVE-aware application running on pre-XSAVE kernel:
> Such an application will be aware that the XSAVE information may not
> exist, but needs to know (with high probability) that it isn't
> present. We have CPUID.OSXSAVE, the uc_flags bit, and the magic
> cookie to help here. ptrace can introduce the magic cookie falsely
> into the state, but ptrace can introduce all kinds of failures;
> either way they would (probablistically) not see the *second*
> cookie.

Application can use cpuid.OSXSAVE (even in
paravirtualization, hypervisors will prevent the leak of
cpuid.osxsave for kernels which don't use it) reliably. And also,
we have uc_flags (for rt-frame) and other cookies in the SW usable
portion of fxsave frame.

In all the cases, kernel should be as graceful as it can be. xsave
aware kernel should try to find and restore xsave state and if there
is any failure, try to restore just the fpstate and init the rest.

> The fact that 64-bit kernels don't clear the unused fields is of
> less concern, since 64-bit kernels get the uc_flags field.
> - Pre-XSAVE application running on XSAVE kernel with XSAVE enabled:
> Here we have the potential for all kinds of corrupt state, including
> userspace trying to save away the state and load it later, not
> knowing the proper size of it. Worse, some sick person might try to
> save and restore state from different hosts, with potential for
> all kinds of mayhem.
> The saved state, if copied from the original, would contain the first
> cookie but not the second cookie.
> Again, the use of two cookies here adds some amount of assurance;

I think the pointer along with the first cookie is enough right? There
is no SW usable portion in the xsave layout outside the fxsave area.
And for compatibility reasons, its best to keep the xsave layout
similar to what the cpu uses.

> but that again amounts to probabilistic failure detection. However,
> I personally don't see any way to avoid that scenario at all.

In this case, kernel will atleast not corrupt the fxsave state, which
the application cares about.

To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at
Please read the FAQ at