Re: [PATCH] x86, fpu: correct XSAVE xstate size calculation

From: Ingo Molnar
Date: Sat Aug 08 2015 - 05:15:22 EST



* Dave Hansen <dave@xxxxxxxx> wrote:

> I think we have three options. Here's some rough pseudo-ish-code to
> sketch them out.

> /* Option 2: search for the end of the last state, probably works, Ingo likes? */
> for (i = 0; i < nr_xstates; i++) {
> 	if (cpu_has_xsaves && !enabled_xstate(i))
> 		continue;
> 	end_of_state = xstate_offsets[i] + xstate_sizes[i];
> 	if (xstate_is_aligned[i]) /* currently not implemented */
> 		end_of_state = ALIGN(end_of_state, 64);
> 	if (end_of_state > total_blob_size)
> 		total_blob_size = end_of_state;
> }
> /* align unconditionally, maybe??? */
> total_blob_size = ALIGN(total_blob_size, 64);
>
> /* Double check our obviously bug-free math with what the CPU says */
> if (!cpu_has_xsaves)
> 	cpuid(0xD, 0, &check_total_blob_size, ...);
> else
> 	cpuid(0xD, 1, &check_total_blob_size, ...);
>
> WARN_ON(check_total_blob_size != total_blob_size);

Yes, this is quite close to what I'd like to see.

So I'd do the following things as well:

- For each xstate feature we enable, define a data structure in fpu/types.h, and
double check xstate_size against the sizeof() of that structure. Most of the
current state components are properly declared, but for example AVX512 is
missing. This documents the features nicely, and also makes it easy for KGDB
and tracers to print out the fields - even if in the normal codepaths we rely
on the CPU to handle this structure.

(This is the most important detail I am and was worried about all along.)

- If our calculation and the CPU's disagree, pick the CPU's value
instead.

- Use WARN_ONCE() instead of WARN_ON() to make life easier on 100+ core
prototypes.

- Remove the LWP bits. I don't think we'll ever support LWP, as it's a broken
PMU framework.

- Detail: at least with AVX-1024 (which I'm sure we'll see one day), the natural
alignment of vector registers will rise from 64 bytes to 128 bytes. At that
point the beginning of the AVX-1024 buffer will likely be aligned to two
cachelines, and there might be more padding at the end of the AVX-512 area.
Make sure the alignment checking code handles this right.
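
A rough userspace sketch of the first three points, not kernel code: every
name here (warn_once(), struct ymmh_sketch, struct xstate_desc,
check_struct_sizes(), pick_xstate_size()) is made up for illustration. The
offsets and sizes would come from CPUID in real code, and warn_once() only
imitates the once-per-call-site behaviour of WARN_ONCE().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Userspace stand-in for WARN_ONCE(): prints once per call site. */
#define warn_once(...)						\
	do {							\
		static int warned_;				\
		if (!warned_) {					\
			warned_ = 1;				\
			fprintf(stderr, __VA_ARGS__);		\
		}						\
	} while (0)

/* Hypothetical layout struct for the AVX component: 16 upper YMM halves. */
struct ymmh_sketch {
	uint8_t ymmh_space[16 * 16];
};

/* Hypothetical per-component record, filled from CPUID in real code. */
struct xstate_desc {
	const char *name;
	unsigned int offset;	/* offset reported by the CPU */
	unsigned int size;	/* size reported by the CPU */
	size_t struct_size;	/* sizeof() of our declaration, 0 if none */
};

/* Count components whose CPU-reported size differs from sizeof(). */
static int check_struct_sizes(const struct xstate_desc *d, int nr)
{
	int i, mismatches = 0;

	assert(d != NULL);
	for (i = 0; i < nr; i++) {
		if (d[i].struct_size && d[i].struct_size != d[i].size) {
			warn_once("xstate: %s size mismatch\n", d[i].name);
			mismatches++;
		}
	}
	return mismatches;
}

/* Find the end of the last state; on disagreement the CPU's size wins. */
static unsigned int pick_xstate_size(const struct xstate_desc *d, int nr,
				     unsigned int cpu_size)
{
	unsigned int calc = 0, end;
	int i;

	assert(d != NULL);
	for (i = 0; i < nr; i++) {
		end = d[i].offset + d[i].size;
		if (end > calc)
			calc = end;
	}
	if (calc != cpu_size) {
		warn_once("xstate: calculated %u, CPU says %u\n", calc, cpu_size);
		return cpu_size;
	}
	return calc;
}
```

With a single AVX component at offset 576 and size 256, pick_xstate_size()
returns 832 when the CPU agrees, and whatever the CPU reported when it does
not.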

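For the alignment detail, one way to avoid baking in 64 bytes is to carry the
alignment per component. This is only a sketch under that assumption: struct
xcomp_sketch, align_up() and layout_components() are hypothetical names, and
the 128-byte component in the usage note is invented, since no such feature
exists today.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical component description: size plus its own alignment. */
struct xcomp_sketch {
	unsigned int size;
	unsigned int align;	/* must be a power of two */
};

/* Round x up to the next multiple of a (a is a power of two). */
static unsigned int align_up(unsigned int x, unsigned int a)
{
	return (x + a - 1) & ~(a - 1);
}

/*
 * Place components back to back starting at 'start', padding before each
 * one so it begins on its own alignment. Fills offsets[] and returns the
 * end of the last component.
 */
static unsigned int layout_components(const struct xcomp_sketch *c, int nr,
				      unsigned int start,
				      unsigned int *offsets)
{
	unsigned int pos = start;
	int i;

	assert(c != NULL && offsets != NULL);
	for (i = 0; i < nr; i++) {
		/* reject zero or non-power-of-two alignments */
		assert(c[i].align && !(c[i].align & (c[i].align - 1)));
		pos = align_up(pos, c[i].align);
		offsets[i] = pos;
		pos += c[i].size;
	}
	return pos;
}
```

Starting at byte 576, a 256-byte component with 64-byte alignment lands at
576 and ends at 832. A following 1024-byte component with 128-byte alignment
is then pushed to 896, leaving 64 bytes of padding behind the previous area.
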
Thanks,

Ingo