Re: [PATCH v5 24/28] x86/fpu/xstate: Use per-task xstate mask for saving xstate in signal frame

From: Len Brown
Date: Tue May 25 2021 - 10:04:57 EST


On Tue, May 25, 2021 at 12:48 AM Andy Lutomirski <luto@xxxxxxxxxx> wrote:
>
>
>
> On Mon, May 24, 2021, at 11:06 AM, Len Brown wrote:
> > On Sun, May 23, 2021 at 11:15 PM Andy Lutomirski <luto@xxxxxxxxxx> wrote:
> > >
> > > If I'm reading this right, it means that tasks that have ever used AMX
> > > get one format and tasks that haven't get another one.
> >
> > No. The format of the XSTATE on the signal stack is uncompressed XSAVE
> > format for both AMX and non-AMX tasks, both before and after this patch.
> > That is because XSAVE gets the format from XCR0. It gets the fields
> > to write from the run-time parameter.
> >
> > So the change here allows a non-AMX task to skip writing data (zeros)
> > to the AMX region of its XSTATE buffer.
>
> I misread the patch. I still think this patch is useless.

For a non-AMX task, this patch lets XSAVE skip the 8KB AMX region
entirely instead of writing 8KB of zeros into it.
This reduces both the cycle count and the cache impact of a context switch.
Some might consider that useful, rather than useless.
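
To illustrate the point about layout vs. mask, here is a rough user-space
sketch. It is not the kernel code and it does not execute XSAVE. The
sim_* names are made up for this example, and the offsets and sizes in
the table are illustrative stand-ins for what CPUID leaf 0xD enumerates.
The buffer layout depends only on the enabled features (XCR0); the
run-time mask only selects which components get written.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical component table for the sketch. Real offsets and sizes
 * are enumerated by CPUID leaf 0xD; these values are illustrative. */
struct sim_component {
	unsigned int bit;	/* feature bit in XCR0 and in the mask */
	size_t offset;		/* fixed offset in the uncompressed layout */
	size_t size;		/* size of the component in bytes */
};

#define SIM_BIT_AVX		2
#define SIM_BIT_TILEDATA	18

static const struct sim_component sim_components[] = {
	{ SIM_BIT_AVX,		 576,  256 },
	{ SIM_BIT_TILEDATA,	2816, 8192 },
};

#define SIM_NR (sizeof(sim_components) / sizeof(sim_components[0]))

/* The buffer size follows from the enabled features only, never from
 * the run-time mask. */
size_t sim_buffer_size(uint64_t xcr0)
{
	size_t end = 512 + 64;	/* legacy area plus XSAVE header */
	size_t i;

	for (i = 0; i < SIM_NR; i++) {
		const struct sim_component *c = &sim_components[i];

		if ((xcr0 & (1ULL << c->bit)) && c->offset + c->size > end)
			end = c->offset + c->size;
	}
	return end;
}

/* Copy only the components selected by (mask & xcr0) from regs to buf.
 * Both use the same fixed layout. Returns the number of component
 * bytes written; unselected regions of buf are left untouched. */
size_t sim_xsave(uint8_t *buf, const uint8_t *regs,
		 uint64_t xcr0, uint64_t mask)
{
	uint64_t rfbm = mask & xcr0;
	size_t written = 0;
	size_t i;

	assert(buf && regs);

	for (i = 0; i < SIM_NR; i++) {
		const struct sim_component *c = &sim_components[i];

		if (!(rfbm & (1ULL << c->bit)))
			continue;
		memcpy(buf + c->offset, regs + c->offset, c->size);
		written += c->size;
	}
	return written;
}
```

With both features enabled in xcr0, sim_buffer_size() is the same for
every task, but calling sim_xsave() with a mask that leaves out
SIM_BIT_TILEDATA never touches the 8KB tile region.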

> > The subsequent patch adds the further optimization of (manually) checking
> > for INIT state for an AMX task and also skipping the write of data (zeros)
> > in that case.
> >
> > We should have done this optimization for AVX-512, but instead we
> > guaranteed writing zeros, which I think is a waste of both transfer time
> > and cache footprint.
>
> If no one depends on it, it’s not ABI.

Agreed.
Perhaps in the future we can see if reducing the AVX-512 cache footprint
in this same way is beneficial.

--
Len Brown, Intel Open Source Technology Center