Re: [PATCH 0/4] x86/fpu: Reduce unnecessary FNINIT and MXCSR usage

From: Krzysztof Olędzki
Date: Mon Jan 18 2021 - 03:33:34 EST


On 2021-01-17 at 22:20, Andy Lutomirski wrote:
> This series fixes two regressions: a boot failure on AMD K7 and a
> performance regression on everything.
>
> I did a double-take here -- the regressions were reported by different
> people, both named Krzysztof :)
>
> Andy Lutomirski (4):
>   x86/fpu: Add kernel_fpu_begin_mask() to selectively initialize state
>   x86/mmx: Use KFPU_MMX for MMX string operations
>   x86/fpu: Make the EFI FPU calling convention explicit
>   x86/fpu/64: Don't FNINIT in kernel_fpu_begin()

Thank you so much Andy!

What a coincidence! Sadly, my AMD K7 is sitting somewhere in a closet, on a different continent, and last ran Linux over 10 years ago. :/ However, I can offer some testing on different AMD & Intel CPUs.

Now... It is 12 AM here, so I tested it very quickly and only on 5.4-stable, where I initially noticed the problem. The patch applies almost cleanly to this release; "almost" because arch/x86/platform/efi/efi_64.c there has no kernel_fpu_begin() call to update. The kernel compiles and boots.

Here is the result for:
Intel(R) Xeon(R) CPU E3-1280 V2 @ 3.60GHz (family: 0x6, model: 0x3a, stepping: 0x9)

5.4-stable (with "Reset MXCSR to default in kernel_fpu_begin"):
avx : 21072.000 MB/sec
prefetch64-sse: 20392.000 MB/sec
generic_sse: 18572.000 MB/sec
xor: using function: avx (21072.000 MB/sec)

5.4-stable-c4db485dd3f2378b4923503aed995f7816e265b7-revert:
avx : 33764.000 MB/sec
prefetch64-sse: 23432.000 MB/sec
generic_sse: 21036.000 MB/sec
xor: using function: avx (33764.000 MB/sec)

5.4-stable-kernel_fpu_begin_mask:
avx : 23576.000 MB/sec
prefetch64-sse: 23024.000 MB/sec
generic_sse: 20880.000 MB/sec
xor: using function: avx (23576.000 MB/sec)

So, the performance regression for prefetch64-sse and generic_sse is almost gone, but the AVX code is still impacted: 23576 MB/sec with the patches versus 33764 MB/sec with the revert. That is better than before, but still a noticeable loss, and AVX is now barely faster than the fixed prefetch64-sse.
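
To put the three runs side by side, here is a small sketch that expresses each routine's throughput as a percentage of the reverted kernel. The figures are copied from the results above; the dictionary layout and the helper name percent_of_baseline are only illustrative and not part of any kernel tooling.

```python
# Throughput in MB/sec, copied from the three 5.4-stable runs above.
RESULTS = {
    "5.4-stable": {
        "avx": 21072.0, "prefetch64-sse": 20392.0, "generic_sse": 18572.0,
    },
    "revert": {
        "avx": 33764.0, "prefetch64-sse": 23432.0, "generic_sse": 21036.0,
    },
    "kernel_fpu_begin_mask": {
        "avx": 23576.0, "prefetch64-sse": 23024.0, "generic_sse": 20880.0,
    },
}


def percent_of_baseline(kernel, baseline="revert"):
    """Return each routine's throughput as a percentage of the baseline run."""
    return {
        name: round(100.0 * speed / RESULTS[baseline][name], 1)
        for name, speed in RESULTS[kernel].items()
    }


for kernel in ("5.4-stable", "kernel_fpu_begin_mask"):
    print(kernel, percent_of_baseline(kernel))
```

On these numbers the patched kernel gets the two SSE routines back to roughly 98-99% of the reverted kernel, while AVX only moves from about 62% to about 70%.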

I'm going to test the patches on 5.10 / 5.11-rc, to make sure that what I have seen on 5.4 is not due to a faulty backport, and also on different CPUs. However, this may have to wait until Tuesday / Wednesday due to family duties, as Monday is a holiday here.

Best regards,
Krzysztof Olędzki