Re: [RFC PATCH 0/3] kernel: add support for 256-bit IO access

From: Linus Torvalds
Date: Tue Mar 20 2018 - 14:01:36 EST


On Tue, Mar 20, 2018 at 1:26 AM, Ingo Molnar <mingo@xxxxxxxxxx> wrote:
>
> So assuming the target driver will only load on modern FPUs I *think* it should
> actually be possible to do something like (pseudocode):
>
> vmovdqa %ymm0, 40(%rsp)
> vmovdqa %ymm1, 80(%rsp)
>
> ...
> # use ymm0 and ymm1
> ...
>
> vmovdqa 80(%rsp), %ymm1
> vmovdqa 40(%rsp), %ymm0
>
> ... without using the heavy XSAVE/XRSTOR instructions.

No. The above is buggy. It may *work*, but it won't work in the long run.

Pretty much every single vector extension has traditionally meant that
touching "old" registers breaks any new register use. Even if you save
the registers carefully like in your example code, it will do magic
and random things to the "future extended" version.

So I absolutely *refuse* to have anything to do with the vector unit.
You can only touch it in the kernel if you own it entirely (ie that
"kernel_fpu_begin()/_end()" thing). Anything else is absolutely
guaranteed to cause problems down the line.

And even if you ignore that "maintenance problems down the line" issue
("we can fix them when they happen") I don't want to see games like
this, because I'm pretty sure it breaks the optimized xsave by tagging
the state as being dirty.

So no. Don't use vector stuff in the kernel. It's not worth the pain.

The *only* valid use is pretty much crypto, and even there it has had
issues. Benchmarks use big arrays and/or dense working sets etc to
"prove" how good the vector version is, and then you end up in
situations where it's used once per fairly small packet for an
interrupt, and it's actually much worse than doing it by hand.

Linus