Re: RISCV Vector unit disabled by default for new task (was Re: [PATCH v12 17/17] riscv: prctl to enable vector commands)
From: Björn Töpel
Date: Thu Dec 15 2022 - 06:49:06 EST
Darius Rad <darius@xxxxxxxxxxxx> writes:
> On Wed, Dec 14, 2022 at 12:07:03PM -0800, Vineet Gupta wrote:
>> On 12/13/22 08:43, Darius Rad wrote:
>> > On Fri, Dec 09, 2022 at 11:42:19AM -0800, Vineet Gupta wrote:
>> > > But keeping the V unit disabled by default and using prctl as a gatekeeper
>> > > to enable it feels unnecessary and tedious.
>> > > Here's my reasoning below (I'm collating comments from prior msgs as well).
>> > Please reference the previous discussion [1] which has covered topics that
>> > have not been discussed recently.
>> >
>> > [1] https://lists.infradead.org/pipermail/linux-riscv/2021-September/thread.html#8361
>>
>> I sure read thru that thread, and many more :-) to get context.
>> The highlight is that we should do something because AVX/AMX do so (or
>> failed to do so).
>> But on the flip side ARM SVE is not disabling this by default.
>> Your other concerns seem to be potential power implications for leaving it
>> on and sharing of V unit across harts (see more on that below)
>> Maybe leaving it on all the time will be motivation for hw designers to be
>> more considerate of the idle power draw.
>>
>
> That is not quite the same take away I had from the AMX discussion. I
> would also disagree that ARM SVE is not disabling things by default,
> although the behavior is somewhat different.
>
> The significant point that I see from that discussion, which also applies
> to SVE, and also applies to RISC-V vector, is that the extension is
> necessarily dependent upon a functional unit that is reasonably large with
> respect to the size of the processor and has a significant amount of
> additional architectural state. The argument there is that AMX/SVE/RVV is
> a significant system resource and should be treated as such: by having the
> kernel track usage of it and by having a process specifically request
> access to it.
>
> For any of AMX/SVE/RVV, use of the extension operates as follows:
>
> 1. A process requests access to the extension,
>
> 2. The kernel allocates memory for the additional state for this process,
>
> 3. The kernel enables the extension for the process, and finally
>
> 4. The process is able to use the instructions.
>
> I don't recall the exact details, but my understanding is that AMX is going
> to use an x86-specific mechanism and require an explicit request from user
> space for access to AMX.
Yes, it uses arch_prctl, and on top of that the same kind of "lazy
trigger" (AFAIK) as SVE (enabled on first use via a trap).
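
As a rough sketch of what the explicit AMX request looks like from user
space: the constants below are copied from the kernel's x86 xstate
documentation, while the helper name request_amx() is made up for this
example. On anything that is not x86-64 Linux the helper just reports
-ENOSYS, so treat it as an illustration rather than a reference.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <unistd.h>
#include <sys/syscall.h>

/* Values as documented for the x86 arch_prctl() xstate interface. */
#ifndef ARCH_REQ_XCOMP_PERM
#define ARCH_REQ_XCOMP_PERM	0x1023
#endif
#define XFEATURE_XTILEDATA	18

/*
 * Hypothetical helper: ask the kernel for permission to use AMX tile
 * data. Returns 0 on success or a negative errno value on failure.
 */
static int request_amx(void)
{
#if defined(__x86_64__) && defined(SYS_arch_prctl)
	if (syscall(SYS_arch_prctl, ARCH_REQ_XCOMP_PERM,
		    XFEATURE_XTILEDATA) == 0)
		return 0;
	return -errno;
#else
	/* No arch_prctl() on this architecture. */
	return -ENOSYS;
#endif
}
```

The point is that the request has a return value: a program can check
it and fall back to a non-AMX code path if the kernel says no.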
> For SVE, it is in fact disabled by default in the kernel. When a thread
> executes the first SVE instruction, it will cause an exception, the kernel
> will allocate memory for SVE state and enable TIF_SVE. Further use of SVE
> instructions will proceed without exceptions. Although SVE is disabled by
> default, it is enabled automatically. Since this is done automatically
> during an exception handler, there is no opportunity for memory allocation
> errors to be reported, as there are in the AMX case.
Glibc has an SVE optimized memcpy, right? Doesn't that mean that pretty
much all processes on an SVE capable system will enable SVE (lazily)? If
so, that's close to "enabled by default" (unless SVE is disabled system
wide).
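
To make the difference between the two enable paths concrete, here is a
toy user-space model (not kernel code). All names (toy_task,
toy_request, toy_first_use_trap) and the byte-counter "pool" that stands
in for memory are assumptions made up for the sketch. It only shows
where an allocation failure can be reported and where it cannot.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for per-task state; not a real kernel structure. */
struct toy_task {
	size_t state_bytes;	/* 0 until extra state is allocated */
	bool enabled;		/* plays the role of a flag like TIF_SVE */
};

/* Common step: allocate state from the pool, then enable the unit. */
static int toy_enable(struct toy_task *t, size_t need, size_t *pool)
{
	if (t->enabled)
		return 0;
	if (need > *pool)
		return -ENOMEM;
	*pool -= need;
	t->state_bytes = need;
	t->enabled = true;
	return 0;
}

/* Explicit request (AMX-like): the error code reaches the caller. */
static int toy_request(struct toy_task *t, size_t need, size_t *pool)
{
	return toy_enable(t, need, pool);
}

/*
 * Lazy enable (SVE-like): runs from the trap on first use. There is no
 * call site to return an error to, so the model can only say whether
 * the task may continue (true) or would have to be signalled (false).
 */
static bool toy_first_use_trap(struct toy_task *t, size_t need, size_t *pool)
{
	return toy_enable(t, need, pool) == 0;
}
```

With a pool of 1024 bytes, toy_request() for 4096 bytes returns -ENOMEM
and the caller can react, whereas toy_first_use_trap() in the same
situation can only report that the task cannot continue.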
> For RVV, I do not recall ever seeing Linux patches that automatically enable
> vector. I have seen it enabled unconditionally, or with a manual enable
> (i.e., prctl).
>
> It is possible to write a program that does not ever use AMX, and when that
> program is run, the process will not incur the power or memory overhead of
> AMX. It is also possible to do that with SVE. This is simply not possible
> if RISC-V will, by default for every process, enable and allocate state
> memory for vector.
>
> So my thought would be what is the motivation for being even less flexible
> than SVE, if you feel that the AMX mechanism is too onerous?
AMX is a bit different from SVE and V; SVE/V is/would be used by glibc
for memcpy and such, whereas I doubt that AMX would be used there. Then
again, there's AVX512, where many argue that "turned on by default" was
a mistake (ABI breakage/power consumption).
>> >
>> > > 2. People want the prctl gatekeeping for ability to gracefully handle memory
>> > > allocation failure for the extra V-state within kernel. But that is only
>> > > additional 4K (for typical 128 wide V regs) per task.
>> > But vector state scales up to as much as 256k. Are you suggesting that
>> > there is no possibility that future systems would support more than
>> > VLEN=128?
>>
>> I mentioned "typical". And below also said that memory allocation concerns
>> are moot, since fork/execve failures due to failing to allocate would take
>> care of those anyways.
>>
>
> But again, what if one were using such an admittedly atypical system? Why
> should such a user be compelled to take a memory hit for every process,
> even if they specifically go out of their way to avoid using vector
> instructions?
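
For reference, the upper bound quoted above can be reproduced with a
little arithmetic: 32 vector registers of VLEN bits each, with VLEN a
power of two no larger than 65536 in the V 1.0 spec. The sketch below
counts only the register file (no CSRs, no allocator rounding), and the
macro and function names are invented for the example.

```c
#include <stddef.h>

#define RVV_NUM_VREGS		32u	/* v0..v31 */
#define RVV_MAX_VLEN_BITS	65536u	/* largest VLEN allowed by V 1.0 */

/*
 * Bytes needed to save the vector register file for a given VLEN.
 * Returns 0 if vlen_bits is not a power of two in [8, 65536].
 */
static size_t rvv_regfile_bytes(unsigned int vlen_bits)
{
	if (vlen_bits < 8 || vlen_bits > RVV_MAX_VLEN_BITS ||
	    (vlen_bits & (vlen_bits - 1)) != 0)
		return 0;
	return (size_t)RVV_NUM_VREGS * (vlen_bits / 8);
}
```

Under this counting, rvv_regfile_bytes(65536) is 262144 bytes, i.e. the
256k mentioned above, and rvv_regfile_bytes(128) is 512 bytes (the 4K
figure quoted earlier may reflect a different accounting).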
For the sake of discussion: nobody is arguing against having knobs to
turn V on/off per-process/per-system, right? The discussion is about the
default (on/off), and more broadly about what a "typical" RV system
looks like. If most systems that fold in the A profile have V, it might
make sense not to require users to explicitly enable it, and vice versa.
Using RVA23 as a crystal-ball-gazing aid, [1] states that it might
mandate V. If so, assuming that "most systems will be designed for V
usage" is not crazy.
Now moving on! The thread is leaning towards "disabled by default" ("AMX
way"), let's try to move the discussion forward!
The Linux kernel itself would benefit from using V
(hashing/crypto). What kind of policy would determine if the kernel is
allowed to use V? Default off, with an explicit enable kernel knob
(cmdline/sysctl/sysfs/...)?
There will likely be V support in glibc (str*/mem*). For systems that
prefer having V "always-on", the UX of requiring all binaries to
explicitly call prctl() is not great (as Andrew pointed out in earlier
posts). A V knob based on some system policy in crt0? :-P
Björn
[1] https://lists.riscv.org/g/tech-profiles/message/48