Re: [RFC PATCH 06/12] perf: Support extension of sample_regs

From: Peter Zijlstra
Date: Tue Jun 17 2025 - 10:09:57 EST


On Tue, Jun 17, 2025 at 03:33:33PM +0200, Peter Zijlstra wrote:
> On Tue, Jun 17, 2025 at 08:14:36PM +0800, Mi, Dapeng wrote:
>
> > > We're going to do a sane SIMD register set with variable width, and
> > > reclaim the XMM regs from the normal set.
> >
> > Ok, so we need to add two width variables like
> > sample_ext_regs_words_intr/user,
>
> s/ext/simd/
>
> Not sure it makes sense to have separate vector widths for kernel and
> user regs, but sure.
>
> > then reuse the XMM regs bitmap to represent the extend regs bitmap.
>
> But its not extended; its the normal bitmap.
>
> > Considering the OPMASK regs and APX
> > extended GPRs have same bit-width (64 bits), we may have to combine them
> > into a single bitmap, e.g. bits[15:0] represents R31~R16 and bits[23:16]
> > represents OPMASK7 ~ OPMASK0. 
>
> Again confused, bits 0:23 are the normal registers (in a lunatic
> order). The XMM regs are in 32:63 and will be free if the SIMD thing is
> present.
>
> SPP+APX should definitely go there.
>
> Not sure about OPMASK; those really do belong with the SIMD state. Let
> me go figure out what ARM and Risc-V look like in more detail.

So ARM SVE has 32 vector registers and 16 predicate registers.

RISC-V Zv seems to have only 32 vector registers; no special-purpose
predicate registers, instead a regular vector register can be used as a
predicate register.

PowerPC VSX has 64 vector registers and no predicate registers afaict.

While reading this, I came across the useful note that predicate
registers are 1/8th the length of the vector registers (because the
minimal element is a byte). So while the current AVX-512 predicate
registers are indeed 64 bits, this would no longer be true for the
hypothetical AVX-1024 (or even AVX-512 if we allow 4-bit elements).
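
To make the arithmetic concrete, a minimal sketch; the helper names
are made up for illustration and are not existing kernel API. It only
encodes "one predicate bit per smallest element":

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical macro: one predicate bit per smallest vector element. */
#define PRED_BITS(vbits, ebits)	((vbits) / (ebits))

/* AVX-512 with byte elements: the predicate is exactly 64 bits. */
static_assert(PRED_BITS(512, 8) == 64, "AVX-512 predicate is 64 bits");

/* Hypothetical helper: predicate register width in bits. */
static inline uint32_t pred_bits(uint32_t vector_bits, uint32_t min_elem_bits)
{
	return PRED_BITS(vector_bits, min_elem_bits);
}

/* Hypothetical helper: does one predicate register still fit in a u64? */
static inline bool pred_fits_u64(uint32_t vector_bits, uint32_t min_elem_bits)
{
	return pred_bits(vector_bits, min_elem_bits) <= 64;
}
```

With these, pred_fits_u64(512, 8) holds, while both
pred_fits_u64(1024, 8) and pred_fits_u64(512, 4) fail, since each
needs 128 predicate bits.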

As such, I don't think we should stick the predicate registers in the
normal group -- they really are not normal registers and won't fit for
future extensions.

This then leaves us with two options:

- stick the predicate registers in the high bits of the vector register
word, or

- add an explicit predicate register word.
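
Roughly, the two options could look like the sketch below. All names,
the 64-bit word size and the split at bit 32 are assumptions made up
for illustration; nothing here is existing or proposed perf ABI:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed split point for option 1: predicates start at bit 32. */
#define SIMD_PRED_SHIFT	32

/* Option 1: predicate register bits share the vector register word. */
static inline uint64_t simd_word_pack(uint32_t vec_mask, uint32_t pred_mask)
{
	return ((uint64_t)pred_mask << SIMD_PRED_SHIFT) | vec_mask;
}

static inline uint32_t simd_word_vec(uint64_t word)
{
	return (uint32_t)word;
}

static inline uint32_t simd_word_pred(uint64_t word)
{
	return (uint32_t)(word >> SIMD_PRED_SHIFT);
}

/* Option 2: an explicit, separate predicate register word. */
struct simd_regs_sel {
	uint64_t vec_mask;	/* which vector registers to sample */
	uint64_t pred_mask;	/* which predicate registers to sample */
};

static_assert(sizeof(struct simd_regs_sel) == 16, "two 64-bit words");
```

Under these assumptions option 1 only leaves room for predicate bits
when an architecture has at most 32 vector registers; option 2 costs
an extra word but keeps the two register files independent.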