Re: [Patch v3 16/22] perf/core: Support to capture higher width vector registers

From: Peter Zijlstra
Date: Wed Apr 16 2025 - 11:53:42 EST


On Wed, Apr 16, 2025 at 02:42:12PM +0800, Mi, Dapeng wrote:

> On second thought, using a bitmap to represent these extended registers does
> waste bits and is hard to extend; there could be many more vector
> registers once AMX is considered.

*Groan* so AMX should never have been register state :-(


> Considering that different arch/HW may support a different number of vector
> registers, e.g. platform A supports 8 XMM registers and 8 YMM registers, but
> platform B only supports 16 XMM registers, a better way to represent these
> vector registers may be to add two fields. One is a bitmap which represents
> which kinds of vector registers need to be captured. The other could be a u16
> array which represents the corresponding register length of each kind of
> vector register. It may look like this.
>
> #define    PERF_SAMPLE_EXT_REGS_XMM    BIT(0)
> #define    PERF_SAMPLE_EXT_REGS_YMM    BIT(1)
> #define    PERF_SAMPLE_EXT_REGS_ZMM    BIT(2)
>
>     __u32    sample_regs_intr_ext;
>     __u16    sample_regs_intr_ext_len[4];
>     __u32    sample_regs_user_ext;
>     __u16    sample_regs_user_ext_len[4];
>
>
> Peter, what do you think of this? Thanks.

I'm not entirely sure I understand.

How about something like:

__u16 sample_simd_reg_words;
__u64 sample_simd_reg_intr;
__u64 sample_simd_reg_user;

Then simd_reg_words tells us how many (quad) words there are per register
(8 for 512 bits) and simd_reg_{intr,user} are simple bitmaps, one bit per
actual simd reg.

So then all of XMM would be:

words = 2;
intr = user = 0xFFFF;

(16 regs, 128 wide)

Whereas ZMM would be:

words = 8;
intr = user = 0xFFFFFFFF;

(32 regs, 512 wide)
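
To make the arithmetic concrete, here is a rough userspace sketch of the three
fields and the register payload a sample would carry. The struct and the
helper names are hypothetical (they are not existing perf_event_attr members
or kernel functions), and stdint types stand in for __u16/__u64 so it builds
outside the kernel:

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical container for the three proposed fields. This only mirrors
 * the proposal above; it is not the real perf_event_attr layout.
 */
struct simd_sample_cfg {
	uint16_t sample_simd_reg_words;	/* u64 words per register, 8 for 512 bits */
	uint64_t sample_simd_reg_intr;	/* one bit per SIMD register */
	uint64_t sample_simd_reg_user;	/* one bit per SIMD register */
};

/* Number of registers requested, i.e. the number of set bits in the mask. */
static unsigned int simd_nr_regs(uint64_t mask)
{
	unsigned int n = 0;

	/* Clear the lowest set bit on each iteration. */
	for (; mask; mask &= mask - 1)
		n++;
	return n;
}

/* Bytes of register payload per sample for one mask at the given width. */
static size_t simd_sample_bytes(uint16_t words, uint64_t mask)
{
	return (size_t)simd_nr_regs(mask) * words * sizeof(uint64_t);
}
```

With the two cases above, simd_sample_bytes(2, 0xFFFF) comes to 256 bytes for
all of XMM and simd_sample_bytes(8, 0xFFFFFFFF) comes to 2048 bytes for all of
ZMM, per sample and per mask.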


Would this be sufficient? Possibly we can split the words thing into two
__u8 fields, but does it make sense to ask for different vector widths for
intr and user?
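
For reference, the split variant might look roughly like the sketch below.
All names are hypothetical, and the consistency rule in the helper (a width
must be given if and only if at least one register is requested) is my
assumption, not something settled in this thread:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical split layout: the single 16-bit words field becomes two
 * 8-bit fields so intr and user can ask for different vector widths.
 */
struct simd_sample_cfg_split {
	uint8_t  sample_simd_reg_words_intr;	/* u64 words per reg, intr samples */
	uint8_t  sample_simd_reg_words_user;	/* u64 words per reg, user samples */
	uint64_t sample_simd_reg_intr;		/* one bit per SIMD register */
	uint64_t sample_simd_reg_user;		/* one bit per SIMD register */
};

/* Assumed rule: width is non-zero exactly when the mask is non-zero. */
static bool simd_words_mask_ok(uint8_t words, uint64_t mask)
{
	return (words != 0) == (mask != 0);
}

/* Both halves of the config have to be self-consistent. */
static bool simd_cfg_split_valid(const struct simd_sample_cfg_split *c)
{
	return simd_words_mask_ok(c->sample_simd_reg_words_intr,
				  c->sample_simd_reg_intr) &&
	       simd_words_mask_ok(c->sample_simd_reg_words_user,
				  c->sample_simd_reg_user);
}
```

The cost of the split is an extra pair of checks like the above, which is
only worth paying if mixed widths are actually wanted.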