Re: [Patch v3 16/22] perf/core: Support to capture higher width vector registers
From: Peter Zijlstra
Date: Wed Apr 16 2025 - 11:53:42 EST
On Wed, Apr 16, 2025 at 02:42:12PM +0800, Mi, Dapeng wrote:
> On second thought, using a bitmap to represent these extended registers
> does waste bits and is hard to extend; there could be many more vector
> registers once AMX is considered.
*Groan* so AMX should never have been register state :-(
> Since different arch/HW may support a different number of vector registers,
> e.g. platform A supports 8 XMM registers and 8 YMM registers while platform
> B only supports 16 XMM registers, a better way to represent these vector
> registers may be to add two fields. One is a bitmap which represents which
> kinds of vector registers need to be captured. The other is a u16 array
> which gives the register length for each kind of vector register. It may
> look like this:
>
> #define PERF_SAMPLE_EXT_REGS_XMM BIT(0)
> #define PERF_SAMPLE_EXT_REGS_YMM BIT(1)
> #define PERF_SAMPLE_EXT_REGS_ZMM BIT(2)
> __u32 sample_regs_intr_ext;
> __u16 sample_regs_intr_ext_len[4];
> __u32 sample_regs_user_ext;
> __u16 sample_regs_user_ext_len[4];
>
>
> Peter, what do you think of this? Thanks.
I'm not entirely sure I understand.
How about something like:
	__u16 sample_simd_reg_words;
	__u64 sample_simd_reg_intr;
	__u64 sample_simd_reg_user;
Then simd_reg_words tells us how many (quad) words there are per register
(8 for 512 bits), and simd_reg_{intr,user} are simple bitmaps, one bit per
actual SIMD reg.
So then all of XMM would be:
	words = 2;
	intr = user = 0xFFFF;
(16 regs, 128 wide)
Whereas ZMM would be:
	words = 8;
	intr = user = 0xFFFFFFFF;
(32 regs, 512 wide)
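To make the encoding concrete, here is a rough userspace sketch of how a
consumer could size the register dump from these fields. The struct and
helper names are made up for illustration and are not part of any existing
ABI; only the three field names come from the proposal above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the three proposed attr fields. */
struct simd_sample_cfg {
	uint16_t sample_simd_reg_words;	/* 64-bit words per register */
	uint64_t sample_simd_reg_intr;	/* one bit per SIMD register */
	uint64_t sample_simd_reg_user;	/* one bit per SIMD register */
};

/* Count the registers selected in @mask (portable popcount). */
unsigned int simd_reg_count(uint64_t mask)
{
	unsigned int n = 0;

	while (mask) {
		mask &= mask - 1;	/* clear the lowest set bit */
		n++;
	}
	return n;
}

/* Bytes needed to dump every register selected in @mask. */
uint64_t simd_dump_bytes(uint16_t words, uint64_t mask)
{
	return (uint64_t)simd_reg_count(mask) * words * sizeof(uint64_t);
}

/* The two configurations from the mail, checked against their sizes. */
void simd_layout_examples(void)
{
	struct simd_sample_cfg xmm = { 2, 0xFFFF, 0xFFFF };
	struct simd_sample_cfg zmm = { 8, 0xFFFFFFFF, 0xFFFFFFFF };

	/* 16 regs * 2 words * 8 bytes */
	assert(simd_dump_bytes(xmm.sample_simd_reg_words,
			       xmm.sample_simd_reg_intr) == 256);
	/* 32 regs * 8 words * 8 bytes */
	assert(simd_dump_bytes(zmm.sample_simd_reg_words,
			       zmm.sample_simd_reg_user) == 2048);
}
```

So all of XMM costs 256 bytes per sample and all of ZMM costs 2048 bytes,
and a sparse bitmap only pays for the registers it actually selects.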
Would this be sufficient? Possibly we can split the words thing into two
__u8 fields, but does it make sense to ask for different vector widths for
intr and user?