Re: [RFC PATCH 06/12] perf: Support extension of sample_regs

From: Liang, Kan
Date: Wed Jun 18 2025 - 06:16:30 EST




On 2025-06-18 5:35 a.m., Peter Zijlstra wrote:
> On Tue, Jun 17, 2025 at 04:32:24PM -0400, Liang, Kan wrote:
>
>>> Yep, those options may work for us, but we'd need to think harder about
>>> it. Our approach for ptrace and signals has been to have a header and
>>> pack at the active vector length, so padding to a max width would be
>>> different, but maybe it's fine.
>>>
>>> Having another representation feels like a recipe waiting to happen.
>>>
>>
>> I'd like to make sure I understand correctly.
>> If we'd like an explicit predicate register word, the below change in
>> struct perf_event_attr is OK for ARM as well, right?
>>
>> __u16 sample_simd_pred_reg_words;
>> __u16 sample_simd_pred_reg_intr;
>> __u16 sample_simd_pred_reg_user;
>> __u16 sample_simd_reg_words;
>> __u64 sample_simd_reg_intr;
>> __u64 sample_simd_reg_user;
>>
>> BTW: would that be easier for ARM if changing the _words to _type?
>> You may define some types like, stream_sve, n_stream_sve, etc.
>> The output will depend on the types, rather than the max length of
>> registers.
>
> I'm thinking what they're after is something like:
>
> PERF_SAMPLE_SIMD_REGS := {
>     u16 nr_vectors;
>     u16 vector_length;
>     u16 nr_pred;
>     u16 pred_length;
>     u64 data[];
> }

Maybe we should use a mask in place of nr_vectors, because Dave
mentioned that XSAVES may fail.
Currently, perf reports all zeros in the failing case, but 0 is also a
valid register value.
A mask can tell the tool that some regs failed to be collected, so the
tool can give proper feedback to the end user.

PERF_SAMPLE_SIMD_REGS := {
    u64 vectors_mask;
    u16 vector_length;
    u64 pred_mask;
    u16 pred_length;
    u64 data[];
}
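
To illustrate how a tool could use the masks, here is a rough sketch of
the lookup on the consumer side. It is not part of the patch. The struct
name and helper names are hypothetical, and it assumes three things that
are not settled in this thread: the lengths are counted in u64 words,
data[] holds only the registers whose mask bit is set, and the vectors
come first, followed by the predicates, both in ascending bit order.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoded header; field names follow the layout above. */
struct simd_regs_hdr {
	uint64_t vectors_mask;	/* bit i set: vector reg i was captured */
	uint16_t vector_length;	/* assumed: u64 words per vector reg */
	uint64_t pred_mask;	/* bit i set: predicate reg i was captured */
	uint16_t pred_length;	/* assumed: u64 words per predicate reg */
};

/* Count the set bits in v. */
static unsigned int popcount64(uint64_t v)
{
	unsigned int n = 0;

	for (; v; v &= v - 1)
		n++;
	return n;
}

/* Count the set bits of mask strictly below bit idx (idx in 0..63). */
static unsigned int bits_below(uint64_t mask, unsigned int idx)
{
	return popcount64(mask & ((UINT64_C(1) << idx) - 1));
}

/*
 * Return the data[] slot of vector reg idx, or NULL if it was not
 * captured, so the tool can tell "missing" apart from "all zero".
 */
static const uint64_t *simd_vector_reg(const struct simd_regs_hdr *hdr,
				       const uint64_t *data, unsigned int idx)
{
	if (idx >= 64 || !(hdr->vectors_mask & (UINT64_C(1) << idx)))
		return NULL;
	return data + (size_t)bits_below(hdr->vectors_mask, idx) *
		      hdr->vector_length;
}

/* Same for predicate regs, which are assumed to follow all vectors. */
static const uint64_t *simd_pred_reg(const struct simd_regs_hdr *hdr,
				     const uint64_t *data, unsigned int idx)
{
	size_t vec_words;

	if (idx >= 64 || !(hdr->pred_mask & (UINT64_C(1) << idx)))
		return NULL;
	vec_words = (size_t)popcount64(hdr->vectors_mask) * hdr->vector_length;
	return data + vec_words +
	       (size_t)bits_below(hdr->pred_mask, idx) * hdr->pred_length;
}
```

With this shape, a NULL return is the "failed to collect" signal, and
the lengths in the header let the kernel respond with a narrower vector
than what was requested.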

Thanks,
Kan
>
> Where the output data also has a length. Such that even if we ask for
> 512 bit vectors, the thing is allowed to respond with say 128 bit
> vectors if that is all the machine has at that time.
>