Re: [RFC PATCH 06/12] perf: Support extension of sample_regs

From: Liang, Kan
Date: Tue Jun 17 2025 - 16:32:40 EST

On 2025-06-17 10:55 a.m., Mark Rutland wrote:
> On Tue, Jun 17, 2025 at 04:44:16PM +0200, Peter Zijlstra wrote:
>> On Tue, Jun 17, 2025 at 03:24:01PM +0100, Mark Rutland wrote:
>>
>>> TBH, I don't think we can handle extended state in a generic way unless
>>> we treat this like a ptrace regset, and delegate the format of each
>>> specific register set to the architecture code.
>>>
>>> On arm64, the behaviour is modal (with two different vector lengths for
>>> streaming/non-streaming SVE when SME is implemented), per-task
>>> configurable (with different vector lengths), can differ between
>>> host/guest for KVM, and some of the registers only exist in some
>>> configurations (e.g. the FFR only exists for SME if FA64 is
>>> implemented).
>>
>> Well, much of this is per necessity architecture specific. But the
>> general form of vector registers is similar enough.
>>
>> The main point is to not try and cram the vector registers into multiple
>> GP regs (sadly that is exactly what x86 started doing).
>
> I see, sorry for the noise. I completely agree that we shouldn't cram
> this stuff into GP regs.
>
>> Anyway, your conditional length thing is 'fun' and has two solutions:
>>
>> - the arch can refuse to create per-cpu counters with SIMD samples, or
>>
>> - 0 pad all 'unobtainable state'.
>>
>> Same when asking for wider vectors than the hardware supports; eg.
>> asking for 512 wide registers on Intel clients will likely end up in a
>> lot of 0s for the high bits -- seeing how AVX512 is mostly a server
>> thing on Intel.
>
> Yep, those options may work for us, but we'd need to think harder about
> it. Our approach for ptrace and signals has been to have a header and
> pack at the active vector length, so padding to a max width would be
> different, but maybe it's fine.
>
> Having another representation feels like a recipe waiting to happen.
>
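To make the two layouts discussed above concrete, here is a rough sketch. Both helpers are hypothetical and are not kernel code. A vector register is modelled as an array of 64-bit words, vl_words stands for the active vector length, and max_words stands for a fixed width requested by the user.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical illustration only; neither helper exists in the kernel.
 * A register is an array of 64-bit words. 'vl_words' is the active
 * vector length, 'max_words' the fixed width requested by the user.
 */

/* Option A: always emit max_words per register and zero-fill any state
 * that is unobtainable (words above the active vector length). If the
 * active length exceeds max_words, the register is truncated. */
size_t emit_padded(uint64_t *out, const uint64_t *reg,
		   size_t vl_words, size_t max_words)
{
	size_t n = vl_words < max_words ? vl_words : max_words;

	memcpy(out, reg, n * sizeof(*out));
	memset(out + n, 0, (max_words - n) * sizeof(*out));
	return max_words;
}

/* Option B: emit a one-word header with the active length, then only
 * the live words, similar in spirit to a header-plus-packed layout. */
size_t emit_packed(uint64_t *out, const uint64_t *reg, size_t vl_words)
{
	out[0] = vl_words;
	memcpy(out + 1, reg, vl_words * sizeof(*out));
	return vl_words + 1;
}
```

The padded form keeps the record size fixed by the attr, at the cost of emitting zeros. The packed form keeps the record compact, but a consumer has to read the header before it can find the next register.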

I'd like to make sure I understand correctly.
If we'd like an explicit field for the predicate register width
(sample_simd_pred_reg_words), would the change below to struct
perf_event_attr also work for ARM?

__u16 sample_simd_pred_reg_words;
__u16 sample_simd_pred_reg_intr;
__u16 sample_simd_pred_reg_user;
__u16 sample_simd_reg_words;
__u64 sample_simd_reg_intr;
__u64 sample_simd_reg_user;
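As a rough illustration of what these fields would imply for the sample size, here is a sketch. It relies on a few assumptions: the struct below only mirrors the proposal and is not upstream UAPI, "words" is taken to mean 64-bit words per register, and the _intr/_user fields are taken to be bitmasks of register indices.

```c
#include <stdint.h>

/*
 * Hypothetical mirror of the proposed perf_event_attr fields (not part
 * of the upstream UAPI). Assumes "words" = 64-bit words per register
 * and that the _intr/_user fields are bitmasks of register indices.
 */
struct simd_attr_sketch {
	uint16_t sample_simd_pred_reg_words;
	uint16_t sample_simd_pred_reg_intr;
	uint16_t sample_simd_pred_reg_user;
	uint16_t sample_simd_reg_words;
	uint64_t sample_simd_reg_intr;
	uint64_t sample_simd_reg_user;
};

/* Portable population count. */
static unsigned int popcount64(uint64_t x)
{
	unsigned int n = 0;

	while (x) {
		x &= x - 1;	/* clear the lowest set bit */
		n++;
	}
	return n;
}

/* Bytes the SIMD part of one register dump would take for the vector
 * registers in 'mask' and the predicate registers in 'pred_mask'. */
uint64_t simd_dump_bytes(const struct simd_attr_sketch *a,
			 uint64_t mask, uint16_t pred_mask)
{
	return (uint64_t)popcount64(mask) * a->sample_simd_reg_words * 8 +
	       (uint64_t)popcount64(pred_mask) * a->sample_simd_pred_reg_words * 8;
}
```

For example, requesting all 32 Z registers at the SVE architectural maximum of 2048 bits (32 words) and all 16 P registers at 256 bits (4 words) would give 32 * 32 * 8 + 16 * 4 * 8 = 8704 bytes per sample under these assumptions.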

BTW: would it be easier for ARM if the _words fields were changed to
_type? You could then define types such as stream_sve, n_stream_sve,
etc., and the output layout would depend on the type rather than on the
maximum register length.
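A minimal sketch of how a _type field might drive the layout. The enum values are hypothetical and are modelled on the stream_sve/n_stream_sve names above. Vector lengths are in bytes. The FFR rule follows Mark's note that FFR exists in streaming mode only when FA64 is implemented.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical type values; names and layout rules are assumptions. */
enum simd_type_sketch {
	SIMD_TYPE_NONE = 0,
	SIMD_TYPE_N_STREAM_SVE,	/* non-streaming SVE: task's SVE VL */
	SIMD_TYPE_STREAM_SVE,	/* streaming SVE: task's SME streaming VL */
};

struct simd_layout_sketch {
	size_t zreg_bytes;	/* bytes per Z register (= VL) */
	size_t preg_bytes;	/* bytes per P register (= VL / 8) */
	unsigned int nr_preg;	/* P0-P15, plus FFR when it exists */
};

/*
 * Derive the per-sample layout from the requested type and the task's
 * current vector lengths (in bytes) instead of from a maximum width.
 * Returns false for types with no SIMD payload.
 */
bool simd_layout(enum simd_type_sketch type, size_t sve_vl,
		 size_t sme_svl, bool has_fa64,
		 struct simd_layout_sketch *out)
{
	size_t vl;
	bool has_ffr;

	switch (type) {
	case SIMD_TYPE_N_STREAM_SVE:
		vl = sve_vl;
		has_ffr = true;		/* FFR is always part of SVE */
		break;
	case SIMD_TYPE_STREAM_SVE:
		vl = sme_svl;
		has_ffr = has_fa64;	/* streaming FFR needs FA64 */
		break;
	default:
		return false;
	}

	out->zreg_bytes = vl;
	out->preg_bytes = vl / 8;
	out->nr_preg = 16 + (has_ffr ? 1 : 0);
	return true;
}
```

With a type like this, the record size follows the mode the task was in when sampled. A consumer would therefore need the type (and the vector length) in the sample itself to decode it, rather than relying only on the attr.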

Thanks,
Kan