Re: [RFC PATCH 06/12] perf: Support extension of sample_regs

From: Mark Rutland
Date: Tue Jun 17 2025 - 10:56:05 EST


On Tue, Jun 17, 2025 at 04:44:16PM +0200, Peter Zijlstra wrote:
> On Tue, Jun 17, 2025 at 03:24:01PM +0100, Mark Rutland wrote:
>
> > TBH, I don't think we can handle extended state in a generic way unless
> > we treat this like a ptrace regset, and delegate the format of each
> > specific register set to the architecture code.
> >
> > On arm64, the behaviour is modal (with two different vector lengths for
> > streaming/non-streaming SVE when SME is implemented), per-task
> > configurable (with different vector lengths), can differ between
> > host/guest for KVM, and some of the registers only exist in some
> > configurations (e.g. the FFR only exists for SME if FA64 is
> > implemented).
>
> Well, much of this is per necessity architecture specific. But the
> general form of vector registers is similar enough.
>
> The main point is to not try and cram the vector registers into multiple
> GP regs (sadly that is exactly what x86 started doing).

I see, sorry for the noise. I completely agree that we shouldn't cram
this stuff into GP regs.

> Anyway, your conditional length thing is 'fun' and has two solutions:
>
> - the arch can refuse to create per-cpu counters with SIMD samples, or
>
> - 0 pad all 'unobtainable state'.
>
> Same when asking for wider vectors than the hardware supports; eg.
> asking for 512 wide registers on Intel clients will likely end up in a
> lot of 0s for the high bits -- seeing how AVX512 is mostly a server
> thing on Intel.

Yep, those options may work for us, but we'd need to think harder about
it. Our approach for ptrace and signals has been to have a header and
pack at the active vector length, so padding to a max width would be
different, but maybe it's fine.

Having another representation feels like an accident waiting to happen.
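
To make the difference concrete, here is a rough sketch of the two layouts. All names here (struct vec_hdr, pack_active(), pack_padded()) are hypothetical and only for illustration; this is not the actual arm64 ptrace/signal ABI, nor a proposed perf sample format. It assumes registers are stored back to back at the active vector length, given in bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical header describing the data that follows it. */
struct vec_hdr {
	uint16_t vl;	/* active vector length, in bytes */
	uint16_t nregs;	/* number of registers that follow */
};

/*
 * Layout A: header, then nregs registers packed at the active VL.
 * The consumer must read the header to find each register.
 */
static size_t pack_active(uint8_t *out, const uint8_t *regs,
			  uint16_t vl, uint16_t nregs)
{
	struct vec_hdr hdr = { .vl = vl, .nregs = nregs };
	size_t len = (size_t)vl * nregs;

	memcpy(out, &hdr, sizeof(hdr));
	memcpy(out + sizeof(hdr), regs, len);
	return sizeof(hdr) + len;
}

/*
 * Layout B: no header; every register occupies 'width' bytes and the
 * bytes above the active VL are zero. Offsets are fixed, but the
 * consumer cannot tell "zero" from "not present".
 */
static size_t pack_padded(uint8_t *out, const uint8_t *regs,
			  uint16_t vl, uint16_t nregs, uint16_t width)
{
	/* Sketch only: assume the requested width covers the active VL. */
	assert(vl <= width);

	for (uint16_t i = 0; i < nregs; i++) {
		uint8_t *dst = out + (size_t)i * width;

		memset(dst, 0, width);
		memcpy(dst, regs + (size_t)i * vl, vl);
	}
	return (size_t)width * nregs;
}
```

With two 2-byte registers, layout A yields the header plus 4 bytes, while layout B at a requested width of 4 yields 8 bytes with the upper half of each slot zeroed. Anything that parses one layout needs separate code for the other.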

Mark.