Re: [PATCH v2 07/23] perf: arm_pmuv3: Introduce method to partition the PMU

From: Colton Lewis
Date: Tue Jun 24 2025 - 16:05:23 EST


Oliver Upton <oliver.upton@xxxxxxxxx> writes:

On Mon, Jun 23, 2025 at 06:26:42PM +0000, Colton Lewis wrote:
Oliver Upton <oliver.upton@xxxxxxxxx> writes:

> On Fri, Jun 20, 2025 at 10:13:07PM +0000, Colton Lewis wrote:
> > For PMUv3, the register field MDCR_EL2.HPMN partitions the PMU
> > counters into two ranges where counters 0..HPMN-1 are accessible by
> > EL1 and, if allowed, EL0 while counters HPMN..N are only accessible by
> > EL2.

> > Create module parameters partition_pmu and reserved_guest_counters to
> > reserve a number of counters for the guest. These numbers are set at
> > boot because the perf subsystem assumes the number of counters will
> > not change after the PMU is probed.

> > Introduce the function armv8pmu_partition() to modify the PMU driver's
> > cntr_mask of available counters to exclude the counters being reserved
> > for the guest and record reserved_guest_counters as the maximum
> > allowable value for HPMN.

> > Due to the difficulty this feature would create for the driver running
> > at EL1 on the host, partitioning is only allowed in VHE mode. Working
> > on nVHE mode would require a hypercall for every counter access in the
> > driver because the counters reserved for the host by HPMN are only
> > accessible to EL2.

> > Signed-off-by: Colton Lewis <coltonlewis@xxxxxxxxxx>
> > ---
> > arch/arm/include/asm/arm_pmuv3.h | 10 ++++
> > arch/arm64/include/asm/arm_pmuv3.h | 5 ++
> > drivers/perf/arm_pmuv3.c | 95 +++++++++++++++++++++++++++++-
> > include/linux/perf/arm_pmu.h | 1 +
> > 4 files changed, 109 insertions(+), 2 deletions(-)

> > diff --git a/arch/arm/include/asm/arm_pmuv3.h b/arch/arm/include/asm/arm_pmuv3.h
> > index 2ec0e5e83fc9..9dc43242538c 100644
> > --- a/arch/arm/include/asm/arm_pmuv3.h
> > +++ b/arch/arm/include/asm/arm_pmuv3.h
> > @@ -228,6 +228,11 @@ static inline bool kvm_set_pmuserenr(u64 val)

> > static inline void kvm_vcpu_pmu_resync_el0(void) {}

> > +static inline bool has_vhe(void)
> > +{
> > + return false;
> > +}
> > +

> This has nothing to do with PMUv3, I'm a bit surprised to see you're
> touching 32-bit ARM. Can you just gate the whole partitioning thing on
> arm64?

The PMUv3 driver also has to compile on 32-bit ARM.

Quite aware.

My first series had the partitioning code in arch/arm64 but you asked me
to move it to the PMUv3 driver.

How are you suggesting I square those two requirements?

You should try to structure your predicates in such a way that the
partitioning stuff all resolves to false for 32 bit arm, generally. That
way we can avoid stubbing out silly things like has_vhe() which doesn't
make sense in the context of 32 bit.

Okay. I will do that. When I was reworking it I thought it looked weird
to have the predicates live in a different location than the main
partitioning function.
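
To illustrate the suggestion above, here is a minimal user-space sketch of a predicate that resolves to false on the 32-bit side without stubbing has_vhe() there. It is not the driver's code: SKETCH_ARM64 and every sketch_* name are hypothetical stand-ins for the arch/arm64 vs. arch/arm header split.

```c
#include <stdbool.h>

/*
 * Assumption: a build-time switch standing in for the arch split
 * (arch/arm64 vs arch/arm headers). Defaults to the 32-bit case.
 */
#ifndef SKETCH_ARM64
#define SKETCH_ARM64 0
#endif

#if SKETCH_ARM64
/* Hypothetical stand-in for has_vhe(); it exists only on the arm64 side. */
static bool sketch_has_vhe(void)
{
	return true;
}

static inline bool sketch_pmu_partition_supported(void)
{
	return sketch_has_vhe();
}
#else
/* 32-bit side: no has_vhe() stub, the high-level predicate is simply false. */
static inline bool sketch_pmu_partition_supported(void)
{
	return false;
}
#endif

/* Shared driver code only ever asks the high-level question. */
static bool sketch_should_partition(int requested_host_counters)
{
	return sketch_pmu_partition_supported() && requested_host_counters >= 0;
}
```

With the default SKETCH_ARM64 of 0, sketch_should_partition() is false for any request, so the shared code never needs to know about VHE on 32-bit.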

> > +static bool partition_pmu __read_mostly;
> > +static u8 reserved_guest_counters __read_mostly;
> > +
> > +module_param(partition_pmu, bool, 0);
> > +MODULE_PARM_DESC(partition_pmu,
> > + "Partition the PMU into host and guest VM counters [y/n]");
> > +
> > +module_param(reserved_guest_counters, byte, 0);
> > +MODULE_PARM_DESC(reserved_guest_counters,
> > + "How many counters to reserve for guest VMs [0-$NR_COUNTERS]");
> > +

> This is confusing and not what we discussed offline.

> Please use a single parameter that describes the number of counters used
> by the *host*. This affects the *host* PMU driver, KVM can discover (and
> use) the leftovers.

> If the single module parameter goes unspecified the user did not ask for
> PMU partitioning.

I understand what we discussed offline, but I had a dilemma.

If we do a single module parameter for the number of counters used by the
host, then it defaults to 0 if unset, and there is no way to distinguish
between no partitioning and a request for partitioning that reserves 0
counters for the host, which I also thought you requested. Would you be
happy leaving no way to specify that?

You can make the command line use a signed integer for storage and a
reset value of -1.

-1 would imply default behavior (no partitioning) and a non-negative
value would imply partitioning.

Good idea. I thought of that solution myself for the first time after I
logged off yesterday. Slightly embarrassed I didn't see it sooner :(

In any case, I think the usage is more self-explanatory if
partition=[y/n] is a separate bit.

What would be the user's intent of "partition_pmu=n reserved_guest_counters=$X"?

That doesn't make sense, which is a decent argument for using just one
parameter. I'm now fine with going back to just reserved_host_counters.
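
For reference, a minimal user-space sketch of the single signed parameter discussed above, with -1 meaning "no partitioning". The sketch_* helpers are hypothetical, not part of the patch. It assumes the host keeps the upper range HPMN..N-1 and the guest gets 0..HPMN-1, and that an out-of-range request falls back to no partitioning; that fallback is my assumption, not something settled in this thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* -1 is the reset value: the parameter was not given, so no partitioning. */
#define SKETCH_NO_PARTITION (-1)

static bool sketch_partition_requested(int reserved_host_counters)
{
	return reserved_host_counters >= 0;
}

/*
 * Returns the HPMN value implied by the request, or -1 when the PMU is
 * not partitioned. Assumption: requests larger than nr_counters are
 * treated as "no partitioning".
 */
static int sketch_hpmn(int reserved_host_counters, int nr_counters)
{
	if (!sketch_partition_requested(reserved_host_counters))
		return -1;
	if (reserved_host_counters > nr_counters)
		return -1;
	return nr_counters - reserved_host_counters;
}

/* Bitmask of counters the host driver may use (nr_counters < 64). */
static uint64_t sketch_host_mask(int reserved_host_counters, int nr_counters)
{
	uint64_t all = (1ULL << nr_counters) - 1;
	int hpmn = sketch_hpmn(reserved_host_counters, nr_counters);

	if (hpmn < 0)
		return all;		/* not partitioned: host keeps everything */

	/* Guest owns 0..hpmn-1, host owns hpmn..nr_counters-1. */
	return all & ~((1ULL << hpmn) - 1);
}
```

With 6 counters, a request of 0 is distinguishable from the unset case: -1 leaves the host all six counters, while 0 leaves it none.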