Re: [PATCH 06/17] KVM: arm64: Introduce method to partition the PMU

From: Colton Lewis
Date: Tue Jun 03 2025 - 17:32:57 EST


Oliver Upton <oliver.upton@xxxxxxxxx> writes:

On Mon, Jun 02, 2025 at 07:26:51PM +0000, Colton Lewis wrote:
static void kvm_arm_setup_mdcr_el2(struct kvm_vcpu *vcpu)
{
+ u8 hpmn = vcpu->kvm->arch.arm_pmu->hpmn;
+
preempt_disable();

/*
* This also clears MDCR_EL2_E2PB_MASK and MDCR_EL2_E2TB_MASK
* to disable guest access to the profiling and trace buffers
*/
- vcpu->arch.mdcr_el2 = FIELD_PREP(MDCR_EL2_HPMN,
- *host_data_ptr(nr_event_counters));
- vcpu->arch.mdcr_el2 |= (MDCR_EL2_TPM |
+ vcpu->arch.mdcr_el2 = FIELD_PREP(MDCR_EL2_HPMN, hpmn);
+ vcpu->arch.mdcr_el2 |= (MDCR_EL2_HPMD |
+ MDCR_EL2_TPM |

This isn't safe, as there's no guarantee that kvm_arch::arm_pmu is
pointing that the PMU for this CPU. KVM needs to derive HPMN from some
per-CPU state, not anything tied to the VM/vCPU.

I'm confused. Isn't this function preparing to run the vCPU on this
CPU? Why would it be pointing at a different PMU?

And HPMN is something that we only want set when running a vCPU, so
there isn't any per-CPU state saying it should be anything but the
default value (number of counters) outside that context.

Unless you just mean I should check the number of counters again and
make sure HPMN is not an invalid value.

+/**
+ * kvm_pmu_partition() - Partition the PMU
+ * @pmu: Pointer to pmu being partitioned
+ * @host_counters: Number of host counters to reserve
+ *
+ * Partition the given PMU by taking a number of host counters to
+ * reserve and, if it is a valid reservation, recording the
+ * corresponding HPMN value in the hpmn field of the PMU and clearing
+ * the guest-reserved counters from the counter mask.
+ *
+ * Passing 0 for @host_counters has the effect of disabling partitioning.
+ *
+ * Return: 0 on success, -ERROR otherwise
+ */
+int kvm_pmu_partition(struct arm_pmu *pmu, u8 host_counters)
+{
+ u8 nr_counters;
+ u8 hpmn;
+
+ if (!kvm_pmu_reservation_is_valid(host_counters))
+ return -EINVAL;
+
+ nr_counters = *host_data_ptr(nr_event_counters);
+ hpmn = kvm_pmu_hpmn(host_counters);
+
+ if (hpmn < nr_counters) {
+ pmu->hpmn = hpmn;
+ /* Inform host driver of available counters */
+ bitmap_clear(pmu->cntr_mask, 0, hpmn);
+ bitmap_set(pmu->cntr_mask, hpmn, nr_counters);
+ clear_bit(ARMV8_PMU_CYCLE_IDX, pmu->cntr_mask);
+ if (pmuv3_has_icntr())
+ clear_bit(ARMV8_PMU_INSTR_IDX, pmu->cntr_mask);
+
+ kvm_debug("Partitioned PMU with HPMN %u", hpmn);
+ } else {
+ pmu->hpmn = nr_counters;
+ bitmap_set(pmu->cntr_mask, 0, nr_counters);
+ set_bit(ARMV8_PMU_CYCLE_IDX, pmu->cntr_mask);
+ if (pmuv3_has_icntr())
+ set_bit(ARMV8_PMU_INSTR_IDX, pmu->cntr_mask);
+
+ kvm_debug("Unpartitioned PMU");
+ }
+
+ return 0;
+}

Hmm... Just in terms of code organization I'm not sure I like having KVM
twiddling with *host* support for PMUv3. Feels like the ARM PMU driver
should own partitioning and KVM just takes what it can get.

Okay. I can move the code.

@@ -239,6 +245,13 @@ void kvm_host_pmu_init(struct arm_pmu *pmu)
if (!pmuv3_implemented(kvm_arm_pmu_get_pmuver_limit()))
return;

+ if (reserved_host_counters) {
+ if (kvm_pmu_partition_supported())
+ WARN_ON(kvm_pmu_partition(pmu, reserved_host_counters));
+ else
+ kvm_err("PMU Partition is not supported");
+ }
+

Hasn't the ARM PMU been registered with perf at this point? Surely the
driver wouldn't be very pleased with us ripping counters out from under
its feet.

AFAICT nothing in perf registration cares about the number of counters
the PMU has. The PMUv3 driver tracks its own available counters through
cntr_mask and I modify that during partition.

Since this is still initialization of the PMU, I don't believe anything
has had a chance to use a counter yet that will be ripped away.

Aesthetically It makes since to change this if I move the partitioning
code to the PMUv3 driver, but I think it's inconsequential to the
function.