Re: [PATCH 2/3] perf: arm_spe: Disable buffer before writing to PMBPTR_EL1 or PMBSR_EL1

From: James Clark
Date: Tue Jul 22 2025 - 10:50:57 EST




On 21/07/2025 4:21 pm, Leo Yan wrote:
On Mon, Jul 21, 2025 at 02:20:15PM +0100, James Clark wrote:

[...]

Thought about this some more.

Before:

arm_spe_pmu_buf_get_fault_act:
<drain buffer>
ISB
arm_spe_perf_aux_output_begin:
PMBLIMITR_EL1.E = 1
ISB
PMBSR_EL1.S = 0
ERET

Now:

PMBLIMITR_EL1 = 0
ISB

PMBSR_EL1.S = 0
arm_spe_perf_aux_output_begin:
ISB
PMBLIMITR_EL1.E = 1
ERET

I don't see much of a difference between the two sequences - the point after
which we can be sure that profiling is enabled remains the ERET from the
exception return. The only difference is that, before this change, the ERET
synchronized clearing the PMBSR_EL1.S bit, now it synchronizes setting the
PMBLIMITR_EL1.E bit.

Thoughts?

To make the discussion easier, I'll focus on the trace enabling flow
in this reply.

My understanding of a sane flow would be:

arm_spe_pmu_irq_handler() {
arm_spe_perf_aux_output_begin() {
SYS_PMBPTR_EL1 = base;

ISB // Synchronization between SPE register setting and
// enabling profiling buffer.
PMBLIMITR_EL1.E = 1;
}
ISB // Context synchronization event to ensure visibility to SPE
}

... start trace ... (Bottom half, e.g., softirq, etc)

ERET

In the current code, arm_spe_perf_aux_output_begin() is followed by an
ISB, which serves as a context synchronization event to ensure
visibility to the SPE. After that, it ensures that the trace unit will
function correctly.


But I think Alex's point is that in the existing code the thing that finally
enables trace (PMBSR_EL1.S = 0) isn't followed by an isb(), only an ERET. So
the new flow isn't any different in that regard.

Thanks for explanation.

I understand that the Software Usage PKLXF recommends using an ERET as
the synchronization point. However, between enabling the profiling
buffer and the ERET, the kernel might execute other operations (e.g.,
softirqs, tasklets, etc.).

Isn't preemption disabled? Surely that's not possible. Even if something did
run it wouldn't be anything that touches the SPE registers, and we're sure
there's an isb() between setting up the buffer and the final PMBLIMITR_EL1.E
= 1

Yes, bottom half runs in preemtion disabled state. See:

el1_interrupt() -> __el1_irq() -> irq_exit_rcu() -> invoke_softirq()

In some cases, sotfirq and tasklet might even cause long latency (I
believe it can be milliseconds or even longer), this is why ftrace
"irqsoff" tracer is used to profile latency caused by irq off state.

Bottom half does not modify SPE registsers, but it can be a long road
between enabling SPE trace and ERET.

Therefore, it seems to me that using ERET as the synchronization point
may be too late. This is why I think we should keep an ISB after
arm_spe_perf_aux_output_begin().

Wouldn't that make the ERET too late even in the current code then? But I
think we're agreeing there's no issue there?

I would say ERET is too late for current code as well.

Thanks,
Leo


Ok I get it now. The point is that there is stuff in between the return in the SPE IRQ handler and the actual ERET. If someone is interested in sampling the kernel then they might miss sampling a small amount of that.

It's not a correctness thing, just reducing potential gaps in the samples. I can add another commit to add it, but it doesn't look like it would be a fixes commit either, just an improvement. And the same issue applies to the existing code too.

James