Re: [RFC PATCH 1/2] perf: arm_spe: Fix consistency of PMSCR register bit CX

From: Will Deacon
Date: Tue Jan 18 2022 - 05:07:12 EST


On Mon, Jan 17, 2022 at 12:44:31PM +0000, German Gomez wrote:
> The arm_spe_pmu driver will enable the CX bit of the PMSCR register in
> order to add CONTEXT packets into the traces if the owner of the perf
> event runs with capabilities CAP_PERFMON or CAP_SYS_ADMIN.
>
> The value of the bit is computed in the arm_spe_event_to_pmscr function
> [1], and the check for capabilities happens in [2] in the pmu init
> callback. This suggests that the value of the CX bit should remain
> consistent for the duration of the perf session.
>
> However, the function arm_spe_event_to_pmscr may be called later during
> the start callback [3] when the "current" process is not the owner of
> the perf session, so the CX bit is currently not consistent. Consider
> the following example:
>
> 1. Run a process in the background with capability CAP_SYS_ADMIN in CPU0.
>
> $ taskset --cpu-list 0 sudo dd if=/dev/random of=/dev/null &
> [3] 3806
>
> 2. Begin a perf session _without_ capabilities (we shouldn't see CONTEXT packets).
>
> $ perf record -e arm_spe_0// -C0 -- sleep 1
> $ perf report -D | grep CONTEXT
> . 0000000e: 65 df 0e 00 00 CONTEXT 0xedf el2
> . 0000004e: 65 df 0e 00 00 CONTEXT 0xedf el2
> . 0000008e: 65 df 0e 00 00 CONTEXT 0xedf el2
> [...]
>
> As can be seen, the traces begin showing CONTEXT packets when the pid is
> 0xedf (3807).

So to be clear: we shouldn't be reporting these packets because 'perf'
doesn't have the right capabilities, but we evaluate that in the context of
'dd' (running as root) and so incorrectly grant the permission. Correct?

> This happens because the pmu start callback is run when
> the current process is not the owner of the perf session, so the CX
> register bit is set.

This doesn't really seem SPE-specific to me -- the perf_allow_*() helpers
also operate implicitly on the current task. How do other PMU drivers avoid
falling into this trap?

> One way to fix this is by caching the value of the CX bit during the
> initialization of the PMU event, so that it remains consistent for the
> duration of the session.

It doesn't feel right to stash this in 'struct arm_spe_pmu' during event
initialisation -- wouldn't that allow perf to continue creating new events
with CX set, even if the paranoid sysctl was changed dynamically? Instead,
I think it would be better if the capabilities were stash in the event
itself somehow at initialisation time.

Will