Re: [RFC PATCH 1/2] perf: arm_spe: Fix consistency of PMSCR register bit CX

From: James Clark
Date: Tue Jan 18 2022 - 11:28:37 EST




On 18/01/2022 10:07, Will Deacon wrote:
> On Mon, Jan 17, 2022 at 12:44:31PM +0000, German Gomez wrote:
>> The arm_spe_pmu driver will enable the CX bit of the PMSCR register in
>> order to add CONTEXT packets into the traces if the owner of the perf
>> event runs with capabilities CAP_PERFMON or CAP_SYS_ADMIN.
>>
>> The value of the bit is computed in the arm_spe_event_to_pmscr function
>> [1], and the check for capabilities happens in [2] in the pmu init
>> callback. This suggests that the value of the CX bit should remain
>> consistent for the duration of the perf session.
>>
>> However, the function arm_spe_event_to_pmscr may be called later during
>> the start callback [3] when the "current" process is not the owner of
>> the perf session, so the CX bit is currently not consistent. Consider
>> the following example:
>>
>> 1. Run a process in the background with capability CAP_SYS_ADMIN in CPU0.
>>
>> $ taskset --cpu-list 0 sudo dd if=/dev/random of=/dev/null &
>> [3] 3806
>>
>> 2. Begin a perf session _without_ capabilities (we shouldn't see CONTEXT packets).
>>
>> $ perf record -e arm_spe_0// -C0 -- sleep 1
>> $ perf report -D | grep CONTEXT
>> . 0000000e: 65 df 0e 00 00 CONTEXT 0xedf el2
>> . 0000004e: 65 df 0e 00 00 CONTEXT 0xedf el2
>> . 0000008e: 65 df 0e 00 00 CONTEXT 0xedf el2
>> [...]
>>
>> As can be seen, the traces begin showing CONTEXT packets when the pid is
>> 0xedf (3807).
>
> So to be clear: we shouldn't be reporting these packets because 'perf'
> doesn't have the right capabilities, but we evaluate that in the context of
> 'dd' (running as root) and so incorrectly grant the permission. Correct?

In my opinion we should be reporting context packets in this case. The only reason
a normal user is allowed to profile a root process is if they have
perf_event_paranoid <= 0. So from that perspective perf does have the right
capabilities. But a call to only perfmon_capable() doesn't look at the paranoid
value, only CAP_SYS_ADMIN and CAP_SYS_ADMIN.

>
>> This happens because the pmu start callback is run when
>> the current process is not the owner of the perf session, so the CX
>> register bit is set.
>
> This doesn't really seem SPE-specific to me -- the perf_allow_*() helpers
> also operate implicitly on the current task. How do other PMU drivers avoid
> falling into this trap?>
>> One way to fix this is by caching the value of the CX bit during the
>> initialization of the PMU event, so that it remains consistent for the
>> duration of the session.
>
> It doesn't feel right to stash this in 'struct arm_spe_pmu' during event
> initialisation -- wouldn't that allow perf to continue creating new events
> with CX set, even if the paranoid sysctl was changed dynamically? Instead,
> I think it would be better if the capabilities were stash in the event
> itself somehow at initialisation time.

If stashing is the issue then the change could be to check the right thing
every time, as in do something like the second patch in this set which
also checks perf_event_paranoid <= 0 (indirectly).

James

>
> Will
>