Re: [PATCH v3 00/10] Move uid filtering to BPF filters

From: Namhyung Kim
Date: Tue Jun 03 2025 - 19:41:38 EST


On Tue, Jun 03, 2025 at 04:22:53PM -0700, Ian Rogers wrote:
> On Tue, Jun 3, 2025 at 3:32 PM Namhyung Kim <namhyung@xxxxxxxxxx> wrote:
> >
> > On Mon, Jun 02, 2025 at 11:26:12PM -0700, Ian Rogers wrote:
> > > On Mon, Jun 2, 2025 at 9:41 PM Namhyung Kim <namhyung@xxxxxxxxxx> wrote:
> > > >
> > > > Hi Ian,
> > > >
> > > > On Tue, May 27, 2025 at 01:39:21PM -0700, Ian Rogers wrote:
> > > > > On Fri, Apr 25, 2025 at 2:40 PM Ian Rogers <irogers@xxxxxxxxxx> wrote:
> > > > > >
> > > > > > Rather than scanning /proc and skipping PIDs based on their UIDs, use
> > > > > > BPF filters for uid filtering. The /proc scanning in thread_map is
> > > > > > racy as the PID may exit before the perf_event_open causing perf to
> > > > > > abort. BPF UID filters are more robust as they avoid the race. The
> > > > > > /proc scanning also misses processes starting after the perf
> > > > > > command. Add a helper for commands that support UID filtering and wire
> > > > > > up. Remove the non-BPF UID filtering support given it doesn't work.
> > > > > >
> > > > > > v3: Add lengthier commit messages as requested by Arnaldo. Rebase on
> > > > > > tmp.perf-tools-next.
> > > > > >
> > > > > > v2: Add a perf record uid test (Namhyung) and force setting
> > > > > > system-wide for perf trace and perf record (Namhyung). Ensure the
> > > > > > uid filter isn't set on tracepoint evsels.
> > > > > >
> > > > > > v1: https://lore.kernel.org/lkml/20250111190143.1029906-1-irogers@xxxxxxxxxx/
> > > > >
> > > > > Ping. Thanks,
> > > >
> > > > I'm ok with preferring BPF over /proc scanning, but still hesitate to
> > > > remove it since some people don't use BPF. Can you please drop that
> > > > part and make parse_uid_filter() conditional on BPF?
> > >
> > > Hi Namhyung,
> > >
> > > The approach of scanning /proc fails as:
> > > 1) processes that start after perf starts will be missed,
> > > 2) processes that terminate between being scanned in /proc and
> > > perf_event_open will cause perf to fail (essentially the -u option is
> > > just sugar to scan /proc and then provide the processes as if they
> > > were a -p option - such an approach doesn't need building into the
> > > tool).
> >
> > Yeah, I remember we had this discussion before. I think (1) is not true
> > as perf events will be inherited to children (but there is a race).
>
> If you log in from another terminal? Anything that creates a new
> process for that user but isn't inherited will be missed, which isn't
> merely a race.

As long as the another terminal is owned by the same user, any new
process from the terminal will inherit events, no?

>
> > And
> > (2) is a real problem but it's also about a race and it can succeed.
> >
> > Maybe we could change it to skip failed events when the target is a
> > user but that's not the direction you want.
>
> We could have other events and try to discover new processes via them,
> do things like dummy events to cover races. It is just a lot of
> complexity for something that is a trivial amount of BPF. In something
> like 10 years nobody has bothered to fix this up.

I don't want any complex solution for this. Let's not touch this.

>
> > >
> > > This patch series adds a test [1] and perf test has lots of processes
> > > starting and exiting, matching condition (2) above*. If this series
> > > were changed to an approach that uses BPF and falls back on /proc
> > > scanning then the -u option would be broken for both reasons above but
> > > also prove a constant source of test flakes.
> > >
> > > Rather than give the users something both frustrating to use (keeps
> > > quitting due to failed opens) and broken (missing processes) I think
> > > it is better to quit perf at that point informing the user they need
> > > more permissions to load the BPF program. This also makes the -u
> > > option testable.
> > >
> > > So the request for a change I don't think is sensible as it provides a
> > > worse user and testing experience. There is also the cognitive load of
> > > having the /proc scanning code in the code base, whereas the BPF
> > > filter is largely isolated.
> >
> > But I think the problem is that it has different requirements - BPF and
> > root privilege. So it should be used after checking the requirements
> > and fail or fallback.
> >
> > Does it print proper error messages if not? With that we can deprecate
> > the existing behavior and remove it later.
>
> For `perf top` with TUI you get an error message in a box of:
> ```
> failed to set filter "BPF" on event cpu_atom/cycles/P with 1
> (Operation not permitted)
> ```
> With --stdio you get:
> ```
> libbpf: Error in bpf_object__probe_loading(): -EPERM. Couldn't load
> trivial BPF program. Make sure your kernel supports BPF
> (CONFIG_BPF_SYSCALL=y) and/or that RLIMIT_MEMLOCK is set to big enough
> value.
> libbpf: failed to load object 'sample_filter_bpf'
> libbpf: failed to load BPF skeleton 'sample_filter_bpf': -EPERM
> Failed to load perf sample-filter BPF skeleton
> failed to set filter "BPF" on event cpu_atom/cycles/P with 1
> (Operation not permitted)
> ```
> This matches the existing behavior if you put a filter on an event.

But that's different as user directly asked the BPF filter.
The following message would be better (unless you fallback to the old
behavior).

"-u/--uid option is using BPF filter but perf is not built with BPF.
Please make sure to build with libbpf and bpf skeletons."

and/or

"-u/--uid option is using BPF filter which requires root privilege."

You may check if the filter program and map is pinned already.

Thanks,
Namhyung