[PATCH 0/3] Reference count checker and cpu map related fixes

From: Ian Rogers
Date: Fri Jan 21 2022 - 23:58:25 EST


The first patch is a fix, the second a refactor/cleanup and the third
something of an RFC.

The perf tool has a class of memory problems where reference counts
are used incorrectly. Memory sanitizers and valgrind don't provide
useful ways to debug these problems, you see a memory leak where the
only pertinent information is the original allocation site. What would
be more useful is knowing where a get fails to have a corresponding
put, where there are double puts, etc.

This work was motivated by the roll-back of:
https://lore.kernel.org/linux-perf-users/20211118193714.2293728-1-irogers@xxxxxxxxxx/
where fixing a missed put resulted in a use-after-free in a different
context. There was a sense in fixing the issue that a game of
wac-a-mole had been embarked upon in adding missed gets and puts.

The basic approach of the change is to add a level of indirection at
the get and put calls. Get allocates a level of indirection that, if
no corresponding put is called, becomes a memory leak (and associated
stack trace) that leak sanitizer can report. Similarly if two puts are
called for the same get, then a double free can be detected by address
sanitizer. This can also detect the use after put, which should also
yield a segv without a sanitizer.

The patches demonstrates the approach on the reference counted
perf_cpu_map API. The associated struct only has two variables and so
the associated indirection is limited, and the API was recently
refactored and should be fresh of mind. To avoid two APIs the approach
calls the indirection layer perf_cpu_map when in effect. To control
the naming of the struct and the stripping of the indirection macros
are necessary. Patch 3 implements the checker and has diffstats of:
5 files changed, 146 insertions(+), 74 deletions(-)
The number of files changed is a reflection of the partial move to
libperf of this API and isn't typical. Overall reference count safety
is being added in at a cost of about 72 lines of code, hopefully
demonstrating the approach to be cheap enough that we can repeat it on
the other reference counted structs in perf - albeit that perf_cpu_map
is somewhat unusual in already using lots of accessor functions to
access the struct.

Did the approach spot any bugs? Not really. It did spot that in
perf_cpu_map__merge get was being used for the side-effect of
incrementing a reference count, but not returning the perf_cpu_map the
get returns. Having to be accurate with this kind of behavior is a
frustration of the approach, but it demonstrates the tool works and
for resolving reference count issues I feel it is worth it.

An alternative that was considered was ref_tracker:
https://lwn.net/Articles/877603/
ref_tracker requires use of a reference counted struct to also use a
cookie/tracker. The cookie is combined with data in a ref_tracker_dir
to spot double puts. When an object is finished with leaks can be
detected, as with this approach when leak analysis happens. This
approach was preferred as it doesn't introduce cookies, spots use
after put and appears moderately more neutral to the API. Weaknesses
of the implemented approcah are not being able to do adhoc leak
detection and a preference for adding an accessor API to structs. I
believe there are other issues, hence the RFC.

Ian Rogers (3):
perf python: Fix cpu_map__item building
perf cpumap: Migrate to libperf cpumap api
perf cpumap: Add reference count checking

tools/lib/perf/cpumap.c | 120 ++++++++++++------
tools/lib/perf/evsel.c | 4 +-
tools/lib/perf/include/internal/cpumap.h | 14 +-
tools/perf/bench/epoll-ctl.c | 2 +-
tools/perf/bench/epoll-wait.c | 2 +-
tools/perf/bench/evlist-open-close.c | 4 +-
tools/perf/bench/futex-hash.c | 2 +-
tools/perf/bench/futex-lock-pi.c | 2 +-
tools/perf/bench/futex-requeue.c | 2 +-
tools/perf/bench/futex-wake-parallel.c | 2 +-
tools/perf/bench/futex-wake.c | 2 +-
tools/perf/builtin-ftrace.c | 2 +-
tools/perf/builtin-stat.c | 7 +-
tools/perf/tests/bitmap.c | 4 +-
tools/perf/tests/cpumap.c | 20 ++-
tools/perf/tests/event_update.c | 8 +-
tools/perf/tests/mem2node.c | 4 +-
tools/perf/tests/mmap-basic.c | 5 +-
tools/perf/tests/topology.c | 37 +++---
tools/perf/util/auxtrace.c | 2 +-
tools/perf/util/counts.c | 2 +-
tools/perf/util/cpumap.c | 42 +++---
tools/perf/util/cpumap.h | 2 +-
tools/perf/util/cputopo.c | 4 +-
tools/perf/util/evlist-hybrid.c | 11 +-
tools/perf/util/evsel.c | 20 +--
tools/perf/util/evsel.h | 3 +-
tools/perf/util/mmap.c | 2 +-
tools/perf/util/perf_api_probe.c | 4 +-
tools/perf/util/pmu.c | 24 ++--
tools/perf/util/python.c | 6 +-
tools/perf/util/record.c | 6 +-
.../scripting-engines/trace-event-python.c | 4 +-
tools/perf/util/session.c | 4 +-
tools/perf/util/svghelper.c | 4 +-
tools/perf/util/synthetic-events.c | 18 +--
tools/perf/util/top.c | 6 +-
37 files changed, 245 insertions(+), 162 deletions(-)

--
2.35.0.rc0.227.g00780c9af4-goog