Re: [PATCH] perf map: Fix namespace memory leak

From: Ian Rogers
Date: Tue Nov 30 2021 - 12:20:45 EST


On Tue, Nov 30, 2021 at 7:00 AM Arnaldo Carvalho de Melo
<acme@xxxxxxxxxx> wrote:
>
> Em Sun, Nov 28, 2021 at 04:40:26PM +0100, Jiri Olsa escreveu:
> > On Thu, Nov 18, 2021 at 11:37:14AM -0800, Ian Rogers wrote:
> > > This leak was happening reliably with test "Lookup mmap thread" with
> > > stack traces like:
> > >
> > > Direct leak of 5504 byte(s) in 172 object(s) allocated from:
> > > #0 0x7f4685e47987 in __interceptor_calloc
> > > #1 0x56063b974c2a in nsinfo__new util/namespaces.c:142
> > > #2 0x56063b9781ff in thread__new util/thread.c:70
> > > #3 0x56063b944953 in ____machine__findnew_thread util/machine.c:543
> > > #4 0x56063b944ac6 in __machine__findnew_thread util/machine.c:574
> > > #5 0x56063b944b36 in machine__findnew_thread util/machine.c:584
> > > #6 0x56063b94c892 in machine__process_fork_event util/machine.c:1954
> > > #7 0x56063b94cc1f in machine__process_event util/machine.c:2019
> > > #8 0x56063b894f18 in perf_event__process util/event.c:567
> > > #9 0x56063ba17951 in perf_tool__process_synth_event util/synthetic-events.c:65
> > > #10 0x56063ba19086 in perf_event__synthesize_fork util/synthetic-events.c:287
> > > #11 0x56063ba1c39d in __event__synthesize_thread util/synthetic-events.c:775
> > > #12 0x56063ba1cf6f in __perf_event__synthesize_threads util/synthetic-events.c:929
> > > #13 0x56063ba1d4ab in perf_event__synthesize_threads util/synthetic-events.c:1000
> > > #14 0x56063b821a3d in synth_all tests/mmap-thread-lookup.c:136
> > > #15 0x56063b821c86 in mmap_events tests/mmap-thread-lookup.c:174
> > > #16 0x56063b8221b7 in test__mmap_thread_lookup tests/mmap-thread-lookup.c:230
> > >
> > > The dso->nsinfo is overwritten, but without a nsinfo__put this can leak
> > > the overwritten nsinfo.
> > >
> > > Signed-off-by: Ian Rogers <irogers@xxxxxxxxxx>
> >
> > nice catch!
> >
> > Acked-by: Jiri Olsa <jolsa@xxxxxxxxxx>
>
> This one is tricky, I have to retrieve the details from last time this
> surfaced, but try:
>
> # perf top -F 10000
>
> leave it for a while and press 'q':
>
> perf: /home/acme/git/perf/tools/include/linux/refcount.h:131: refcount_sub_and_test: Assertion `!(new > val)' failed.
> Aborted (core dumped)
>
> I reproduced the above assertion a few times, now I'm not being able,
> probably some namespace related activity took place when it hit.

I was able to reproduce in about 1 run in 3. With address sanitizer
the core dump turns into a heap-use-after-free with a report of:

==235440==ERROR: AddressSanitizer: heap-use-after-free on address
0x6030000485c8 at pc 0x55d8b882bfb7 bp 0x7fffd3426e90 sp
0x7fffd3426e88
READ of size 4 at 0x6030000485c8 thread T0
#0 0x55d8b882bfb6 in __read_once_size ./tools/include/linux/compiler.h:132
#1 0x55d8b882bfb6 in atomic_read
./tools/include/asm/../../arch/x86/include/asm/atomic.h:28
#2 0x55d8b882c35c in refcount_sub_and_test
./tools/include/linux/refcount.h:123
#3 0x55d8b882c45e in refcount_dec_and_test
./tools/include/linux/refcount.h:148
#4 0x55d8b882d4af in nsinfo__put util/namespaces.c:204
#5 0x55d8b87adecd in __nsinfo__zput util/namespaces.h:65
#6 0x55d8b87b4bcc in dso__delete util/dso.c:1327
#7 0x55d8b87b4c8d in dso__put util/dso.c:1342
#8 0x55d8b87fb23b in dsos__purge util/machine.c:182
#9 0x55d8b87fb2e3 in dsos__exit util/machine.c:190
#10 0x55d8b87fb46e in machine__exit util/machine.c:222
#11 0x55d8b87fb715 in machines__exit util/machine.c:262
#12 0x55d8b88149d2 in perf_session__delete util/session.c:305
#13 0x55d8b85c0924 in cmd_top ./tools/perf/builtin-top.c:1781
#14 0x55d8b87176b7 in run_builtin ./tools/perf/perf.c:313
#15 0x55d8b8717c10 in handle_internal_command ./tools/perf/perf.c:365
#16 0x55d8b8717fcd in run_argv ./tools/perf/perf.c:409
#17 0x55d8b87187bc in main ./tools/perf/perf.c:539

0x6030000485c8 is located 24 bytes inside of 32-byte region
[0x6030000485b0,0x6030000485d0)
freed by thread T0 here:
#0 0x7fb6e7c454d7 in __interceptor_free
../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:127
#1 0x55d8b882d41a in nsinfo__delete util/namespaces.c:192
#2 0x55d8b882d4bf in nsinfo__put util/namespaces.c:205
#3 0x55d8b87adecd in __nsinfo__zput util/namespaces.h:65
#4 0x55d8b87b4bcc in dso__delete util/dso.c:1327
#5 0x55d8b87b4c8d in dso__put util/dso.c:1342
#6 0x55d8b87fb23b in dsos__purge util/machine.c:182
#7 0x55d8b87fb2e3 in dsos__exit util/machine.c:190
#8 0x55d8b87fb46e in machine__exit util/machine.c:222
#9 0x55d8b87fb715 in machines__exit util/machine.c:262
#10 0x55d8b88149d2 in perf_session__delete util/session.c:305
#11 0x55d8b85c0924 in cmd_top ./tools/perf/builtin-top.c:1781
#12 0x55d8b87176b7 in run_builtin ./tools/perf/perf.c:313
#13 0x55d8b8717c10 in handle_internal_command ./tools/perf/perf.c:365
#14 0x55d8b8717fcd in run_argv ./tools/perf/perf.c:409
#15 0x55d8b87187bc in main ./tools/perf/perf.c:539

previously allocated by thread T11 here:
#0 0x7fb6e7c45987 in __interceptor_calloc
../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:154
#1 0x55d8b882ceaa in nsinfo__new util/namespaces.c:142
#2 0x55d8b883047f in thread__new util/thread.c:70
#3 0x55d8b87fcb93 in ____machine__findnew_thread util/machine.c:543
#4 0x55d8b87fcd06 in __machine__findnew_thread util/machine.c:574
#5 0x55d8b87fcd76 in machine__findnew_thread util/machine.c:584
#6 0x55d8b8804ad2 in machine__process_fork_event util/machine.c:1954
#7 0x55d8b8804e5f in machine__process_event util/machine.c:2019
#8 0x55d8b874cf18 in perf_event__process util/event.c:567
#9 0x55d8b88cfbd1 in perf_tool__process_synth_event
util/synthetic-events.c:65
#10 0x55d8b88d1306 in perf_event__synthesize_fork
util/synthetic-events.c:287
#11 0x55d8b88d461d in __event__synthesize_thread util/synthetic-events.c:775
#12 0x55d8b88d51ef in __perf_event__synthesize_threads
util/synthetic-events.c:929
#13 0x55d8b88d54af in synthesize_threads_worker util/synthetic-events.c:961
#14 0x7fb6e7b1eead in start_thread nptl/pthread_create.c:463

Thread T11 created by T0 here:
#0 0x7fb6e7bed716 in __interceptor_pthread_create
../../../../src/libsanitizer/asan/asan_interceptors.cpp:216
#1 0x55d8b88d5ddd in perf_event__synthesize_threads
util/synthetic-events.c:1039
#2 0x55d8b88da3b8 in __machine__synthesize_threads
util/synthetic-events.c:1791
#3 0x55d8b88da43d in machine__synthesize_threads
util/synthetic-events.c:1802
#4 0x55d8b85b9378 in __cmd_top ./tools/perf/builtin-top.c:1273
#5 0x55d8b85c0811 in cmd_top ./tools/perf/builtin-top.c:1774
#6 0x55d8b87176b7 in run_builtin ./tools/perf/perf.c:313
#7 0x55d8b8717c10 in handle_internal_command ./tools/perf/perf.c:365
#8 0x55d8b8717fcd in run_argv ./tools/perf/perf.c:409
#9 0x55d8b87187bc in main ./tools/perf/perf.c:539

SUMMARY: AddressSanitizer: heap-use-after-free
./tools/include/linux/compiler.h:132 in __read_once_size
Shadow bytes around the buggy address:
0x0c0680001060: fa fa fa fa fa fa fa fa fd fd fd fd fa fa fa fa
0x0c0680001070: fa fa fa fa fa fa fa fa fa fa fd fd fd fd fa fa
0x0c0680001080: fa fa fa fa fa fa fa fa fa fa fa fa fd fd fd fd
0x0c0680001090: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c06800010a0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
=>0x0c06800010b0: fa fa fa fa fa fa fd fd fd[fd]fa fa fa fa fa fa
0x0c06800010c0: fa fa fa fa fa fa fa fa fd fd fd fa fa fa fd fd
0x0c06800010d0: fd fa fa fa fd fd fd fd fa fa fd fd fd fa fa fa
0x0c06800010e0: fd fd fd fd fa fa fd fd fd fd fa fa fd fd fd fa
0x0c06800010f0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fd fd
0x0c0680001100: fd fd fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
Addressable: 00
Partially addressable: 01 02 03 04 05 06 07
Heap left redzone: fa
Freed heap region: fd
Stack left redzone: f1
Stack mid redzone: f2
Stack right redzone: f3
Stack after return: f5
Stack use after scope: f8
Global redzone: f9
Global init order: f6
Poisoned by user: f7
Container overflow: fc
Array cookie: ac
Intra object redzone: bb
ASan internal: fe
Left alloca redzone: ca
Right alloca redzone: cb
Shadow gap: cc
==235440==ABORTING

My hope was to land this so that 'perf test' wouldn't have new
failures caused by leak sanitizer. So one option could be to guard the
change with an address sanitizer #ifdef. The larger fix is to rework
the reference counting logic, but I'd really hope we could switch that
work to being in a language like Rust to get us out of the current
memory leak vs crash wac-a-mole.

+Riccardo as he did some great and related fixes over the summer.

Thanks,
Ian



> - Arnaldo
>
> > thanks,
> > jirka
> >
> > > ---
> > > tools/perf/util/map.c | 1 +
> > > 1 file changed, 1 insertion(+)
> > >
> > > diff --git a/tools/perf/util/map.c b/tools/perf/util/map.c
> > > index 8af693d9678c..ceed8f407bc0 100644
> > > --- a/tools/perf/util/map.c
> > > +++ b/tools/perf/util/map.c
> > > @@ -192,6 +192,7 @@ struct map *map__new(struct machine *machine, u64 start, u64 len,
> > > if (!(prot & PROT_EXEC))
> > > dso__set_loaded(dso);
> > > }
> > > + nsinfo__put(dso->nsinfo);
> > > dso->nsinfo = nsi;
> > >
> > > if (build_id__is_defined(bid))
> > > --
> > > 2.34.0.rc2.393.gf8c9666880-goog
> > >
>
> --
>
> - Arnaldo