Re: [BUG] perf: memory leak in perf top

From: Arnaldo Carvalho de Melo
Date: Mon Aug 08 2016 - 09:29:50 EST


On Mon, Aug 08, 2016 at 07:34:54PM +0800, Wangnan (F) wrote:
>
>
> On 2016/8/5 22:12, Arnaldo Carvalho de Melo wrote:
> > On Fri, Aug 05, 2016 at 06:58:12PM +0800, Wangnan (F) wrote:
> > > perf top consumes much more memory than it needs.
> > >
> > > Using the following commands:
> > >
> > > # yes '' > /dev/null
> > > # perf top -e raw_syscalls:sys_enter
> > >
> > > the system quickly becomes unresponsive because it runs out of memory.
> > >
> > > Using GDB, I found two call stacks leading to malloc(). See below.
> > >
> > > [call stack 1]
> > >
> > > #0 __GI___libc_malloc (bytes=140723951319552) at malloc.c:2841
> > > #1 0x00000000004b7246 in memdup (src=0x7f515e04b054, len=68) at ../lib/string.c:29
> > > #2 0x00000000004f9299 in hist_entry__init (sample_self=true, template=0x7ffcd9213a80, he=0xa37da50) at util/hist.c:401
> > > #3 hist_entry__new (template=template@entry=0x7ffcd9213a80, sample_self=sample_self@entry=true) at util/hist.c:458
> > > #4 0x00000000004fa13f in hists__findnew_entry (hists=0x3549d30, hists=0x3549d30, al=0x7ffcd9213ca0, sample_self=true, entry=0x7ffcd9213a80) at util/hist.c:544
> > > #5 __hists__add_entry (hists=0x3549d30, al=0x7ffcd9213ca0, sym_parent=<optimized out>, bi=bi@entry=0x0, mi=mi@entry=0x0, sample=<optimized out>, sample_self=sample_self@entry=true, ops=ops@entry=0x0) at util/hist.c:600
> > > #6 0x00000000004fa3d6 in hists__add_entry (sample_self=true, sample=<optimized out>, mi=0x0, bi=0x0, sym_parent=<optimized out>, al=<optimized out>, hists=<optimized out>) at util/hist.c:611
> > > #7 iter_add_single_normal_entry (iter=0x7ffcd9213ce0, al=<optimized out>) at util/hist.c:823
> > > #8 0x00000000004fc3c5 in hist_entry_iter__add (iter=0x7ffcd9213ce0, al=0x7ffcd9213ca0, max_stack_depth=<optimized out>, arg=0x7ffcd9214120) at util/hist.c:1030
> > > #9 0x00000000004536d7 in perf_event__process_sample (machine=<optimized out>, sample=0x7ffcd9213d30, evsel=0x3549bb0, event=<optimized out>, tool=0x7ffcd9214120) at builtin-top.c:800
> > > #10 perf_top__mmap_read_idx (top=top@entry=0x7ffcd9214120, idx=idx@entry=0) at builtin-top.c:866
> > > #11 0x00000000004559fd in perf_top__mmap_read (top=0x7ffcd9214120) at builtin-top.c:883
> > > #12 __cmd_top (top=0x7ffcd9214120) at builtin-top.c:1018
> > > #13 cmd_top (argc=<optimized out>, argv=<optimized out>, prefix=<optimized out>) at builtin-top.c:1344
> > > #14 0x0000000000493561 in run_builtin (p=p@entry=0x948000 <commands+288>, argc=argc@entry=4, argv=argv@entry=0x7ffcd92199b0) at perf.c:357
> > > #15 0x0000000000437157 in handle_internal_command (argv=0x7ffcd92199b0, argc=4) at perf.c:419
> > > #16 run_argv (argv=0x7ffcd9219720, argcp=0x7ffcd921972c) at perf.c:465
> > > #17 main (argc=4, argv=0x7ffcd92199b0) at perf.c:609
> > >
> > > [call stack 2]
> > > #0 __GI___libc_malloc (bytes=140723951319456) at malloc.c:2841
> > > #1 0x00000000005766de in trace_seq_init (s=0x7ffcd92139b0) at trace-seq.c:60
> > > #2 0x00000000004f7047 in get_trace_output (he=0x7ffcd9213a80) at util/sort.c:584
> > Yeah, this allocates 4KB, which is way too much. Can you try this
> > patch, which amounts to a trace_seq_trim()?
> >
> > diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c
> > index 947d21f38398..37ddcf57500f 100644
> > --- a/tools/perf/util/sort.c
> > +++ b/tools/perf/util/sort.c
> > @@ -588,7 +588,7 @@ static char *get_trace_output(struct hist_entry *he)
> >  	} else {
> >  		pevent_event_info(&seq, evsel->tp_format, &rec);
> >  	}
> > -	return seq.buffer;
> > +	return realloc(seq.buffer, seq.len);
> >  }
> >
> >  static int64_t
>
> It works: RSS growth drops from roughly 1.8 GB per 10 seconds to
> roughly 40 MB per 10 seconds. Unfortunately your patch is wrong,
> though: please use realloc(seq.buffer, seq.len + 1), to account for
> the terminating '\0' :)

Ouch, I thought the terminating '\0' was accounted for in seq.len :-\
Will check and fix, thanks for testing it.
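
For reference, the hunk above with Wang Nan's correction folded in would
look something like this (just a sketch; a shrinking realloc() should not
fail in practice, but a final patch may want to keep the original buffer
if it does):

-	return seq.buffer;
+	/*
+	 * trace_seq_init() starts with a 4KB buffer; trim it to the
+	 * seq.len bytes actually used, plus one for the trailing '\0'.
+	 */
+	return realloc(seq.buffer, seq.len + 1);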

> In my experiment RSS continuously increases because I use
> raw_syscalls, which produces too many distinct entries in perf top
> due to the nature of the argument list (garbage argument values are
> also taken into account by perf). It becomes much better if I use
> --fields to reduce the number of possible entries. However,
> theoretically perf will eventually consume all system memory, unless
> we carefully choose tracepoints and fields to make sure the number of
> possible entries is limited.

I need to continue working on reference counting more classes, so that
we can throw away more stuff as we decay histogram entries...
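
As for the --fields workaround you mention, an invocation along these
lines should keep the number of possible entries bounded (the field keys
here are only an illustration; the point is to leave the formatted
tracepoint data out of the entry key):

  # perf top -e raw_syscalls:sys_enter --fields pid,comm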

- Arnaldo