Re: [PATCHSET 0/4] perf report: Support folded callchain output (v4)

From: Namhyung Kim
Date: Thu Nov 05 2015 - 06:33:35 EST


Hi Arnaldo,

On Wed, Nov 04, 2015 at 03:08:58PM -0300, Arnaldo Carvalho de Melo wrote:
> Em Thu, Nov 05, 2015 at 12:34:57AM +0900, Namhyung Kim escreveu:
> > Hi Arnaldo and Brendan,
> >
> > On Wed, Nov 04, 2015 at 11:51:31AM -0300, Arnaldo Carvalho de Melo wrote:
> > > Em Tue, Nov 03, 2015 at 10:02:32PM -0800, Brendan Gregg escreveu:
> > > > On Tue, Nov 3, 2015 at 5:54 PM, Namhyung Kim <namhyung@xxxxxxxxxx> wrote:
> > > > > Ah, makes sense. So it'd look like
> > >
> > > > > $ perf report --stdio -g folded,count,info -F none -s comm
> > > > > $ perf report --stdio -g folded,count,info -F none -s pid
> > >
> > > > > The output would be
> > >
> > > > > 809 swapper-0 cpu_bringup_and_idle;cpu_startup_entry;default_idle_call;arch_cpu_idle;default_idle;xen_hypercall_sched_op
> > >
> > > > Thanks, looks almost right: a couple of minor changes:
> > >
> > > > 1. If perf already has the precedent of "PID:comm", instead of my
> > > > "comm-PID", then maybe it should use "PID:comm" for perf consistency.
> > > > Doesn't make much difference to me.
>
> > Right. Actually I'd like to write it that way.. ;-)
>
> Well, those are two pieces of information: "comm" and "pid", so it would
> be nice that we could take this opportunity to remove it, i.e. just
> treat it as any other field and separate it via the designated
> separator, and only show the ones specified.

So do you want to change '-s pid' to print only the 'PID' part?


>
> > > > 2. The second space, delimiting "PID:comm" (or comm) and the stack...
> > > > I'm nervous about using space as a delimiter any more than once, since
> > > > it can also appear in comm (eg, "java main") and frames (eg,
> > > > "JavaCalls::call_helper(JavaValue*, methodHandle*, JavaCallArguments*,
> > > > Thread*)" -- that's direct from "perf script"!). I'd consider making
> > > > it a semicolon:
>
> The C++ symbol names are the biggest challenge here for a single line in
> CSV ("comma" quoted) record :-\
>
> > Fair enough.
>
> > > > 809 swapper-0;cpu_bringup_and_idle;cpu_startup_entry;...
> > >
> > > > So the output is "value key", and key is a semicolon delimited stack
> > > > with an optional comm or PID:comm frame at the start.
> > >
> > > Agreed, but then, we can have some sort of default and also be able to,
> > > using -F, specify what are the fields we want, and in which order, and I
> > > liked your suggestion of being able to specify "-F none" and have that mean
> > > no hist line is produced.
> > >
> > > Likewise, the way that each callchain line should be formatted should be
> > > programmable via the command line, via the -g option, no? Then script
> > > writers could use it in a way that doesn't require further processing,
> > > as Brendan showed.
> >
> > Right. So '-s <key1>[,<key2>,...] -g info' can control which info is
> > displayed along with the callchains.
>
> So you force the same selection of fields to be used for both the
> hist_entry and the callchains?

Yes.


>
> And why is it that some of the fields will be selected via -s (comm, dso)
> and other fields will be selected via -g (count, this "info" thing)?

Because the sort keys given via -s affect how hist entries are aggregated.


>
> Why not be flexible and allow any set of fields to be used in both
> cases, without one being tied to the other?
>
> I.e. instead of:
>
> -s <key1>[,<key2>,...] -g info
>
> We use:
>
> -s <key1>[,<key2>,...] -g [<keyA>[,<keyB>],...]

But then we need to aggregate hist entries using all of key1, key2,
keyA, keyB and so on. Otherwise callchain info with keyA and keyB
might be stale.

If so, we need to group hist entries again using key1 and key2 only
for printing the hist part. For example, entries for (1,2,A,B) and
(1,2,C,D) should be shown as a single entry for (1,2).
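
To illustrate the regrouping step, here is a rough Python sketch. It is
not perf code: the tuple layout, the sample counts and the regroup()
helper are all made up for the example. It only shows that entries
aggregated on every key have to be merged again on the hist keys alone.

```python
from collections import defaultdict

def regroup(entries, n_hist_keys):
    """Merge entries keyed on (hist keys + callchain keys) by hist keys only.

    `entries` is a list of (key_tuple, count) pairs; this layout is an
    assumption made for the sketch, not perf's internal representation.
    """
    merged = defaultdict(int)
    for keys, count in entries:
        # Drop the callchain-only keys (keyA, keyB, ...) and sum the counts.
        merged[keys[:n_hist_keys]] += count
    return dict(merged)

# Entries aggregated on all of key1, key2, keyA, keyB (made-up values).
entries = [
    ((1, 2, "A", "B"), 500),
    ((1, 2, "C", "D"), 309),
    ((3, 4, "A", "B"), 10),
]

hist = regroup(entries, n_hist_keys=2)
```

Here (1,2,A,B) and (1,2,C,D) collapse into one (1,2) entry whose count
is the sum of both.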

I think this 'info' part is only needed when hist entries are omitted
(i.e. -F none). If so, no need to bother with new options..

>
> If one would want to have the same set for both, then yeah, a keyword
> for that would be interesting, reusing your "info":
>
> -s <key1>[,<key2>,...] -g info
>
> Would mean:
>
> -s <key1>[,<key2>,...] -g [<key1>[,<key2>],...]
>
> With both ... equal
>
> But "info" is way too vague, perhaps "hist_keys", or something more
> compact, like: "\-s", to reuse the semantic of regular expression groups
> (\1).

I prefer "hist_keys".


>
> > $ perf report -s comm,dso -g folded,count,info -F none
> > 809 swapper;[kernel.vmlinux];cpu_bringup_and_idle;cpu_startup_entry;...
>
> > Note that the info part (swapper;[kernel.vmlinux]) is also separated
> > by a semicolon. But I think it's ok since it's controlled by command
> > line, so a script can know how many entries there will be.
>
> > > But yeah, the value is the semicolon delimited stack all the way to the
> > > comm/PID:comm if there are more than one or if the user asks it to be
> > > there via a -g keyword, all the other counts/info are just relative to
> > > that, CSV or whatever other delimiter the user asks it to, and space is
> > > not an option, as we know it can appear in the middle of a COMM:
> >
> > Yes, I think that we should use a given separator (using -t option)
> > instead of hard-coded semicolon. Although it'd be rare, it seems
> > possible to use semicolons in the comm name too.
>
> Well, we can have an option to specify what would be the separator for
> the callchains.

What's the problem with using a single separator?
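
As a rough sketch of what I mean (Python, with a hypothetical
parse_folded() helper, and assuming the "count key1;key2;...;frames"
layout discussed above rather than any final format), a script that
knows how many sort keys were given on the command line can split the
line with one separator:

```python
def parse_folded(line, n_keys, sep=";"):
    """Split 'count key1;key2;...;frame1;frame2;...' into its parts.

    `n_keys` is the number of sort keys passed via -s, which the script
    author already knows from the command line.
    """
    # Only the first space is a delimiter; later spaces may belong to a
    # comm or a symbol name.
    count, rest = line.split(" ", 1)
    fields = rest.split(sep)
    return int(count), fields[:n_keys], fields[n_keys:]

line = "809 swapper;[kernel.vmlinux];cpu_bringup_and_idle;cpu_startup_entry"
count, keys, stack = parse_folded(line, n_keys=2)
```

This still breaks if a comm or symbol contains the separator itself,
which is why the separator should be selectable rather than hard-coded.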

Thanks,
Namhyung