Re: [PATCH 4/9] perf tools: Introduce hists__inc_dump_events()

From: Namhyung Kim
Date: Wed Apr 23 2014 - 01:59:31 EST


On Tue, 22 Apr 2014 18:53:55 +0200, Jiri Olsa wrote:
> On Tue, Apr 22, 2014 at 05:49:46PM +0900, Namhyung Kim wrote:
>
> SNIP
>
>> index f955ae5a41c5..883340d7d43e 100644
>> --- a/tools/perf/util/hist.c
>> +++ b/tools/perf/util/hist.c
>> @@ -333,6 +333,19 @@ void hists__inc_nr_events(struct hists *hists, u32 type)
>> __events_stats__add(&hists->stats, type, 1);
>> }
>>
>> +void hists__inc_dump_events(struct hists *hists)
>> +{
>> + if (!dump_trace)
>> + return;
>> +
>> + /*
>> + * If dump_trace is enabled, perf will exit before accounting
>> + * sample events during hists__output_resort(). Thus it needs to
>> + * be done separately.
>> + */
>> + __events_stats__add(&hists->stats, PERF_RECORD_SAMPLE, 1);
>> +}
>
> hum, we already clear all the stats before resorting for output so why done
> we call hists__inc_nr_entries from add_hist_entry? (at out label)

In short, because it could be seen by perf top's display thread before
the entry actually moves in the output tree.

Let me explain what this series does again.

There're three main fields in the hists->stats to affect the output:
number of samples, number of hist entries and total periods.

Also there're three stages to process samples: at first, samples are
converted to a hist entry and added to the input tree, and then they are
moved to the collapsed tree if needed, and finally they're moved to the
output tree to be shown to user.

The (part of) stats are accounted when samples are added to the input
tree and then reset before moving to the output tree, and re-counted
during insertion to the output tree.

I can see some reason to do it this way but it's basically not necessary
and could make a problem in multi-threaded programs like perf top.

The perf report does all these passes sequentially in a single thread so
it seems no problem. But perf top uses two threads - one for gathering
samples (in the input tree) and another for (collapsing and) moving them
to the output tree. Thus accounting stat in parallel can result in an
inaccurate stats and the output.

So I'd like to get rid of the accounting on the input stage as you can
see it just gets dropped before doing output resort. I originally make
the all three stats are accounted when doing output resort but changed
mind to account number of samples in the input stage and others in the
output stage. Because it'd make more sense accounting number of events
(sample event) in the input stage (as all other events are also
accounted in the input stage) and it'd make less changes in code.

So yes, it has a same problem of inaccurate number of samples, but its
impact should be smaller than other stats - seeing increasing sample
count (could be slightly inaccurate) without new entries in the browser.

Thoughts?

Thanks,
Namhyung
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/