Re: [PATCH v3 3/4] perf stat: Copy counts from prev_raw_counts to evsel->counts

From: Jin, Yao
Date: Thu May 07 2020 - 23:34:23 EST


Hi Jiri,

On 5/7/2020 11:19 PM, Jiri Olsa wrote:
On Thu, May 07, 2020 at 02:58:21PM +0800, Jin Yao wrote:
It would be useful to support the overall statistics for perf-stat
interval mode. For example, report the summary at the end of
"perf-stat -I" output.

But since perf-stat can support many aggregation modes, such as
--per-thread, --per-socket, -M, etc., we need a solution which
doesn't bring much complexity.

The idea is to use 'evsel->prev_raw_counts' which is updated in
each interval and it's saved with the latest counts. Before reporting
the summary, we copy the counts from evsel->prev_raw_counts to
evsel->counts, and next we just follow non-interval processing.

I did not realize we already store the count in prev_raw_counts ;-) nice catch!


Thanks! :)


In evsel__compute_deltas, this patch saves counts to the position
of [cpu0,thread0] for AGGR_GLOBAL. After copying counts from
evsel->prev_raw_counts to evsel->counts, we don't need to
modify process_counter_maps in perf_stat_process_counter to let it
work well.

I don't understand why you need to store it in here.. what's the catch
in process_counter_maps?


Sorry, I didn't explain clearly.

As you know, the idea is to copy evsel->prev_raw_counts to evsel->counts before reporting the summary.

But for AGGR_GLOBAL (cpu == -1 in perf_evsel__compute_deltas), only the aggr value is stored in evsel->prev_raw_counts; the per-cpu/thread entries are not updated.

	if (cpu == -1) {
		tmp = evsel->prev_raw_counts->aggr;
		evsel->prev_raw_counts->aggr = *count;
	} else {
		tmp = *perf_counts(evsel->prev_raw_counts, cpu, thread);
		*perf_counts(evsel->prev_raw_counts, cpu, thread) = *count;
	}

So after copying evsel->prev_raw_counts to evsel->counts, every perf_counts(evsel->counts, cpu, thread) entry is 0.

When we then go through process_counter_maps again, count->val is 0 in process_counter_values.

	case AGGR_GLOBAL:
		aggr->val += count->val;
		aggr->ena += count->ena;
		aggr->run += count->run;

So aggr->val ends up as 0.

So this patch uses a trick: it also saves the previous aggr value to [cpu0,thread0], so that the aggr->val calculation above works correctly.
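
To make that concrete, here is a rough standalone sketch of the problem and the trick. It is not the perf code: the struct layout, NCPUS/NTHREADS, and the helper names (save_prev_global, copy_counts, aggregate_global, summary_val) are simplified stand-ins made up for illustration, and it assumes that aggr is recomputed from the per-cpu/thread slots when the summary is processed.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for perf's counts structures. */
#define NCPUS    2
#define NTHREADS 1

struct counts_values {
	uint64_t val, ena, run;
};

struct counts {
	struct counts_values aggr;                    /* global aggregate */
	struct counts_values values[NCPUS][NTHREADS]; /* per cpu/thread slots */
};

/*
 * Models the cpu == -1 branch: only aggr is saved.
 * With 'trick' set, the value is also saved to [cpu0,thread0].
 */
static void save_prev_global(struct counts *prev,
			     const struct counts_values *count, int trick)
{
	prev->aggr = *count;
	if (trick)
		prev->values[0][0] = *count;
}

/* Models copying prev_raw_counts to counts before the summary. */
static void copy_counts(struct counts *dst, const struct counts *src)
{
	*dst = *src;
}

/*
 * Models AGGR_GLOBAL processing: aggr is rebuilt by summing the
 * per cpu/thread slots (assumption: aggr starts from 0 here).
 */
static uint64_t aggregate_global(struct counts *c)
{
	int cpu, thread;

	c->aggr.val = 0;
	for (cpu = 0; cpu < NCPUS; cpu++)
		for (thread = 0; thread < NTHREADS; thread++)
			c->aggr.val += c->values[cpu][thread].val;

	return c->aggr.val;
}

/* Returns the summary value seen after copy + re-aggregation. */
static uint64_t summary_val(uint64_t latest, int trick)
{
	struct counts prev, cur;
	struct counts_values count = { latest, 0, 0 };

	memset(&prev, 0, sizeof(prev));
	memset(&cur, 0, sizeof(cur));

	save_prev_global(&prev, &count, trick);
	copy_counts(&cur, &prev);

	return aggregate_global(&cur);
}
```

With these stand-ins, summary_val(100, 0) gives 0 because every per-cpu/thread slot is still 0, while summary_val(100, 1) gives 100 because the value is picked up from [cpu0,thread0].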

Thanks
Jin Yao

thanks,
jirka