Re: [PATCH] perf inject: Flush ordered events on FINISHED_ROUND

From: namhyung
Date: Tue Oct 06 2020 - 01:40:44 EST


On Tue, Oct 06, 2020 at 11:39:49AM +0900, namhyung@xxxxxxxxxx wrote:
> > > On Fri, Oct 02, 2020 at 10:03:17PM +0900, Namhyung Kim wrote:
> > > > Below measures time and memory usage during the perf inject and
> > > > report using ~190MB data file.
> > > >
> > > > Before:
> > > > perf inject: 11.09 s, 382148 KB
> > > > perf report: 8.05 s, 397440 KB
> > > >
> > > > After:
> > > > perf inject: 16.24 s, 83376 KB
> > > > perf report: 7.96 s, 216184 KB
> > > >
> > > > As you can see, it used 2x memory of the input size. I guess it's
> > > > because it needs to keep the copy for the whole input. But I don't
> > > > understand why processing time of perf inject increased..
>
> Measuring it with time shows:
>
> before after
> real 11.309s 17.040s
> user 8.084s 13.940s
> sys 6.535s 6.732s
>
> So it's user space to make the difference. I've run perf record on
> both (with cycles:U) and the dominant function is same: queue_event.
> (46.98% vs 65.87%)
>
> It seems the flushing the queue makes more overhead on sorting.

So I suspect the cache-miss ratio affects the performance. With
flushing, data is processed in the middle and all the entries are
reused after flush so it would invalidate all the cache lines
occasionally.

This is the perf stat result:

* Before

7,167,414,019 L1-dcache-loads
337,471,761 L1-dcache-read-misses # 4.71% of all L1-dcache hits

11.011224671 seconds time elapsed


* After

7,075,556,792 L1-dcache-loads
771,810,388 L1-dcache-read-misses # 10.91% of all L1-dcache hits

17.015901863 seconds time elapsed


Hmm.. it's a memory & time trade-off then. Maybe we need a switch to
select which one?

Thanks
Namhyung