Re: [PATCH V3 0/6] event synthesization multithreading for perf record

From: Arnaldo Carvalho de Melo
Date: Tue Oct 24 2017 - 09:31:34 EST


Em Tue, Oct 24, 2017 at 02:59:44PM +0200, Ingo Molnar escreveu:
>
> * Jiri Olsa <jolsa@xxxxxxxxxx> wrote:
>
> > I recently made some changes to threaded record, based on
> > Namhyung's time* API, which is needed to read/sort the data afterwards,
> >
> > but I wasn't able to get any substantial and consistent reduction of LOST events,
> > and then I got sidetracked and did not finish, but it's in here:
>
> So, in the context of system-wide profiling, the way that would work best I think
> is the following:
>
> thread #0 binds itself to CPU#0 (via sched_setaffinity) and creates a per-CPU event on CPU#0
> thread #1 binds itself to CPU#1 (via sched_setaffinity) and creates a per-CPU event on CPU#1
> thread #2 binds itself to CPU#2 (via sched_setaffinity) and creates a per-CPU event on CPU#2

Right, that is how I think it should be done as well, and those threads
will just dump to separate files, in a per-session directory, with an extra
file for the session details, i.e. what is now in the header.
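
A minimal sketch of the per-CPU setup discussed above, not the actual
perf record code. struct cpu_worker, cpu_worker_fn() and run_workers()
are hypothetical names made up for illustration, and the event type (a
software cpu-clock event) and sample period are arbitrary assumptions.
Each thread pins itself with sched_setaffinity() and then opens a
system-wide event on that same CPU through the perf_event_open syscall;
mmap'ing the ring buffer and dumping it to a per-CPU file is left out.

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <pthread.h>
#include <sched.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical per-thread state; not a structure from tools/perf. */
struct cpu_worker {
	int cpu;	/* CPU this thread pins itself to */
	int bound;	/* 1 if sched_setaffinity() succeeded */
	int event_fd;	/* per-CPU event fd, -1 if perf_event_open failed */
};

static void *cpu_worker_fn(void *arg)
{
	struct cpu_worker *w = arg;
	struct perf_event_attr attr;
	cpu_set_t set;

	/* pid 0 means "the calling thread", so only this thread is pinned. */
	CPU_ZERO(&set);
	CPU_SET(w->cpu, &set);
	w->bound = sched_setaffinity(0, sizeof(set), &set) == 0;

	/* Assumed event: software cpu-clock, sampled with timestamps. */
	memset(&attr, 0, sizeof(attr));
	attr.size = sizeof(attr);
	attr.type = PERF_TYPE_SOFTWARE;
	attr.config = PERF_COUNT_SW_CPU_CLOCK;
	attr.sample_period = 1000000;
	attr.sample_type = PERF_SAMPLE_TIME | PERF_SAMPLE_TID;
	attr.disabled = 1;

	/* pid == -1 and cpu == w->cpu: every task, but only on this CPU. */
	w->event_fd = (int)syscall(SYS_perf_event_open, &attr, -1, w->cpu, -1, 0);

	/* A real recorder would mmap the ring buffer and stream it to a file here. */
	return NULL;
}

/* Start one worker per entry and wait for all of them; nr must be > 0. */
static int run_workers(struct cpu_worker *w, int nr)
{
	pthread_t tid[nr];
	int i, started = 0, err = 0;

	for (i = 0; i < nr; i++) {
		if (pthread_create(&tid[i], NULL, cpu_worker_fn, &w[i])) {
			err = -1;
			break;
		}
		started++;
	}
	for (i = 0; i < started; i++)
		pthread_join(tid[i], NULL);
	return err;
}
```

With one struct cpu_worker per online CPU, run_workers() gives each CPU
its own pinned reader. event_fd stays -1 when perf_event_paranoid or
missing privileges do not allow system-wide events.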

Later, at processing time, the same thing happens, but this time we'll
have contention when accessing global thread state, the need for rounds
of PERF_SAMPLE_TIME based ordering, like what we have now in the
tools/perf/util/ordered-events.[ch] code, etc.
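
To illustrate why time-based ordering is needed once each CPU has its
own file, here is a simplified sketch. It is a plain k-way merge by
timestamp and not the round-based algorithm in ordered-events.[ch];
struct sample, struct stream and merge_by_time() are hypothetical names,
and it assumes each per-CPU stream is already sorted by time.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, stripped-down sample: just a timestamp and its CPU. */
struct sample {
	uint64_t time;	/* the PERF_SAMPLE_TIME value */
	int cpu;
};

/* One per-CPU stream, assumed to be sorted by time already. */
struct stream {
	const struct sample *ev;
	size_t nr;	/* number of samples in ev[] */
	size_t pos;	/* next sample not yet consumed */
};

/*
 * Merge all streams into out[] in timestamp order by repeatedly taking
 * the oldest pending sample. Returns the number of samples written,
 * at most out_cap.
 */
static size_t merge_by_time(struct stream *s, size_t nr_streams,
			    struct sample *out, size_t out_cap)
{
	size_t n = 0;

	while (n < out_cap) {
		struct stream *best = NULL;
		size_t i;

		for (i = 0; i < nr_streams; i++) {
			if (s[i].pos == s[i].nr)
				continue;	/* this stream is drained */
			if (!best ||
			    s[i].ev[s[i].pos].time < best->ev[best->pos].time)
				best = &s[i];
		}
		if (!best)
			break;	/* every stream is drained */
		out[n++] = best->ev[best->pos++];
	}
	return n;
}
```

The linear scan over the streams is fine for a handful of CPUs; a heap
keyed on the next timestamp would be the obvious choice for large
machines.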

This works for 'report', 'script', 'top', 'trace', etc., as it is
basically the model we already have. All the work that was done on
refcounting the thread, map, etc., as well as locking those rbtrees,
would finally be taken full advantage of.

- Arnaldo

> etc.
>
> Is this how you implemented it?

> If the threads in the thread pool are just free-running then the scheduler might
> not migrate them to the 'right' CPU that is streaming the perf events and there will
> be a lot of cross-talking between CPUs.
>
> Inherited events (default 'perf record') are tougher.
>
> Thanks,
>
> Ingo