Re: PERF_EVENT_IOC_SET_OUTPUT

From: Peter Zijlstra
Date: Wed Oct 02 2013 - 07:27:46 EST


On Wed, Oct 02, 2013 at 12:29:56PM +0200, Frederic Weisbecker wrote:
> On Wed, Oct 02, 2013 at 12:03:50PM +0200, Peter Zijlstra wrote:
> > On Tue, Oct 01, 2013 at 10:11:56PM +0300, Adrian Hunter wrote:
> > > Hi
> > >
> > > It does not seem possible to use set-output between
> > > task contexts of different types (e.g. a software event
> > > to a hardware event)
> > >
> > > If you look at perf_event_set_output():
> > >
> > >         /*
> > >          * If its not a per-cpu rb, it must be the same task.
> > >          */
> > >         if (output_event->cpu == -1 && output_event->ctx != event->ctx)
> > >                 goto out;
> > >
> > > ctx (perf_event_context) won't be the same for events
> > > of different types. Is this restriction necessary?
> >
> > Hmm.. so last night I wrote me a big reply saying we couldn't do it;
> > then this morning I reconsidered and think that something like:
> >
> > output_event->ctx->task != event->ctx->task
> >
> > should actually work.
> >
> > The reason it should be OK I think is because perf_mmap() will refuse to
> > create a buffer for inherited events that have ->cpu == -1.
> >
> > My initial response was going to say that it wouldn't be possible
> > because __perf_event_task_sched_out() could 'break' one ctx while still
> > swapping the other, at which point the buffer would have to service two
> > different tasks, potentially from different CPUs. Since the buffers are
> > not actually SMP safe, that's a problem.
>
> I don't get what you mean by breaking or swapping a ctx.
> But I can confirm that perf_mmap() won't allow a buffer to be remotely
> accessed from another CPU. Now there may be other issues than locality which
> I'm missing :)

The way we 'optimize' context switches between tasks with identical
contexts is to simply swap the contexts and leave the hardware alone.

So counters belonging to prev will then belong to next and vice versa.
This avoids having to read hardware counters, update stats, remove
counters from hardware, and re-program hardware with possibly the exact
same set.

When a child changes its context (e.g. inserts or removes a counter),
we break this swapping because the contexts no longer match, and we
have to take the slow and painful way of prodding hardware.
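
To make the swap/break distinction concrete, here is a rough toy model in
plain C. It is only a sketch of the idea described above, not the kernel
code: the toy_* types and functions are made up for illustration, and the
equivalence test (same parent, same parent generation) is an assumption
about what "identical contexts" means here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy types; these are NOT the kernel's structures. */
struct toy_ctx {
	int generation;         /* bumped when a counter is added or removed */
	struct toy_ctx *parent; /* context this one was cloned from, or NULL */
	int parent_gen;         /* parent's generation at clone time */
};

struct toy_task {
	struct toy_ctx *ctx;
};

/*
 * Assumed equivalence rule: two contexts are interchangeable only if both
 * are still unmodified clones of the same parent snapshot.
 */
static bool toy_ctx_equiv(const struct toy_ctx *a, const struct toy_ctx *b)
{
	return a->parent != NULL && a->parent == b->parent &&
	       a->parent_gen == b->parent_gen;
}

/*
 * Inserting or removing a counter makes the context unique, so it stops
 * being treated as a clone. This is the 'break'.
 */
static void toy_ctx_modify(struct toy_ctx *ctx)
{
	ctx->generation++;
	ctx->parent = NULL;
}

/*
 * Returns true when the fast path was taken: the two tasks just exchange
 * context pointers and the (imaginary) hardware is left alone. Returns
 * false when the slow path would be needed; a real implementation would
 * read, remove and re-program the counters at that point.
 */
static bool toy_sched_switch(struct toy_task *prev, struct toy_task *next)
{
	assert(prev->ctx != NULL && next->ctx != NULL);

	if (toy_ctx_equiv(prev->ctx, next->ctx)) {
		struct toy_ctx *tmp = prev->ctx;

		prev->ctx = next->ctx;
		next->ctx = tmp;
		return true;
	}
	return false;
}
```

With two children cloned from the same parent, toy_sched_switch() swaps
the pointers; after toy_ctx_modify() on either child it falls back to the
slow path and leaves both pointers where they were.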