Re: [RFC] perf_events: support for uncore a.k.a. nest units

From: Corey Ashford
Date: Wed Mar 31 2010 - 13:51:11 EST


On 03/31/2010 07:01 AM, Peter Zijlstra wrote:
On Tue, 2010-03-30 at 15:12 -0700, Corey Ashford wrote:

Initially I'd not allow per-pmu-per-task contexts
because then things like perf_event_task_sched_out() would get rather
complex.

Definitely. I don't think it makes sense to have per-task context on
nest/uncore PMUs. At least we haven't found any justification for it.

For uncore no, but there is also the hw-breakpoint stuff that is being
presented as a pmu, for those it would make sense to have a separate
per-task context.

But doing multiple per-task contexts is something for a next step
indeed.

For RR we can move away from perf_event_task_tick and let the pmu
install a (hr)timer for this on their own.

This is necessary I think, because of the access time for some of the PMU's. I
wonder though if it should, perhaps optionally, be off-loaded to a high-priority
task to do the switching so that access latency to the PMU can be controlled.

As I mentioned when we met, some of the Wire-Speed processor nest PMU control
registers are accessed via SCOM, which is an internal, 200 MHz serial bus. We
are being quoted ~525 SCOM bus ticks to do a PMU control register access, which
comes out to about 2.5 microseconds. If you figure 5 accesses to rotate the
events on a PMU, that's a minimum of 12.5 microseconds.

Yeah, you mentioned that.. for those things we need some changes anyway,
since currently we install per-cpu counters using IPIs and expect the
pmu::enable() method to be synchronous (it has a return value).

That's a good point. We hadn't considered this issue.

It would
be totally unacceptable to do 2.5ms pokes with IRQs disabled.

Just to be clear, it's 2.5us, not 2.5ms, but I think it's still bad... in our case, it's about 6600 processor clocks per access.

The RR thing would be the easiest to solve, just let the timer wake up a
thread instead of doing the work itself, that's fully isolated to how
the pmu chooses to implement that. The above mentioned issue however
would be much more challenging to fix nicely.

It seems like it might need to be done in two phases. IPI request is sent, and then a thread is woken up on the other CPU, it does some work, and then sets a status variable and somehow notifies the caller that the operation has completed. I don't know the kernel's communication mechanisms well enough to know which one is most appropriate - maybe rwsem?

- Corey
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/