Re: [RFC][PATCH 00/11] perf pmu interface -v2

From: Corey Ashford
Date: Wed Jun 30 2010 - 13:19:41 EST


On 06/28/2010 08:13 AM, Peter Zijlstra wrote:
> On Sat, 2010-06-26 at 09:22 -0700, Corey Ashford wrote:
>> As for the "hardware write batching", can you describe a bit more about
>> what you have in mind there? I wonder if this might have something to
>> do with accounting for PMU hardware which is slow to access, for
>> example, via I2C via an internal bridge.
>
> Right, so the write batching is basically delaying writing out the PMU
> state to hardware until pmu::pmu_enable() time. It avoids having to
> re-program the hardware when, due to a scheduling constraint, we have to
> move counters around.
>
> So say, we context switch a task, and remove the old events and add the
> new ones under a single pmu::pmu_disable()/::pmu_enable() pair, we will
> only hit the hardware twice (once to disable, once to enable), instead
> of for each individual ::del()/::add().
>
> For this to work we need to have an association between a context and a
> pmu, otherwise its very hard to know what pmu to disable/enable; the
> alternative is all of them which isn't very attractive.
>
> Then again, it doesn't make sense to have task-counters on an I2C
> attached PMU simply because writing to the PMU could cause context
> switches.
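
The batching described above can be modeled in user space roughly as follows. This is only a sketch, not code from the patch set: struct sim_pmu and the sim_pmu_*() helpers are hypothetical names, and the flush done at enable time is counted as a single hardware access for simplicity.

```c
#include <stdbool.h>
#include <stdint.h>

#define SIM_NUM_COUNTERS 4

/* Hypothetical user-space model; these names are not from the patch set. */
struct sim_pmu {
    uint32_t shadow[SIM_NUM_COUNTERS]; /* software copy: wanted config, 0 = unused */
    uint32_t hw[SIM_NUM_COUNTERS];     /* what the simulated hardware holds */
    bool     hw_enabled;
    int      disable_depth;            /* nesting depth of disable calls */
    unsigned hw_accesses;              /* how many times we went to hardware */
};

/* Stop the PMU; only the outermost call touches hardware. */
static void sim_pmu_disable(struct sim_pmu *p)
{
    if (p->disable_depth++ == 0) {
        p->hw_enabled = false;
        p->hw_accesses++;
    }
}

/* add/del only update the software copy; hardware is left alone. */
static void sim_pmu_add(struct sim_pmu *p, int idx, uint32_t config)
{
    p->shadow[idx] = config;
}

static void sim_pmu_del(struct sim_pmu *p, int idx)
{
    p->shadow[idx] = 0;
}

/* The outermost enable flushes the software copy and restarts the PMU.
 * Assumption: the whole flush is counted as one hardware access. */
static void sim_pmu_enable(struct sim_pmu *p)
{
    if (--p->disable_depth != 0)
        return;
    for (int i = 0; i < SIM_NUM_COUNTERS; i++)
        p->hw[i] = p->shadow[i];
    p->hw_enabled = true;
    p->hw_accesses++;
}
```

With this model, any number of sim_pmu_del()/sim_pmu_add() calls made between one sim_pmu_disable() and the matching sim_pmu_enable() leave hw_accesses at two.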

Thanks for your reply.

In our case, writing to some of the nest PMUs' control registers is done via an internal bridge. We write to a specific memory address, and an internal MMIO-to-SCOM bridge (SCOM is similar to I2C) translates the access to serial form and sends it over the internal serial bus. The address we write to is derived from the control register's serial bus address, plus an offset from the base of the MMIO-to-SCOM bridge. The same process works for reads.
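
As a rough illustration of that address mapping, here is a user-space sketch. The actual translation rule is not spelled out here, so everything specific is an assumption: the shift of 3 (one 8-byte MMIO slot per serial bus address), the sim_window array standing in for the bridge's MMIO window, and the scom_*() helper names are all hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: each serial bus address maps to one 8-byte MMIO slot. */
#define SCOM_REG_SHIFT  3
#define SIM_WINDOW_REGS 16

/* Plain memory standing in for the bridge's MMIO window (simulation only). */
static uint64_t sim_window[SIM_WINDOW_REGS];

/* MMIO location = bridge base + offset derived from the serial bus address. */
static volatile uint64_t *scom_mmio_ptr(void *bridge_base, uint32_t scom_addr)
{
    size_t offset = (size_t)scom_addr << SCOM_REG_SHIFT;

    return (volatile uint64_t *)((uint8_t *)bridge_base + offset);
}

/* On real hardware the bridge would turn this store into a serial
 * transaction; here it just lands in sim_window. */
static void scom_write(void *bridge_base, uint32_t scom_addr, uint64_t val)
{
    *scom_mmio_ptr(bridge_base, scom_addr) = val;
}

/* Reads go through the same address computation. */
static uint64_t scom_read(void *bridge_base, uint32_t scom_addr)
{
    return *scom_mmio_ptr(bridge_base, scom_addr);
}
```

For example, scom_write(sim_window, 5, val) stores to byte offset 40 of the simulated window, and scom_read(sim_window, 5) reads it back through the same computation.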

While such a write does not cause a context switch, because there are no IO drivers to go through, it takes several thousand CPU cycles to complete, which by the same token still makes these PMUs inappropriate for task-counters (not to mention that the nest units operate asynchronously from the CPU).

However, there are still concerns about writing these control registers from an interrupt handler because of the latency incurred, however slowly we choose to rotate events. So at least for the Wire-Speed processor, we may need a worker thread of some sort to hand the work off to.
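
One possible shape for such a hand-off, sketched in user space: the interrupt path only records the register writes it wants, and a worker later performs the slow writes. This is a single-threaded model under stated assumptions, not our code; struct write_queue, queue_write(), drain_writes() and sim_slow_write() are hypothetical names, and a real kernel version would need proper synchronization (for instance, by building on the workqueue facility).

```c
#include <stdbool.h>
#include <stdint.h>

#define PENDING_MAX 8   /* ring size; a power of two */
#define SIM_NREGS   4

struct pending_write {
    uint32_t reg;
    uint64_t val;
};

/* Ring of deferred register writes (hypothetical, single-threaded model). */
struct write_queue {
    struct pending_write slot[PENDING_MAX];
    unsigned head;  /* next entry the worker will consume */
    unsigned tail;  /* next free entry for the interrupt path */
};

/* Simulated slow register file standing in for the nest PMU. */
static uint64_t sim_regs[SIM_NREGS];

static void sim_slow_write(uint32_t reg, uint64_t val)
{
    if (reg < SIM_NREGS)
        sim_regs[reg] = val;
}

/* Interrupt path: record the write and return; no hardware access here. */
static bool queue_write(struct write_queue *q, uint32_t reg, uint64_t val)
{
    if (q->tail - q->head == PENDING_MAX)
        return false;   /* queue full; caller must cope */
    q->slot[q->tail % PENDING_MAX] = (struct pending_write){ reg, val };
    q->tail++;
    return true;
}

/* Worker path: perform every pending write, paying the latency here.
 * Returns the number of writes performed. */
static unsigned drain_writes(struct write_queue *q,
                             void (*slow_write)(uint32_t, uint64_t))
{
    unsigned n = 0;

    while (q->head != q->tail) {
        struct pending_write *w = &q->slot[q->head % PENDING_MAX];

        slow_write(w->reg, w->val);
        q->head++;
        n++;
    }
    return n;
}
```

The point of the split is that queue_write() costs a few stores regardless of how slow the bridge is, so the latency is paid only in drain_writes(), outside interrupt context.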

Our current code, based on Linux 2.6.31 (soon to be 2.6.32), doesn't use worker threads; we are just taking the latency hit for now.

- Corey