Re: perf, x86: Add parts of the remaining haswell PMU functionality

From: Andi Kleen
Date: Thu Sep 05 2013 - 15:33:22 EST


> Well, at least the front-end side is still documented in the SDM as being
> usable to count stalled cycles.

Stalled frontend cycles do not necessarily mean frontend bound.
The real bottleneck can still be somewhere later in the pipeline.
Out-of-order CPUs are complex.

>
> AFAICS backend stall cycles are documented to work on Ivy Bridge.

I'm not aware of any documentation that presents these events
as accurate frontend/backend stalls without using the full
TopDown methodology (Optimization manual B.3.2).

The level 1 top down method for IvyBridge and Haswell is:

PipelineWidth = 4
Slots = PipelineWidth*CPU_CLK_UNHALTED
FrontendBound = IDQ_UOPS_NOT_DELIVERED.CORE / Slots
BadSpeculation = (UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS +
                  PipelineWidth*INT_MISC.RECOVERY_CYCLES) / Slots
Retiring = UOPS_RETIRED.RETIRE_SLOTS / Slots
BackendBound = 1 - (FrontendBound + BadSpeculation + Retiring)
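
As a rough sketch, the level 1 numbers could be computed from raw event
counts like this. The function name, the dict layout and the sample counts
are made up for illustration and are not taken from any real measurement.
BackendBound is taken as whatever remains after the other three categories,
since the four level 1 fractions add up to 1.

```python
PIPELINE_WIDTH = 4  # issue slots per cycle, as given in the formulas above


def topdown_level1(counts, width=PIPELINE_WIDTH):
    """Return the four level 1 fractions from a dict of raw event counts.

    `counts` is a hypothetical mapping from event name to count; how the
    counts are collected is outside the scope of this sketch.
    """
    cycles = counts["CPU_CLK_UNHALTED"]
    if cycles <= 0:
        raise ValueError("CPU_CLK_UNHALTED must be positive")
    slots = width * cycles

    frontend = counts["IDQ_UOPS_NOT_DELIVERED.CORE"] / slots
    bad_spec = (counts["UOPS_ISSUED.ANY"]
                - counts["UOPS_RETIRED.RETIRE_SLOTS"]
                + width * counts["INT_MISC.RECOVERY_CYCLES"]) / slots
    retiring = counts["UOPS_RETIRED.RETIRE_SLOTS"] / slots
    # Assumption: backend bound is the remainder of the slot budget.
    backend = 1.0 - (frontend + bad_spec + retiring)

    return {
        "FrontendBound": frontend,
        "BadSpeculation": bad_spec,
        "Retiring": retiring,
        "BackendBound": backend,
    }


# Invented sample counts, only to exercise the arithmetic.
sample = {
    "CPU_CLK_UNHALTED": 1000,
    "IDQ_UOPS_NOT_DELIVERED.CORE": 800,
    "UOPS_ISSUED.ANY": 2200,
    "UOPS_RETIRED.RETIRE_SLOTS": 2000,
    "INT_MISC.RECOVERY_CYCLES": 50,
}
result = topdown_level1(sample)
for name, value in result.items():
    print(f"{name:15s} {value:6.1%}")
```

All events have to be measured over the same interval for the fractions
to be meaningful.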

> For perf stat -a alike system-wide workloads it should still produce
> usable results that way.

For some classes of workloads the result will have a large, unpredictable
systematic error.

> I.e. something like the patch below (it does not solve the double counting
> yet).

Well you can add it, but I'm not going to Ack it.

-Andi
