Re: [PATCH] perf events: Add stalled cycles generic event - PERF_COUNT_HW_STALLED_CYCLES

From: Arun Sharma
Date: Wed Apr 27 2011 - 15:03:17 EST


On Wed, Apr 27, 2011 at 8:48 AM, Ingo Molnar <mingo@xxxxxxx> wrote:

>
>> The other issue I had to deal with was UOPS_RETIRED > UOPS_EXECUTED
>> condition. I believe this is caused by what AMD calls sideband stack
>> optimizer and Intel calls dedicated stack manager (i.e. UOPS executed outside
>> the main pipeline). A recursive fibonacci(30) is a good test case for
>> reproducing this.
>
> So the PORT015+234 sum is not precise? The definition seems to be rather firm:
>
>   Counts number of Uops executed that were issued on port 2, 3, or 4.
>   Counts number of Uops executed that were issued on port 0, 1, or 5.
>

Some work is done outside the main out-of-order engine for power
optimization reasons. Intel describes it as the dedicated stack engine
here:
http://www.intel.com/technology/itj/2003/volume07issue02/art03_pentiumm/vol7iss2_art03.pdf

However, I can't reproduce this behavior with a microbenchmark right
now:

# cat foo.s
    .text
    .global main
main:
1:
    push %rax
    push %rbx
    push %rcx
    push %rdx
    pop %rax
    pop %rbx
    pop %rcx
    pop %rdx
    jmp 1b

 Performance counter stats for './foo':

     7,755,881,073  UOPS_ISSUED:ANY:t=1         (scaled from 79.98%)
    10,569,957,988  UOPS_RETIRED:ANY:t=1        (scaled from 79.96%)
     9,155,400,383  UOPS_EXECUTED:PORT234_CORE  (scaled from 80.02%)
     2,594,206,312  UOPS_EXECUTED:PORT015:t=1   (scaled from 80.02%)

Perhaps I was thinking of UOPS_ISSUED < UOPS_RETIRED.
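
As a sanity check, the two inequalities can be evaluated directly from
the counts in the run above. This is only a rough sketch: the variable
names are made up for illustration, each count was scaled from roughly
80% of the run, and the _CORE suffix suggests the PORT234 count may be
core-wide rather than per-thread, so the executed sum is approximate.

```python
# Counts copied from the perf output above (each scaled from ~80% sampling).
# Variable names are illustrative, not part of any perf API.
uops_issued = 7_755_881_073        # UOPS_ISSUED:ANY:t=1
uops_retired = 10_569_957_988      # UOPS_RETIRED:ANY:t=1
uops_exec_port234 = 9_155_400_383  # UOPS_EXECUTED:PORT234_CORE
uops_exec_port015 = 2_594_206_312  # UOPS_EXECUTED:PORT015:t=1

# Executed uops, taken as the sum of the two port groups.
uops_executed = uops_exec_port234 + uops_exec_port015

# The condition originally reported, and the one seen in this run.
retired_exceeds_executed = uops_retired > uops_executed
issued_below_retired = uops_issued < uops_retired

print(f"executed (PORT234 + PORT015): {uops_executed:,}")
print(f"UOPS_RETIRED > UOPS_EXECUTED: {retired_exceeds_executed}")
print(f"UOPS_ISSUED < UOPS_RETIRED:   {issued_below_retired}")
print(f"retired - issued:             {uops_retired - uops_issued:,}")
```

With these numbers the executed sum is larger than the retired count,
so UOPS_RETIRED > UOPS_EXECUTED does not hold here, while UOPS_ISSUED is
below UOPS_RETIRED by about 2.8 billion uops.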

In general, UOPS_RETIRED (or instruction retirement more broadly) is the
"source of truth" in an otherwise crazy world, and might be more
interesting as a generalized event that works across multiple
architectures.

-Arun