Re: [numbers] perfmon/pfmon overhead of 17%-94%

From: Ingo Molnar
Date: Fri Jul 03 2009 - 03:58:42 EST



* Vince Weaver <vince@xxxxxxxxxx> wrote:

> On Mon, 29 Jun 2009, Ingo Molnar wrote:
>>
>> * Vince Weaver <vince@xxxxxxxxxx> wrote:
>>
>>>> If the 5 thousand cycles measurement overhead _still_ matters to
>>>> you under such circumstances then by all means please submit the
>>>> patches to improve it. Despite your claims this is totally
>>>> fixable with the current perfcounters design, Peter outlined the
>>>> steps of how to solve it, you can utilize ptrace if you want to.
>>>
>>> Is it really "totally" fixable? I don't just mean getting the
>>> overhead from ~3000 down to ~100, I mean down to zero.
>>
>> The thing is, not even pfmon gets it down to zero:
>>
>> pfmon -e INSTRUCTIONS_RETIRED --follow-fork --aggregate-results ~/million
>> 1000001 INSTRUCTIONS_RETIRED
>>
>> So ... do you take the hardliner purist view and consider it crap
>> due to that imprecision, or do you take the pragmatist view of also
>> considering the relative relevance of any imperfection? ;-)
>
> as I said in a previous post, on most x86 chips the
> instructions_retired counter also includes any hardware interrupts
> that occur during the process runtime. So any clock interrupts,
> etc, show up as an extra instruction. So on the "million"
> benchmark, it's usually +/- 2 extra instructions.

Yeah. But it has nothing to do with the function you are measuring,
right?

My general point is really that what matters is the statistical
validity of the end result. I don't think you ever disagreed with
that point - you just seem to have a lower noise acceptance
threshold ;-)
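
To make the noise-threshold point concrete, here is a minimal Python sketch (not something posted in this thread). It takes repeated instruction counts for the "million" benchmark and reports how far they stray from the expected 1,000,000. The function name summarize_counts and the sample values are made up for illustration; only the 1000001 figure and the +/- 2 range come from the discussion above.

```python
import statistics

EXPECTED = 1_000_000  # instructions the "million" benchmark is meant to retire


def summarize_counts(counts, expected=EXPECTED):
    """Return mean, sample stddev and worst-case relative error of the counts."""
    mean = statistics.mean(counts)
    # stdev() needs at least two samples; fall back to 0.0 for a single run.
    stddev = statistics.stdev(counts) if len(counts) > 1 else 0.0
    worst = max(abs(c - expected) for c in counts)
    return {
        "mean": mean,
        "stddev": stddev,
        "worst_rel_error": worst / expected,
    }


# Hypothetical runs: illustrative values only, chosen to sit within the
# +/- 2 range mentioned in the thread. They are NOT measurements.
sample = [1_000_001, 1_000_002, 1_000_000, 1_000_001, 1_000_002]
summary = summarize_counts(sample)
print(summary)
```

With numbers in that range the worst-case relative error is a couple of parts per million, which is the kind of figure to weigh against whatever effect is actually being measured.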

> It looks like support might be added to perfcounters to track
> these hardware interrupt stats per-process, which would be great,
> as it's been really hard to quantify that currently.

Yeah. There's a patch-set in the works that attempts to do something
in this area - see these mails on lkml:

perf_counter: Add Generalized Hardware interrupt support

Right now they are just convenience wrappers around CPU model
specific hw events - but we could extend the whole thing with
software counters as well and isolate per IRQ vector events and
counts, by adding a callback to do_IRQ().

That would give a mixture of hardware and software counter based IRQ
instrumentation features that looks quite compelling. Any comments
on what features/capabilities you'd like to see in this area?
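
Until something like that exists, a rough user-space approximation is to snapshot /proc/interrupts before and after a run and diff the totals. The sketch below is only an illustration under that assumption: the helper names parse_interrupts and irq_delta are hypothetical, and the numbers are system-wide rather than per-process, which is exactly the limitation being discussed.

```python
def parse_interrupts(text):
    """Parse /proc/interrupts-style text into {irq_label: total_count}."""
    lines = text.strip().splitlines()
    ncpu = len(lines[0].split())  # header row: one column per CPU
    totals = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        counts = []
        for field in rest.split()[:ncpu]:
            if not field.isdigit():
                break  # reached the chip/device description
            counts.append(int(field))
        totals[label.strip()] = sum(counts)
    return totals


def irq_delta(before, after):
    """Return only the IRQ lines whose totals changed between snapshots."""
    return {
        label: count - before.get(label, 0)
        for label, count in after.items()
        if count != before.get(label, 0)
    }


def snapshot(path="/proc/interrupts"):
    """Read and parse the live interrupt table (Linux only)."""
    with open(path) as f:
        return parse_interrupts(f.read())
```

Usage would be along the lines of: before = snapshot(), run the workload, after = snapshot(), then irq_delta(before, after). On a busy machine the delta also includes interrupts unrelated to the measured task.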

> In any case, it looks like the changes to make perf have lower
> overhead have been merged, which makes me happy. Thank you.

You are welcome :)

Btw., perfcounters still has no support for older Intel CPUs such as
P3s and P2s - and they have pretty sane PMUs - so if you have such
a machine (which your perfmon contribution suggests you might
have/had) and are interested, it would be nice to get support for
them. P4 support is interesting too, but more challenging.

Ingo