Re: [PATCH 14/18] 2.6.17.9 perfmon2 patch for review: new i386 files

From: Stephane Eranian
Date: Fri Aug 25 2006 - 11:11:57 EST


Andi,

On Fri, Aug 25, 2006 at 04:53:52PM +0200, Andi Kleen wrote:
> On Friday 25 August 2006 16:27, Stephane Eranian wrote:
>
> > > BTW you might be able to simplify some of your code by exploiting
> > > those. i386 currently doesn't have them, but I wouldn't see a problem
> > > with adding them there too.
> > >
> > I think I will drop the EXCL_IDLE feature given that most PMUs stop
> > counting when you go low-power. The feature does not quite do what
> > we want because it totally excludes the idle loop from monitoring, yet
> > the idle loop may be doing useful kernel work, such as fielding interrupts.
>
> Ok fine. Anything that makes the code less complex is good.
> Currently it is very big and hard to understand.
>
> (actually at least one newer Intel system I saw seemed to continue counting
> in idle, but that might have been a specific quirk)
>

Yes, that's my fear: we may get inconsistent behaviors across architectures.
I think the only way to ensure some consistency would be to use the
enter/exit_idle callbacks you mentioned, assuming those would be available for
all architectures. With them, we could guarantee that we are not monitoring
useless execution (including low-power mode) simply because we would explicitly
stop monitoring on enter_idle() and restart monitoring on exit_idle().

> > For NMI, you want the counter to overflow at a certain frequency:
> >
> > wrmsrl(MSR_K7_PERFCTR0, -((u64)cpu_khz * 1000 / nmi_hz));
> >
> > But for RDTSC, I would think you'd simply want the counter to count
> > monotonically. Given that perfctr0 is not 64-bit but 40-bit, it will also
> > overflow (or wrap around), but presumably at a lower frequency than the
> > watchdog timer. I am not entirely clear on the intended user-level
> > usage of perfctr0 as a substitute for RDTSC.
>
> Yes, we need to underflow. But the users have to live with that.
> I can make it longer than before though, but the period will be
> <10s or so.

So the goal of this is a more reliable way of measuring short
sections of code, isn't it? If I recall correctly, the TSC does not quite work
with frequency scaling.

Is anybody lobbying the HW designers to implement another register to
do what you need here? That would certainly simplify things.

> Two counters would be too much I think.
>
Certainly, given that there are other users of that resource and
that on K8 you only have four.

--
-Stephane