Re: [RFC PATCH v1 0/3] Scaled statistics using APERF/MPERF in x86

From: Pavel Machek
Date: Sun Jun 01 2008 - 10:46:24 EST


Hi!

> > > The following RFC patch tries to implement scaled CPU utilisation
> > > statistics using APERF and MPERF MSR registers in an x86 platform.
> > >
> > > The CPU capacity is significantly changed when the CPU's frequency is
> > > reduced for the purpose of power savings. The applications that run
> > > at such lower CPU frequencies are also accounted for real CPU time by
> > > default. If the applications had been run at full CPU frequency,
> > > they would have finished the work faster and not been charged for
> > > excessive CPU time.
> > >
> > > One of the solutions to this problem is to scale the utime and stime
> > > entitlement for the process as per the current CPU frequency. This
> > > technique is used in powerpc architecture with the help of hardware
> > > registers that accurately capture the entitlement.
> > >
> >
> > there are some issues with this unfortunately, and these make it
> > a very complex thing to do.
> > Just to mention a few:
> > 1) What if the BIOS no longer allows us to go to the max frequency for
> > a period (for example as a result of overheating); with the approach
> > above, the admin would THINK he can go faster, but he cannot in reality,
> > so there's misleading information (the system looks half busy, while in
> > reality it's actually the opposite, it's overloaded). Management tools
> > will take the wrong decisions (such as moving MORE work to the box, not
> > less)
> > 2) On systems with Intel Dynamic Acceleration technology, you can get
> > over 100% of cycles this way. (For those who don't know what IDA is;
> > IDA is basically a case where if your Penryn based dual core laptop is
> > only using 1 core, the other core can go faster than 100% as long as
> > thermals etc allow it). How do you want to deal with this?
>
> Hi Arjan,
>
> Thank you for the inputs. The above issues are very valid and our
> solution should be able to react appropriately to the above situations.
>
> What we are proposing is a scaled time value that is scaled to the
> current CPU capacity. If the scaled utilisation is 50% when the CPU
> is at 100% capacity, it is expected to remain at 50% even if the CPU's
> capacity is dropped to 50%, while the traditional utilisation value
> will be 100%.

'time one-second-busy-loop' should report close to one second. That's
the current behaviour. You don't like it, but it is useful.

If you change it, that's called 'regression'.
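
To make the difference concrete, here is a rough sketch of the two
accountings for a one-second busy loop. It is not taken from the patch:
the function name scaled_time and the counter deltas are made up for
illustration, and it only assumes that the APERF/MPERF delta ratio
approximates actual frequency over nominal frequency.

```python
def scaled_time(cpu_seconds, aperf_delta, mperf_delta):
    """Scale charged CPU time by the APERF/MPERF delta ratio.

    Hypothetical helper, not code from the patch. aperf_delta and
    mperf_delta are the counter increments over the same interval.
    """
    if mperf_delta <= 0:
        # No usable sample: fall back to the unscaled value.
        return cpu_seconds
    return cpu_seconds * aperf_delta / mperf_delta


# One-second busy loop, CPU throttled to half its nominal frequency.
traditional = 1.0
scaled = scaled_time(traditional, aperf_delta=500, mperf_delta=1000)
print(traditional, scaled)

# IDA case from Arjan's mail: the ratio goes above 1, so the scaled
# value exceeds the wall-clock time actually spent.
boosted = scaled_time(1.0, aperf_delta=1200, mperf_delta=1000)
print(boosted)
```

The first value is what time(1) reports today; the scaled one is what
would change if utime/stime themselves were scaled.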
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html