Re: High CPU load when machine is idle (related to PROBLEM: Unusually high load average when idle in 2.6.35, 2.6.35.1 and later)

From: tmhikaru
Date: Tue Nov 30 2010 - 15:01:23 EST


On Tue, Nov 30, 2010 at 12:01:17AM +0100, Peter Zijlstra wrote:
> On Mon, 2010-11-29 at 14:40 -0500, tmhikaru@xxxxxxxxx wrote:
> > On Mon, Nov 29, 2010 at 12:38:46PM +0100, Peter Zijlstra wrote:
> > > On Sun, 2010-11-28 at 12:40 +0100, Damien Wyart wrote:
> > > > Hi,
> > > >
> > > > * Peter Zijlstra <peterz@xxxxxxxxxxxxx> [2010-11-27 21:15]:
> > > > > How does this work for you? It's hideous but let's start simple.
> > > > > [...]
> > > >
> > > > Doesn't give wrong numbers like initial bug and tentative patches, but
> > > > feels a bit too slow when numbers go up and down. Correct values are
> > > > reached when waiting long enough, but it feels slow.
> > > >
> > > > As I've tested many combinations, maybe this is an impression because
> > > > I do not remember about "normal" delays for the load to rise and fall,
> > > > but this still feels slow.
> > >
> > > You can test this by either booting with nohz=off, or building with
> > > CONFIG_NO_HZ=n and then comparing the result, something like
> > >
> > > make O=defconfig clean; while sleep 10; do uptime >> load.log; done &
> > > make -j32 O=defconfig; kill %1
> > >
> > > And comparing the curves between the NO_HZ and !NO_HZ kernels.
> > >
> > > I'll try and make the patch less hideous ;-)
> >
> > I've tested this patch on my own use case, and it seems to work for the most
> > part. It's still not settling as low as the previous implementation used
> > to, nor as low as CONFIG_NO_HZ=n (that is to say, 0.00 across the board
> > when the machine is idle). However, this is definitely an improvement:
> >
> > 14:26:04 up 9:08, 5 users, load average: 0.05, 0.01, 0.00
> >
> > This is the result of running uptime on a checked out version of
> > [74f5187ac873042f502227701ed1727e7c5fbfa9] sched: Cure load average vs NO_HZ woes
> >
> > with the patch applied, starting X, and simply letting the machine sit idle
> > for nine hours. For the brief period I spent watching it after boot, it
> > quickly began settling down to a reasonable value; I only let it sit idle
> > this long to verify the loadavg was consistently low. (Without this patch,
> > the loadavg was consistently erratic, anywhere from 0.6 to 1.2 with the
> > machine idle.)
>
> Ok, that's good testing.. so it's still not quite the same as NO_HZ=n,
> how about this one?
>
> (it seems to drop down to 0.00 if I wait a few minutes with top -d5)


I haven't had time to test your further patches but THIS works!

14:57:03 up 14:01, 4 users, load average: 0.00, 0.00, 0.00

The load average finally seems to be accurate on my machine, matching the
processes running and whatever else is using it. Again, this was tested
against the original commit that caused the problems for me:

[74f5187ac873042f502227701ed1727e7c5fbfa9] sched: Cure load average vs NO_HZ woes

so I know I'm testing apples to apples here.
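
For anyone repeating the comparison Peter suggested above (logging uptime
every 10 seconds on a NO_HZ and a !NO_HZ kernel), here is a rough sketch of
how the logged lines could be reduced to numbers. It is not part of any patch
in this thread. It assumes uptime prints the usual English
"load average: a, b, c" tail, and the function names are made up for
illustration.

```python
import re
import sys
from statistics import mean

# Matches the tail of an uptime line such as
# "14:57:03 up 14:01, 4 users, load average: 0.00, 0.00, 0.00".
# Assumption: English locale with "." as the decimal separator.
LOAD_RE = re.compile(r"load average:\s*([\d.]+),\s*([\d.]+),\s*([\d.]+)")


def parse_load_log(lines):
    """Return one (1min, 5min, 15min) tuple per recognisable uptime line."""
    samples = []
    for line in lines:
        match = LOAD_RE.search(line)
        if match:  # silently skip anything that is not an uptime line
            samples.append(tuple(float(value) for value in match.groups()))
    return samples


def summarize(samples):
    """Sample count, peak and mean of the 1-minute figure (None if empty)."""
    if not samples:
        return None
    one_minute = [sample[0] for sample in samples]
    return {
        "samples": len(one_minute),
        "peak": max(one_minute),
        "mean": mean(one_minute),
    }


if __name__ == "__main__":
    # Each argument is a log produced by the "uptime >> load.log" loop,
    # one file per kernel under test.
    for path in sys.argv[1:]:
        with open(path) as log:
            print(path, summarize(parse_load_log(log)))
```

Run it with one log per kernel, for example with the hypothetical file names
load-nohz.log and load-hz.log, and compare the peak and mean of the 1-minute
column. Plotting the parsed samples would show the rise and fall curves
directly.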


As time permits I'll test the later replies you made to yourself.

Thank you,
Tim McGrath
