Re: [REGRESSION] [Linux 3.2] top/htop and all other CPU usage metering applications has gone crackers

From: Michal Hocko
Date: Mon Nov 28 2011 - 16:43:26 EST


On Mon 28-11-11 22:41:25, Michal Hocko wrote:
> Hi,
>
> On Mon 28-11-11 21:19:26, Rafael J. Wysocki wrote:
> > On Monday, November 28, 2011, Tino Keitel wrote:
> > > On Sun, Nov 27, 2011 at 12:45:57 +0100, Rafael J. Wysocki wrote:
> > > > On Sunday, November 27, 2011, Tino Keitel wrote:
> > > > > On Thu, Nov 24, 2011 at 21:05:53 +0100, Tino Keitel wrote:
> > > > > > On Thu, Nov 24, 2011 at 10:30:15 +0000, Artem S. Tashkinov wrote:
> > > > > > > Hello,
> > > > > > >
> > > > > > > I'd like to report a weird regression in Linux 3.2 (running rc3 now) - all CPU metering applications have gone terribly mad
> > > > > > > under this kernel:
> > > > > >
> > > > > > I get the same using top, htop and the gnome system monitor with kernel
> > > > > > 3.2 on a Sandy Bridge quad core box, running Debian unstable.
> > > > >
> > > > > I just tested 3.2-rc2, and see the same bug.
> > > >
> > > > I'm seeing that too on one of my test boxes, but not all the time
> > > > (i.e. there are periods in which the readings are correct). The other boxes
> > > > I've tested with 3.2-rc are fine in that respect.
> > > >
> > > > Also, it seems that it shows 100%-(real load) when it is wrong. So, it looks
> > > > like there's an overflow somewhere in the CPU load measuring code, at least
> > > > on some CPUs.
> > >
> > > Hi,
> > >
> > > I reverted this commit and so far it looks good:
> > >
> > > commit a25cac5198d4ff2842ccca63b423962848ad24b2
> > > Author: Michal Hocko <mhocko@xxxxxxx>
> > > Date: Wed Aug 24 09:40:25 2011 +0200
> > >
> > > proc: Consider NO_HZ when printing idle and iowait times
> > >
> > > I'll report back tomorrow how the kernel behaves.
> >
> > Hmm. Michal, can you have a look at that, please?
>
> Hmm, my testing didn't show anything like that. Could you post
> cat /proc/stat collected every second during 30s or so?
>
> Here is the output of my run with 3.2.0-rc3-00004-gdd38d29 and the attached config:
> # Snapshot /proc/stat once per second; each file is named after the epoch time.
> for i in $(seq 30);
> do
>     cat /proc/stat > "$(date +'%s')"
>     sleep 1
> done
> # Print per-second deltas for cpu0 (the first row shows the absolute counters).
> old_user=0 old_nice=0 old_sys=0 old_idle=0 old_iowait=0
> grep cpu0 * | while read cpu user nice sys idle iowait rest;
> do
>     echo $cpu $((user-old_user)) $((nice-old_nice)) $((sys-old_sys)) $((idle-old_idle)) $((iowait-old_iowait))
>     old_user=$user old_nice=$nice old_sys=$sys old_idle=$idle old_iowait=$iowait
> done
>
> Mostly no workload (idle desktop), with a few seconds of a busy loop:
> 1322516060:cpu0 621150 1978 148367 299773 196163
> 1322516061:cpu0 4 0 3 92 0
> 1322516062:cpu0 16 0 9 79 0
> 1322516063:cpu0 0 0 0 97 0
[...]

I forgot to add that cpu1 looks similar:
1322516060:cpu1 641344 832 137307 132871 44144
1322516061:cpu1 4 0 4 92 0
1322516062:cpu1 19 0 11 74 0
1322516063:cpu1 2 0 2 96 0
1322516064:cpu1 7 0 4 89 0
1322516065:cpu1 0 0 0 97 0
1322516066:cpu1 2 0 3 88 6
1322516067:cpu1 59 0 1 40 0
1322516068:cpu1 101 0 0 0 0
1322516069:cpu1 100 0 0 0 0
1322516070:cpu1 1 0 1 96 0
1322516071:cpu1 1 0 3 90 7
1322516072:cpu1 2 0 0 97 0
1322516073:cpu1 1 0 1 98 0
1322516074:cpu1 1 0 3 97 0
1322516075:cpu1 0 0 0 98 0
1322516076:cpu1 2 0 1 98 0
1322516077:cpu1 1 0 2 98 0
1322516078:cpu1 0 0 1 99 0
1322516079:cpu1 1 0 2 99 0
1322516080:cpu1 0 0 1 98 0
1322516081:cpu1 1 0 1 98 0
1322516082:cpu1 1 0 2 98 0
1322516083:cpu1 0 0 1 99 0
1322516084:cpu1 2 0 1 98 0
1322516085:cpu1 1 0 2 97 0
1322516086:cpu1 1 0 0 99 0
1322516087:cpu1 0 0 2 97 0
1322516088:cpu1 2 0 2 98 0
1322516089:cpu1 1 0 1 97 0
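
For anyone comparing their own numbers, the delta rows above can be turned into a rough busy percentage, which is closer to what top and htop display. This is only a minimal sketch: busy_percent is a hypothetical helper name, and it assumes that iowait counts as idle time and that the input already has the "label user nice sys idle iowait" layout printed by the loop above.

```shell
# Hypothetical helper (not part of any tool): reads delta rows in the
# form "label user nice sys idle iowait" on stdin and prints the label
# followed by the non-idle share of the interval in percent.
busy_percent() {
    while read -r label user nice sys idle iowait rest; do
        total=$((user + nice + sys + idle + iowait))
        # Skip empty intervals so we never divide by zero.
        [ "$total" -gt 0 ] || continue
        # Assumption: iowait is treated as idle, not as busy time.
        busy=$((user + nice + sys))
        echo "$label $((100 * busy / total))%"
    done
}

# Three rows taken from the cpu1 output above.
printf '%s\n' \
    '1322516067:cpu1 59 0 1 40 0' \
    '1322516068:cpu1 101 0 0 0 0' \
    '1322516070:cpu1 1 0 1 96 0' | busy_percent
```

With these three rows it prints 60%, 100% and 2%, which matches the busy loop visible around 1322516067-1322516069. If the idle column were wrong on an affected box, the reported percentage would be wrong in the same way.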


--
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9
Czech Republic