sched/cputime: sig->prev_stime underflow

From: Dave Hansen
Date: Thu Apr 04 2013 - 13:40:21 EST


With the 3.9-rcs (and probably much earlier) I'm seeing some weird top
output where the cpu time "spent" is millions of hours:

445 root 20 0 0 0 0 S 0 0.0 5124095h kworker/45:1
404 root 20 0 0 0 0 S 0 0.0 5124095h kworker/4:1
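
As a rough sanity check on that figure (an illustration only, not taken
from the kernel source): 2^64 nanoseconds works out to about 5124095
hours, which is at least consistent with a 64-bit value wrapping
somewhere in the accounting path. The helper name ns_to_hours() is made
up for this sketch, and the idea that the displayed time passes through
a 64-bit nanosecond quantity is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: whole hours contained in a nanosecond count. */
static uint64_t ns_to_hours(uint64_t ns)
{
    const uint64_t ns_per_hour = 3600ULL * 1000000000ULL;

    return ns / ns_per_hour;
}

/* The largest 64-bit nanosecond count lands on the hours that top shows. */
static void check_top_value(void)
{
    assert(ns_to_hours(UINT64_MAX) == 5124095ULL);
}
```

Calling check_top_value() from any test program passes silently, which
suggests the odd number is a wrapped counter rather than random garbage.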

I see it mostly with kernel threads, but it doesn't seem to happen on my
distro kernel (3.5 era). The suspect code is in thread_group_times():

sig->prev_stime = max(sig->prev_stime, rtime - sig->prev_utime);

In my case, I caught it with rtime=34 and sig->prev_utime=35, so the
subtraction goes below zero and (cputime_t being unsigned) wraps to a
huge value that max() then happily picks. This code _looks_ to be pretty
mature, coming in at commit 0cf55e1e in 2009. The system I'm running on
_does_ have some non-sync'd TSCs, but they are at least being detected,
so I expect the fallout to be minimal:

tsc: Marking TSC unstable due to check_tsc_sync_source failed

config:

http://sr71.net/~dave/linux/config-bigbox-04042013.txt

The dumb fix here would seem to be to just check for "rtime <
sig->prev_utime" and skip the subtraction in that case. Any thoughts?
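
A minimal userspace sketch of the problem and of the check suggested
above, assuming cputime_t behaves like an unsigned 64-bit integer (its
real definition depends on the architecture and config). The function
names stime_unguarded() and stime_guarded() are invented for this
illustration, and treating the difference as zero is just one possible
reading of the "dumb fix", not an actual patch.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: model cputime_t as an unsigned 64-bit counter. */
typedef uint64_t cputime_t;

/* Mirrors the quoted line: max(prev_stime, rtime - prev_utime). */
static cputime_t stime_unguarded(cputime_t prev_stime, cputime_t rtime,
                                 cputime_t prev_utime)
{
    cputime_t delta = rtime - prev_utime; /* wraps if rtime < prev_utime */

    return prev_stime > delta ? prev_stime : delta;
}

/* Same computation, but treats the difference as 0 when rtime < prev_utime. */
static cputime_t stime_guarded(cputime_t prev_stime, cputime_t rtime,
                               cputime_t prev_utime)
{
    cputime_t delta = rtime < prev_utime ? 0 : rtime - prev_utime;

    return prev_stime > delta ? prev_stime : delta;
}

/* Replays the observed values: rtime=34, prev_utime=35. */
static void check_observed_case(void)
{
    assert(stime_unguarded(0, 34, 35) == UINT64_MAX);
    assert(stime_guarded(0, 34, 35) == 0);
}
```

With the observed values the unguarded version returns the largest
possible cputime_t, and since max() never lets prev_stime go back down,
the bogus value would stick once it has been stored.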