Re: [GIT PULL] cputime patch for 2.6.30-rc6

From: Martin Schwidefsky
Date: Tue May 19 2009 - 05:01:01 EST


On Mon, 18 May 2009 17:28:53 +0100 (BST)
Michael Abbott <michael@xxxxxxxxxxxxxxx> wrote:

> > > +    for_each_possible_cpu(i)
> > > +        idletime = cputime64_add(idletime, kstat_cpu(i).cpustat.idle);
> > > +    idletime = cputime64_to_clock_t(idletime);
> > >
> > >      do_posix_clock_monotonic_gettime(&uptime);
> > >      monotonic_to_bootbased(&uptime);
> >
> > This is a world readable proc file, adding a for_each_possible_cpu() in
> > there scares me a little (this wouldn't be the first and only such case
> > though).
> >
> > Suppose you have lots of cpus, and all those cpus are dirtying those
> > cachelines (who's updating idle time when they're idle?), then this loop
> > can cause a massive cacheline bounce fest.
> >
> > Then think about userspace doing:
> > while :; do cat /proc/uptime > /dev/null; done
>
> Well, the offending code derives pretty well directly from /proc/stat,
> which is used, for example, by top. So if there is an issue then I guess
> it already exists.
>
> There is a pending problem in this code: for a multiple cpu system we'll
> end up with more idle time than elapsed time, which is not really very
> nice. Unfortunately *something* has to be done here, as it looks as if
> .utime and .stime (at least for init_task) have lost any meaning. I sort
> of thought of dividing by number of cpus, but that's not going to work very
> well.

I don't see a problem here. In an idle multiple cpu system there IS
more idle time than elapsed time. What would make sense is to compare
elapsed time * #cpus with the idle time. But then there is cpu hotplug
which forces you to look at the delta of two measuring points where the
number of cpus did not change.
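
As a rough userspace illustration of that comparison (not kernel code), the
sketch below takes two measuring points and divides the idle-time delta by
elapsed time * #cpus. The struct and function names are hypothetical, and it
assumes the caller supplies the summed idle time and the cpu count for each
sample.

```c
#include <assert.h>
#include <stdbool.h>

/* One measuring point. Hypothetical userspace struct, not a kernel type. */
struct cpu_sample {
    double uptime_s;  /* elapsed seconds since boot */
    double idle_s;    /* idle seconds summed over all cpus (assumed input) */
    int    ncpus;     /* number of cpus counted when the sample was taken */
};

/*
 * Store in *out the fraction of available cpu time that was idle between
 * samples a (earlier) and b (later). Returns false when the samples are not
 * comparable: the cpu count differs or no time has elapsed.
 *
 * Note: equal counts at both endpoints are only a necessary check. A cpu
 * could still have gone offline and come back in between.
 */
bool idle_fraction(const struct cpu_sample *a, const struct cpu_sample *b,
                   double *out)
{
    double elapsed;

    assert(a && b && out);

    if (a->ncpus != b->ncpus || a->ncpus <= 0)
        return false;

    elapsed = b->uptime_s - a->uptime_s;
    if (elapsed <= 0.0)
        return false;

    /* idle delta compared against elapsed time * #cpus */
    *out = (b->idle_s - a->idle_s) / (elapsed * a->ncpus);
    return true;
}
```

With 4 cpus, 10 seconds elapsed and 30 seconds of summed idle time between
the samples, this gives 30 / (10 * 4) = 0.75.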

> I came to this problem from a uni-processor instrument which uses
> /proc/uptime to determine whether the system is overloaded (and discovers
> on the current kernel that it is, permanently!). This fix is definitely
> imperfect, but I think a better fix will require rather deeper knowledge
> of kernel time accounting than I can offer.

Hmm, I would use the idle time field from /proc/stat for that.
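
A minimal userspace sketch of reading that field, assuming the usual layout
of the aggregate line in /proc/stat ("cpu" followed by user, nice, system,
idle, ...). The function names are hypothetical and error handling is kept
to the bare minimum.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Extract the idle counter (fourth value) from the aggregate "cpu" line.
 * Per-cpu lines such as "cpu0 ..." and all other lines are rejected.
 */
bool parse_idle_ticks(const char *line, unsigned long long *idle)
{
    unsigned long long usr, nic, sys;

    assert(line && idle);

    /* Require "cpu" followed by a space so that "cpu0" does not match. */
    if (strncmp(line, "cpu ", 4) != 0)
        return false;

    return sscanf(line + 4, "%llu %llu %llu %llu",
                  &usr, &nic, &sys, idle) == 4;
}

/* Scan a file (normally "/proc/stat") for the aggregate cpu line. */
bool read_idle_ticks(const char *path, unsigned long long *idle)
{
    char line[512];
    bool found = false;
    FILE *f = fopen(path, "r");

    if (!f)
        return false;

    while (!found && fgets(line, sizeof line, f))
        found = parse_idle_ticks(line, idle);

    fclose(f);
    return found;
}
```

The value is a tick count, not seconds; dividing by sysconf(_SC_CLK_TCK)
converts it. For a load check, take two readings and compare the idle delta
with the elapsed time between them rather than looking at a single value.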

--
blue skies,
Martin.

"Reality continues to ruin my life." - Calvin.
