Re: [RFC][PATCH 09/10] taskstats: Fix exit CPU time accounting

From: Martin Schwidefsky
Date: Mon Sep 27 2010 - 09:43:11 EST


On Sun, 26 Sep 2010 20:11:27 +0200
Oleg Nesterov <oleg@xxxxxxxxxx> wrote:

> Hi,
>
> On 09/24, Michael Holzheu wrote:
> >
> > On Thu, 2010-09-23 at 19:10 +0200, Oleg Nesterov wrote:
> > >
> > > On 09/23, Michael Holzheu wrote:
> > > >
> > > > Currently there are code paths (e.g. for kthreads) where the consumed
> > > > CPU time is not accounted to the parent's cumulative counters.
> > >
> > > Could you explain more?
> >
> > I think one place was "khelper" (kmod.c). It is created with
> > kernel_thread() and it exits without its times being added to the
> > parent's ctimes via sys_wait().
>
> No. Well yes, it is not accounted, but this is not because it is
> a kthread.

We noticed that behavior with kernel threads but as you point out
the problem is bigger than that.

> To simplify the discussion, let's talk about utime/cutime only,
> and let's forget about multithreading.
>
> It is very simple: currently Linux accounts the exiting task's
> utime and adds it to ->cutime _only_ if the parent does do_wait().
> If the parent ignores SIGCHLD, the child reaps itself and its time
> is not accounted.
>
> I do not know why it was done this way, but I'm afraid we can't
> change this historical behaviour.

Why? I would consider it to be a BUG() that the time is not accounted.
Regardless of whether a parent wants to see the SIGCHLD and the exit
status of its child, the child's process time should be accounted,
no? And I'm not a particular fan of the "this has always been that
way" reasoning.

> > Ok, the problem is that I did not consider exiting threads that are not
> > thread group leaders. When they exit, the ctime of the parent is not
> > updated. Instead, the time is accumulated in the signal struct.
>
> I think I am a bit confused, but see above. With or without threads
> the whole process can exit without accounting.

Got the part about self-reaping processes. But there is another issue:
consider an exiting thread where the group leader is still active.
The time for the thread will be added to the utime/stime fields in
the signal structure. Taskstats will happily ignore that time while
the group leader is still running.

Please keep in mind that we want to get to a point where it is
possible to get 100% coverage of CPU cycles in the last snapshot
cycle through the taskstats interface. Otherwise the precise top
would not be very precise...

--
blue skies,
Martin.

"Reality continues to ruin my life." - Calvin.
