Re: [patch 2/2] sched: fix nr_uninterruptible accounting of frozen tasks really

From: Matt Helsley
Date: Fri Jul 17 2009 - 12:16:50 EST


On Fri, Jul 17, 2009 at 02:31:50PM +0200, Peter Zijlstra wrote:
> On Fri, 2009-07-17 at 12:25 +0000, Thomas Gleixner wrote:
> > plain text document attachment (freezer-fix-accounting-for-real.patch)
> > commit e3c8ca8336 (sched: do not count frozen tasks toward load) broke
> > the nr_uninterruptible accounting on freeze/thaw. On freeze the task
> > is excluded from accounting with a check for (task->flags &
> > PF_FROZEN), but that flag is cleared before the task is thawed. So
> > while we prevent that the freezing task with state
> > TASK_UNINTERRUPTIBLE is accounted to nr_uninterruptible we decrement
> > nr_uninterruptible on thaw.
> >
> > Use a separate flag which is handled by the freezing task itself. Set
> > it before calling the scheduler with TASK_UNINTERRUPTIBLE state and
> > clear it after we return from frozen state.
>
> Right, so I'm wondering why we don't fully revert e3c8ca8336 to begin
> with.
>
> The changelog reads:
>
> ---
> commit e3c8ca8336707062f3f7cb1cd7e6b3c753baccdd
> Author: Nathan Lynch <ntl@xxxxxxxxx>
> Date: Wed Apr 8 19:45:12 2009 -0500
>
> sched: do not count frozen tasks toward load
>
> Freezing tasks via the cgroup freezer causes the load average to climb
> because the freezer's current implementation puts frozen tasks in
> uninterruptible sleep (D state).
>
> Some applications which perform job-scheduling functions consult the
> load average when making decisions. If a cgroup is frozen, the load
> average does not provide a useful measure of the system's utilization
> to such applications. This is especially inconvenient if the job
> scheduler employs the cgroup freezer as a mechanism for preempting low
> priority jobs. Contrast this with using SIGSTOP for the same purpose:
> the stopped tasks do not count toward system load.
>
> Change task_contributes_to_load() to return false if the task is
> frozen. This results in /proc/loadavg behavior that better meets
> users' expectations.
> ---
>
> It appears to me that a frozen cgroup is a transient state. You
> would typically do something like:
>
> freeze -> {snapshot, migrate} -> {thaw, destroy}
>
> Therefore a short increase in load doesn't seem like too big a problem,
> it's going to be gone soon anyway.
>
> Hmm?

The job scheduler in question does not use FROZEN as a transient state and
does not use checkpoint/restart at all since c/r is still a work in progress.
Even when used for power management it seems wrong to count frozen tasks
towards the loadavg since they aren't using CPU time or waiting for IO.
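
For anyone following along, here is a rough user-space model of the
accounting imbalance described in the quoted changelog. It is only a
sketch, not kernel code: the struct, the flag values, and the helper names
(model_sleep, model_wake, F_FROZEN, F_SELF) are made up for illustration,
and the single counter stands in for nr_uninterruptible.

```c
/* Simplified user-space model; all names and values are hypothetical. */
#define ST_RUNNING          0x0
#define ST_UNINTERRUPTIBLE  0x2
#define F_FROZEN            0x1  /* cleared by the thawing side */
#define F_SELF              0x2  /* set and cleared by the task itself */

struct task {
    long state;
    unsigned int flags;
};

static long nr_uninterruptible;

/* A task counts toward load if it sleeps uninterruptibly and is not excluded. */
static int contributes(const struct task *t, unsigned int exclude)
{
    return (t->state & ST_UNINTERRUPTIBLE) != 0 && (t->flags & exclude) == 0;
}

/* Going to sleep: account the task if it contributes. */
static void model_sleep(const struct task *t, unsigned int exclude)
{
    if (contributes(t, exclude))
        nr_uninterruptible++;
}

/* Waking up: undo the accounting using the same check, then mark runnable. */
static void model_wake(struct task *t, unsigned int exclude)
{
    if (contributes(t, exclude))
        nr_uninterruptible--;
    t->state = ST_RUNNING;
}

/* Exclusion flag is cleared before the wakeup: decrement without increment. */
long simulate_flag_cleared_before_wake(void)
{
    struct task t = { ST_RUNNING, 0 };

    nr_uninterruptible = 0;
    t.flags |= F_FROZEN;
    t.state = ST_UNINTERRUPTIBLE;
    model_sleep(&t, F_FROZEN);   /* skipped, flag is set */
    t.flags &= ~F_FROZEN;        /* thaw clears the flag first */
    model_wake(&t, F_FROZEN);    /* now counted: decrements */
    return nr_uninterruptible;
}

/* Flag owned by the freezing task: set before sleeping, cleared after waking. */
long simulate_self_managed_flag(void)
{
    struct task t = { ST_RUNNING, 0 };

    nr_uninterruptible = 0;
    t.flags |= F_SELF;
    t.state = ST_UNINTERRUPTIBLE;
    model_sleep(&t, F_SELF);     /* skipped, flag is set */
    model_wake(&t, F_SELF);      /* skipped, flag is still set */
    t.flags &= ~F_SELF;          /* task clears it once it runs again */
    return nr_uninterruptible;
}
```

In this model the first sequence leaves the counter at -1 and the second
leaves it at 0, which is the difference between the two approaches as I
understand them from the changelog.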

Cheers,
-Matt Helsley