Re: 2.6: Load average calculation?

From: Valerie Henson
Date: Tue Mar 28 2006 - 06:05:00 EST


On Tue, Mar 28, 2006 at 11:56:12AM +0100, Russell King wrote:
>
> So, the question becomes - should a lot of network activity contribute
> to the system load average, thereby denying other services from
> performing their usual business.

Another case where simply counting up all processes in D state results
in an unreasonable load average is the "NFS server stops responding"
case. Even though all threads doing I/O to the NFS server are totally
inactive until the server comes back, they are all stuck in D state -
and counting towards the load average.
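
To make the effect concrete, here is a rough user-space sketch of the fixed-point averaging the 2.6 timer code applies every 5 seconds, fed with a constant count of runnable plus uninterruptible tasks. The constants follow the kernel's FSHIFT/EXP_1 definitions; the helper names calc_load_model() and simulate_avenrun() are made up for this illustration and are not kernel functions.

```c
#define FSHIFT   11                 /* bits of fixed-point precision */
#define FIXED_1  (1UL << FSHIFT)    /* 1.0 in fixed-point */
#define EXP_1    1884UL             /* 1/exp(5sec/1min) in fixed-point */

/* One sampling step of the exponentially decaying average. */
static unsigned long calc_load_model(unsigned long load, unsigned long exp,
				     unsigned long active)
{
	load *= exp;
	load += active * (FIXED_1 - exp);
	return load >> FSHIFT;
}

/*
 * Hypothetical helper: feed a constant task count into the 1-minute
 * average for `samples` 5-second intervals, starting from an idle system.
 * Tasks in D state count exactly like runnable ones.
 */
static unsigned long simulate_avenrun(unsigned long nr_running,
				      unsigned long nr_uninterruptible,
				      int samples)
{
	unsigned long active = (nr_running + nr_uninterruptible) * FIXED_1;
	unsigned long load = 0;
	int i;

	for (i = 0; i < samples; i++)
		load = calc_load_model(load, EXP_1, active);
	return load;
}
```

With nothing runnable and 8 threads stuck in D state, simulate_avenrun(0, 8, 120) (ten minutes of samples) climbs to just under 8 * FIXED_1, i.e. a 1-minute load average of about 8 on a machine that is doing no work at all.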

What these cases have in common is interesting: in both cases, the
thread is throttled by an external machine. We're not waiting on I/O
that is taking up resources locally and therefore should be counted as
part of load average; we're waiting for some other machine to free up
enough resources that we can push some data down the pipe.

The comment for io_schedule() suggests that this case has received
some thought:

/*
 * This task is about to go to sleep on IO. Increment rq->nr_iowait so
 * that process accounting knows that this is a task in IO wait state.
 *
 * But don't do that if it is a deliberate, throttling IO wait (this task
 * has set its backing_dev_info: the queue against which it should throttle)
 */
void __sched io_schedule(void)
{
	struct runqueue *rq = &per_cpu(runqueues, raw_smp_processor_id());

	atomic_inc(&rq->nr_iowait);
	schedule();
	atomic_dec(&rq->nr_iowait);
}

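The code and comment are out of sync: the comment says a deliberate,
throttling wait should not bump nr_iowait, but the code increments it
unconditionally. In any case, neither helps us here.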

Possible solution: Maybe sync_page should take into account whether
this is an NFS file or TCP sendfile page and call schedule() instead of
io_schedule() in these cases?
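
A toy user-space model of that dispatch, just to show the shape of the idea. Everything here is hypothetical: enum backing, struct rq_model and the *_model() functions are stand-ins invented for this sketch, not kernel interfaces. It models only the nr_iowait bookkeeping; task state and the load average itself are not modelled.

```c
#include <stdbool.h>

/* Hypothetical classification of what backs a page; not a kernel type. */
enum backing { BACKING_LOCAL_DISK, BACKING_NFS, BACKING_TCP_SENDFILE };

/* Stand-in for the per-CPU runqueue fields we care about. */
struct rq_model {
	int nr_iowait;      /* tasks currently accounted as sleeping on IO */
	int total_sleeps;   /* every sleep, accounted or not */
	int iowait_sleeps;  /* sleeps that happened while counted in nr_iowait */
};

/* Stand-in for schedule(): just record that a sleep happened. */
static void schedule_model(struct rq_model *rq)
{
	rq->total_sleeps++;
	if (rq->nr_iowait > 0)
		rq->iowait_sleeps++;
}

/* Stand-in for io_schedule(): same sleep, bracketed by iowait accounting. */
static void io_schedule_model(struct rq_model *rq)
{
	rq->nr_iowait++;
	schedule_model(rq);
	rq->nr_iowait--;
}

/* Assumption: anything not on a local disk is throttled by a remote machine. */
static bool throttled_by_remote(enum backing b)
{
	return b != BACKING_LOCAL_DISK;
}

/* The proposed choice: plain sleep for remote-throttled pages. */
static void sync_page_model(struct rq_model *rq, enum backing b)
{
	if (throttled_by_remote(b))
		schedule_model(rq);
	else
		io_schedule_model(rq);
}
```

In this model a wait on an NFS or sendfile page still sleeps, but never shows up in the iowait count; a wait on a local disk page is accounted as before.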

-VAL