Responsiveness vs. load average (again)

Marnix Coppens (maco@telindus.be)
Mon, 16 Mar 1998 11:41:28 +0100


[Total silliness: the first part of this mail was never sent...
Here it is again, complete this time.]

I've been thinking over the weekend about how to come up with something
you could call the 'responsiveness' of the system. Put simply, a process
is either using the CPU or waiting for something to happen. If it's
waiting, this is usually in syscalls like wait4 or do_select, but
sometimes it is waiting for the result of an I/O operation instead. The
first case is typical of a shell or an X client, which sit idle most of
the time and can therefore be called responsive. In the second case, the
process is essentially stuck: it would like to proceed, but the kernel is
currently holding it up, and the user will experience such a process as
less responsive.

To be more specific, my first idea was to look at the output of 'ps lax'
and examine the wchan field, somehow distinguishing 'responsive' syscalls
from non-responsive ones. Fortunately, the scheduler already makes this
distinction for us by means of the states TASK_INTERRUPTIBLE and
TASK_UNINTERRUPTIBLE. Both ps and /proc/<pid>/status display these states
as 'S' (sleeping) and 'D' (disk) respectively.
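
For what it's worth, the state is easy to inspect from user space. Here
is a minimal sketch (my illustration, not part of the patch below; error
handling kept to a minimum) that prints the State: line for a given pid:

<example follows>
/* taskstate.c - print the scheduler state of a pid as reported by
 * /proc/<pid>/status, e.g. "State:  D (disk sleep)".
 * Illustration only.
 */
#include <stdio.h>
#include <string.h>

int main(int argc, char **argv)
{
        char path[256], line[256];
        FILE *f;

        if (argc != 2) {
                fprintf(stderr, "usage: %s <pid>\n", argv[0]);
                return 1;
        }
        sprintf(path, "/proc/%s/status", argv[1]);
        f = fopen(path, "r");
        if (!f) {
                perror(path);
                return 1;
        }
        while (fgets(line, sizeof(line), f))
                if (strncmp(line, "State:", 6) == 0)
                        fputs(line, stdout);
        fclose(f);
        return 0;
}
</example ends>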

The patch below implements this idea as an addition to the load average by
defining something like a 'busy average'. The load average counts the
processes that are running, swapping or uninterruptible; the busy average
counts only those that are uninterruptible or swapping, since these
essentially correspond to processes that are temporarily non-responsive
to the user. For example, a machine showing a load of 3.00 with a busy
average near 0.00 is merely CPU-bound and still feels snappy, while the
same load with a busy average near 3.00 means most of that load is stuck
waiting on I/O.

<patch against 2.0.33 follows>
--- include/linux/sched.h.orig	Sun Mar 15 23:52:14 1998
+++ include/linux/sched.h	Sun Mar 15 23:40:00 1998
@@ -47,6 +47,7 @@
  * 11 bit fractions.
  */
 extern unsigned long avenrun[];        /* Load averages */
+extern unsigned long avendel[];        /* Busy averages */
 
 #define FSHIFT          11              /* nr of bits of precision */
 #define FIXED_1         (1<<FSHIFT)     /* 1.0 as fixed-point */
--- fs/proc/array.c.orig	Sun Mar 15 22:15:45 1998
+++ fs/proc/array.c	Sun Mar 15 23:41:03 1998
@@ -178,16 +178,22 @@
 
 static int get_loadavg(char * buffer)
 {
-        int a, b, c;
+        int a, b, c, d, e, f;
 
         a = avenrun[0] + (FIXED_1/200);
         b = avenrun[1] + (FIXED_1/200);
         c = avenrun[2] + (FIXED_1/200);
-        return sprintf(buffer,"%d.%02d %d.%02d %d.%02d %d/%d %d\n",
+        d = avendel[0] + (FIXED_1/200);
+        e = avendel[1] + (FIXED_1/200);
+        f = avendel[2] + (FIXED_1/200);
+        return sprintf(buffer,"%d.%02d %d.%02d %d.%02d %d/%d %d %d.%02d %d.%02d %d.%02d\n",
                 LOAD_INT(a), LOAD_FRAC(a),
                 LOAD_INT(b), LOAD_FRAC(b),
                 LOAD_INT(c), LOAD_FRAC(c),
-                nr_running, nr_tasks, last_pid);
+                nr_running, nr_tasks, last_pid,
+                LOAD_INT(d), LOAD_FRAC(d),
+                LOAD_INT(e), LOAD_FRAC(e),
+                LOAD_INT(f), LOAD_FRAC(f));
 }
 
 static int get_kstat(char * buffer)
--- kernel/sched.c.orig	Sun Mar 15 23:55:27 1998
+++ kernel/sched.c	Sun Mar 15 23:39:04 1998
@@ -940,6 +940,7 @@
  * all seem to differ on different machines.
  */
 unsigned long avenrun[3] = { 0,0,0 };
+unsigned long avendel[3] = { 0,0,0 };
 
 /*
  * Nr of active tasks - counted in fixed-point numbers
@@ -960,18 +961,35 @@
         return nr;
 }
 
+static unsigned long count_delayed_tasks(void)
+{
+        struct task_struct **p;
+        unsigned long nr = 0;
+
+        for (p = &LAST_TASK; p > &FIRST_TASK; --p)
+                if (*p && ((*p)->state == TASK_UNINTERRUPTIBLE ||
+                           (*p)->state == TASK_SWAPPING))
+                        nr += FIXED_1;
+        return nr;
+}
+
 static inline void calc_load(unsigned long ticks)
 {
         unsigned long active_tasks;     /* fixed-point */
+        unsigned long delayed_tasks;    /* fixed-point */
         static int count = LOAD_FREQ;
 
         count -= ticks;
         if (count < 0) {
                 count += LOAD_FREQ;
                 active_tasks = count_active_tasks();
+                delayed_tasks = count_delayed_tasks();
                 CALC_LOAD(avenrun[0], EXP_1, active_tasks);
                 CALC_LOAD(avenrun[1], EXP_5, active_tasks);
                 CALC_LOAD(avenrun[2], EXP_15, active_tasks);
+                CALC_LOAD(avendel[0], EXP_1, delayed_tasks);
+                CALC_LOAD(avendel[1], EXP_5, delayed_tasks);
+                CALC_LOAD(avendel[2], EXP_15, delayed_tasks);
         }
 }

</patch ends>
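
Since CALC_LOAD is the kernel's existing fixed-point exponential-decay
macro, the busy averages decay with the same 1, 5 and 15 minute time
constants as the ordinary load averages. To try the patch out, the
extended /proc/loadavg line can be parsed like this (a quick user-space
sketch, assuming the field order produced above):

<example follows>
/* loadbusy.c - read the extended /proc/loadavg produced by the patch:
 * three load averages, nr_running/nr_tasks, last_pid, then the three
 * busy averages. Sketch only.
 */
#include <stdio.h>

int main(void)
{
        double load1, load5, load15, busy1, busy5, busy15;
        int running, tasks, lastpid;
        FILE *f = fopen("/proc/loadavg", "r");

        if (!f) {
                perror("/proc/loadavg");
                return 1;
        }
        if (fscanf(f, "%lf %lf %lf %d/%d %d %lf %lf %lf",
                   &load1, &load5, &load15, &running, &tasks,
                   &lastpid, &busy1, &busy5, &busy15) != 9) {
                fprintf(stderr, "unexpected /proc/loadavg format\n");
                return 1;
        }
        fclose(f);
        printf("load: %.2f %.2f %.2f  busy: %.2f %.2f %.2f\n",
               load1, load5, load15, busy1, busy5, busy15);
        return 0;
}
</example ends>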

A few remarks are in order here:

1) I have used a function count_delayed_tasks() that skips processes in the
TASK_RUNNING state. You would expect (active_tasks - delayed_tasks) to be
equal (without the FIXED_1 factor) to nr_running, a global variable
maintained by the scheduler. In practice it turns out to be (nr_running + 1).
The difference is due to the 'cat /proc/loadavg' process I used to measure
it (Heisenberg...), but I smell a race condition here. To be on the safe
side, I just count them separately.
2) What happens when a process needs some memory and thereby causes other
processes to get swapped out? The latter processes will obviously become
TASK_SWAPPING, but what about the first process itself? Logic dictates it
should be marked as TASK_UNINTERRUPTIBLE until the swapping is complete.
I'll have to look at the source code, I guess...
3) An alternative would be to add some extra fields to the task struct
so that every process counts the number of ticks spent in each of the
various scheduler states (plus the total of its children as well, to
accommodate servers that fork a lot). This would be *really* accurate and
probably useful to people who want to know how well their nntp or http
server is doing. Think of it as a fine-grained version of the stime field.
A rough sketch of what such bookkeeping could look like follows below.
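
As a sketch of idea (3) - field and helper names invented here, nothing
below is part of the patch - one could timestamp every state transition
and accumulate the jiffies spent in each state, which is much cheaper
than scanning all tasks on every tick:

<example follows>
/* Hypothetical additions to struct task_struct; assumes the state
 * values remain small consecutive integers (0..TASK_SWAPPING), as
 * they are in 2.0:
 */
        unsigned long state_ticks[TASK_SWAPPING + 1];  /* jiffies per state */
        unsigned long cstate_ticks[TASK_SWAPPING + 1]; /* totals from reaped children */
        unsigned long last_transition;                 /* jiffies at last state change */

/* A hypothetical helper, to be called wherever p->state is assigned
 * today (schedule(), wake_up(), ...):
 */
static inline void set_task_state(struct task_struct *p, int new_state)
{
        p->state_ticks[p->state] += jiffies - p->last_transition;
        p->last_transition = jiffies;
        p->state = new_state;
}
</example ends>

The child totals would then be folded into cstate_ticks when the parent
reaps the child, the same way utime/stime are accumulated into
cutime/cstime today.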

Cheers,

Marnix Coppens

---
Reality is that which                   | Artificial Intelligence
when you stop believing                 | stands no chance against
in it doesn't go away. (Philip K. Dick) | Natural Stupidity.
