loadaverage calculation broken since 1.3.7[67] [patch]

Janos Farkas (chexum@bankinf.banki.hu)
Sun, 16 Jun 1996 17:53:41 +0200 (MET DST)


To those of you who really had problems with the issue in the subject
line, I have an explanation. The problem manifested itself as strange,
unexplainable variations in the load average (such as jumps of one or
more whole units for a short time, or sometimes longer). It is caused by
the reorganization of the timer interrupt in kernel/sched.c. Since that
change, the running tasks are counted for the load calculation AFTER the
timer interrupt has done its job and woken up some sleeping tasks.
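To make the ordering problem concrete, here is a minimal model in C of a
single tick. It is not kernel code: the struct, the function names and
the "wake exactly at one tick" timers are assumptions for illustration.
The only thing it shows is that the same tasks are either invisible or
fully counted depending on whether the sample is taken before or after
the timers fire.

```c
#include <assert.h>

/* Hypothetical model of a task: asleep on a timer, or runnable.
   This is not the kernel's task_struct. */
enum model_state { MODEL_SLEEPING, MODEL_RUNNING };

struct model_task {
	enum model_state state;
	int wake_tick;		/* tick at which this task's timer expires */
};

/* Stand-in for running the timer list: wake every task whose timer
   expires at `tick`. */
static void model_run_timers(struct model_task *t, int n, int tick)
{
	for (int i = 0; i < n; i++)
		if (t[i].state == MODEL_SLEEPING && t[i].wake_tick == tick)
			t[i].state = MODEL_RUNNING;
}

/* Stand-in for counting active tasks: number of runnable tasks. */
static int model_count_running(const struct model_task *t, int n)
{
	int nr = 0;

	for (int i = 0; i < n; i++)
		if (t[i].state == MODEL_RUNNING)
			nr++;
	return nr;
}

/* What the load sampler sees at `tick`: sampling before the timers run
   (the patched order) or after them (the 1.3.76+ order). */
static int model_sample_tick(struct model_task *t, int n, int tick,
			     int sample_after_timers)
{
	int seen = 0;

	if (!sample_after_timers)
		seen = model_count_running(t, n);
	model_run_timers(t, n, tick);
	if (sample_after_timers)
		seen = model_count_running(t, n);
	return seen;
}
```

For example, with twenty tasks whose timers all expire at the same tick,
model_sample_tick() returns 0 when it samples first and 20 when the
timers run first, even though each of those tasks may go back to sleep
microseconds later.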

While this may not seem really important, it means that sleeping tasks
can occasionally have a LARGE impact on the load average, even if they
wake up only once an hour to do less than a jiffy's worth of work. The
effect gets much worse if you go looking for it: start 20 programs, each
of which calls usleep(200) in a loop. This produces a sustained load
average of about 19, with no noticeable performance degradation... :)
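A bounded version of that experiment might look like the POSIX C sketch
below. The function name and the iteration count are illustrative, and
nanosleep() with 200 microseconds stands in for usleep(200), which newer
POSIX versions have dropped. Results on kernels other than the affected
ones may differ.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Sleep for roughly 200 microseconds, like usleep(200). */
static void sleep_200us(void)
{
	struct timespec ts = { 0, 200 * 1000L };	/* 200,000 ns */

	nanosleep(&ts, NULL);
}

/* Fork up to `nr_children` processes; each one sleeps ~200 us
   `iterations` times, doing almost no work in between, then exits.
   Returns how many children ran to completion. */
static int spawn_sleepers(int nr_children, long iterations)
{
	int started = 0, ok = 0;

	for (int i = 0; i < nr_children; i++) {
		pid_t pid = fork();

		if (pid < 0)
			break;	/* stop forking, but still reap the others */
		if (pid == 0) {
			for (long k = 0; k < iterations; k++)
				sleep_200us();
			_exit(0);
		}
		started++;
	}
	for (int i = 0; i < started; i++) {
		int status;

		if (wait(&status) > 0 && WIFEXITED(status) &&
		    WEXITSTATUS(status) == 0)
			ok++;
	}
	return ok;
}
```

The experiment in the text corresponds to spawn_sleepers(20, n) with n
large enough to keep the children going for a few minutes, while
watching /proc/loadavg from another terminal.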

One should be aware that the load average is built from a snapshot of
the running tasks taken once every 5 seconds (12 times a minute), so it
can be a HIGHLY unreliable measure; but with this patch it is back to
the level it was at before 1.3.76, which was quite acceptable for most
of us. :)
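Each snapshot is folded into an exponentially decaying average kept in
fixed point. The sketch below reproduces that arithmetic; the constant
values follow the kernel's include/linux/sched.h, while the function
names are made up for illustration. It shows why a sampler that keeps
seeing 19 extra runnable tasks drives the 1-minute figure to about 19.

```c
#include <assert.h>

/* Fixed-point load-average arithmetic; the constant values follow the
   kernel's include/linux/sched.h. */
#define FSHIFT		11			/* bits of fractional precision */
#define FIXED_1		(1UL << FSHIFT)		/* 1.0 in fixed point */
#define EXP_1		1884UL			/* per-sample decay, 1-minute average */
#define EXP_5		2014UL			/* per-sample decay, 5-minute average */
#define EXP_15		2037UL			/* per-sample decay, 15-minute average */

/* Integer part and hundredths, the way /proc/loadavg prints them. */
#define LOAD_INT(x)	((x) >> FSHIFT)
#define LOAD_FRAC(x)	LOAD_INT(((x) & (FIXED_1 - 1)) * 100)

/* One update, the same arithmetic as the kernel's CALC_LOAD(load, exp, n):
   new = (load * decay + n * (FIXED_1 - decay)) >> FSHIFT,
   where n is the sampled number of active tasks times FIXED_1. */
static unsigned long calc_load_step(unsigned long load, unsigned long decay,
				    unsigned long n)
{
	load *= decay;
	load += n * (FIXED_1 - decay);
	return load >> FSHIFT;
}

/* Feed `samples` identical snapshots of `nr_active` tasks into a
   1-minute average that starts at 0; return the fixed-point result. */
static unsigned long one_minute_avg(unsigned long nr_active, int samples)
{
	unsigned long load = 0;

	for (int i = 0; i < samples; i++)
		load = calc_load_step(load, EXP_1, nr_active * FIXED_1);
	return load;
}
```

Because the shift truncates, the value settles slightly below
19 * FIXED_1 (38900 rather than 38912 in this model), which prints as
18.99.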

Another point, which Rob has raised, is that these occasional peaks of
pseudo-load can alter the behaviour of some programs, which is also
undesirable...

The long-term fix could be to make the count of active tasks an exact
value, maintained by adjusting the count at each and every place where
task->state is set. IMHO this is quite a workable solution for post-2.0.
That way we would have an active_tasks_count that is exact at every
tick, and this would `smooth' the effect of daemons waking up from time
to time.
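A rough sketch of that idea, in plain C outside the kernel: every state
change goes through one helper that adjusts a global counter, so the
count is exact at any instant and can be summed on every tick. All names
here (set_model_state, active_tasks_count, the per-tick accumulator) are
hypothetical, and which states count as "active" is simplified to
running and uninterruptible.

```c
#include <assert.h>

/* Simplified task states for this sketch; the kernel has more of them,
   and which ones count as "active" here is an assumption. */
enum model_state { MODEL_RUNNING, MODEL_INTERRUPTIBLE, MODEL_UNINTERRUPTIBLE };

struct model_task {
	enum model_state state;
};

/* Hypothetical exact count of active tasks. */
static unsigned long active_tasks_count;

/* Hypothetical per-period accumulator: the count summed over each tick. */
static unsigned long active_sum, ticks_in_period;

static int is_active(enum model_state s)
{
	return s == MODEL_RUNNING || s == MODEL_UNINTERRUPTIBLE;
}

/* The only place a task's state changes, so the count stays exact.
   Tasks must start in a non-active state with the count at 0. */
static void set_model_state(struct model_task *p, enum model_state new_state)
{
	int was = is_active(p->state), now = is_active(new_state);

	if (now && !was)
		active_tasks_count++;
	else if (was && !now)
		active_tasks_count--;
	p->state = new_state;
}

/* Called once per timer tick. */
static void model_tick(void)
{
	active_sum += active_tasks_count;
	ticks_in_period++;
}

/* End a sampling period: return the mean active count over its ticks,
   multiplied by `scale` (e.g. 100 for hundredths), and reset. */
static unsigned long model_period_average(unsigned long scale)
{
	unsigned long avg = 0;

	if (ticks_in_period)
		avg = active_sum * scale / ticks_in_period;
	active_sum = 0;
	ticks_in_period = 0;
	return avg;
}
```

For example, with one busy task and a daemon that is awake for a single
tick out of a 500-tick period, the period average comes out at 100
hundredths (1.00), whereas a snapshot taken during that one tick would
have counted 2.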

If anyone tries this patch, please report any problems; it would be bad
if it plagued the most stable 2.0.1 :)

Janos

--- linux-2.0.0/kernel/sched.c Tue May 7 11:06:51 1996
+++ linux/kernel/sched.c Sun Jun 16 01:30:20 1996
@@ -4,6 +4,9 @@
  * Copyright (C) 1991, 1992 Linus Torvalds
  *
  * 1996-04-21 Modified by Ulrich Windl to make NTP work
+ *
+ * 1996-06-16 Janos Farkas
+ * Attempted to take care of some loadaverage glitches
  */
 
 /*
@@ -970,18 +973,30 @@
 {
 	unsigned long ticks, system;
 
+	/* Hmm.. it seems to be ugly, but if we count the tasks
+	   after the timer interrupt, all the sleeping-on-timer
+	   tasks may have occasional impact on the loadaverage */
+
+	/* Also.. is there a way to avoid this double cli/sti? */
+
+	cli();
+	ticks = lost_ticks;
+	lost_ticks = 0;
+	sti();
+	if (ticks)
+		calc_load(ticks);
+
 	run_old_timers();
 
 	cli();
 	run_timer_list();
-	ticks = lost_ticks;
+	ticks += lost_ticks;
 	lost_ticks = 0;
 	system = lost_ticks_system;
 	lost_ticks_system = 0;
 	sti();
 
 	if (ticks) {
-		calc_load(ticks);
 		update_wall_time(ticks);
 		update_process_times(ticks, system);
 	}