Re: UP flu

Philipp Rumpf (prumpf@jcsbs.lanobis.de)
Wed, 18 Nov 1998 18:25:20 +0100


On Tue, Nov 17, 1998 at 01:13:32AM -0800, George Bonser wrote:
> On Mon, 16 Nov 1998, Jeremy Katz wrote:
>
> > Okay, more info :) It seems to only happen (I think) when I am playing a
> > sound...
>
> It dies here and I do not have sound configured at all. It just seems that
> the darned thing can not get to disk. Running processes seem to be OK, two
> tops on different VC's update fine, they are getting scheduled. It seems
> to be when something needs to do disk I/O to me. I run cnews here and it
> seems that batching will sometimes trigger it. It takes me too darned long
> to make kernels on this machine, I will test on a 200MHz 6x86MX tomorrow
> with even less RAM (32MB). Seems to me that big memory machines don't seem
> to have the problem as much as small RAM machines.

I have got a different theory:
Let's assume CONFIG_APM is defined (we'll handle the other case below).
Now have a look at arch/i386/kernel/process.c
This code is compiled only #ifndef __SMP__, which would explain why the bug
occurs only with UP kernels.

static void hard_idle(void)
{
	while (!current->need_resched) {
		if (boot_cpu_data.hlt_works_ok && !hlt_counter) {
			/* If the APM BIOS is not enabled, or there
			   is an error calling the idle routine, we
			   should hlt if possible. We need to check
			   need_resched again because an interrupt
			   may have occurred in apm_do_idle(). */
			start_bh_atomic();
			if (!apm_do_idle() && !current->need_resched)
				__asm__("hlt");
			end_bh_atomic();
		}
		if (current->need_resched)
			break;
		schedule();
	}
	apm_do_busy();
}

apm_do_idle() slows the CPU down. Afterwards, current->need_resched is checked
and schedule() is called. Normally, nothing would happen in schedule() if
current->need_resched were not set (correct me if I am wrong).
But there is the special case of an interrupt arriving during the first part of
schedule() (which takes much longer to execute on the slowed CPU) and putting
a task on the run queue. That task would get the CPU now and execute while the
CPU is still slowed. That means there will be no free cycles for some
time, so the hard_idle() loop is not scheduled again and cannot reset the CPU
to its original speed.
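
To make the sequence above concrete, here is a rough user-space model of one
pass through the idle loop. It is only a sketch of the theory, not kernel
code: struct sim_cpu and every sim_* function are hypothetical names made up
for illustration, and "ticks" are an arbitrary unit of work. It assumes the
APM idle call succeeds, so the hlt branch is never taken.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model state; none of these names exist in the kernel. */
struct sim_cpu {
	bool clock_slowed;	/* stands in for the APM "CPU slowed" state */
	bool need_resched;	/* stands in for current->need_resched */
	int slow_ticks;		/* work executed while the clock was slowed */
};

/* Models apm_do_idle(): slow the clock, report success (nonzero). */
static int sim_apm_do_idle(struct sim_cpu *cpu)
{
	cpu->clock_slowed = true;
	return 1;
}

/* Models apm_do_busy(): restore full speed. */
static void sim_apm_do_busy(struct sim_cpu *cpu)
{
	cpu->clock_slowed = false;
}

/*
 * Models schedule(). If an interrupt made a task runnable while we were
 * inside schedule(), that task runs for work_ticks before the idle task
 * gets the CPU back. Any of that work done with the clock still slowed
 * is counted in slow_ticks.
 */
static void sim_schedule(struct sim_cpu *cpu, bool irq_in_schedule,
			 int work_ticks)
{
	assert(work_ticks >= 0);
	if (!irq_in_schedule)
		return;		/* nothing runnable: back to the idle loop */
	if (cpu->clock_slowed)
		cpu->slow_ticks += work_ticks;
}

/*
 * One pass through the loop body, in the same order as the quoted code:
 * slow the CPU, check need_resched, call schedule(), and only afterwards
 * restore the speed. Returns the ticks of real work done at slow speed.
 */
int sim_hard_idle_once(bool irq_in_schedule, int work_ticks)
{
	struct sim_cpu cpu = { false, false, 0 };

	(void)sim_apm_do_idle(&cpu);
	if (!cpu.need_resched)
		sim_schedule(&cpu, irq_in_schedule, work_ticks);
	sim_apm_do_busy(&cpu);	/* reached only after the task has run */
	return cpu.slow_ticks;
}
```

In this model, sim_hard_idle_once(false, 1000) returns 0 because no work runs
while the clock is slowed, whereas sim_hard_idle_once(true, 1000) returns 1000:
all of the woken task's work is done at reduced speed, because the call that
restores the clock comes only after schedule() returns.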

Voila, we have got a nice slow-down (I verified this by adding a field to
the /proc/apm output routine that shows whether clock_slowed is set).

I do not know what happens if schedule() is called from here when APM is
not enabled, but it will cause run_task_queue(&tq_scheduler) to be called
less often, which I do not think is good.

Anyway, could those of you experiencing the slow-downs, reboots etc. with
UP kernels try the following workaround (it simply avoids calling hard_idle()
altogether)?

Please report any problems that fit the UP flu description and still occur
with this patch applied, for then we will know that I was wrong.

Thanks in advance,
Philipp Rumpf (prumpf@jcsbs.lanobis.de)

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/