Re: SMP _death_

Richard B. Johnson (root@analogic.com)
Mon, 28 Apr 1997 11:50:52 -0400 (EDT)


On Mon, 28 Apr 1997, David S. Miller wrote:

> Date: Mon, 28 Apr 1997 08:45:49 -0400 (EDT)
> From: "Richard B. Johnson" <root@analogic.com>
>
> The following patch seems to fix the SMP death problem on my machine.
>
> --- /usr/src/linux-2.1.36/arch/i386/kernel/irq.c.orig Mon Apr 28 01:31:43 1997
> +++ /usr/src/linux-2.1.36/arch/i386/kernel/irq.c Mon Apr 28 08:34:36 1997
> @@ -543,6 +543,7 @@
> {
> struct irqaction * action;
> int do_random, cpu = smp_processor_id();
> + synchronize_irq();
> irq_enter(cpu, irq);
> kstat.interrupts[irq]++;
>
> Although this can't possibly be the right fix, it can hint us as to
> where it really is. All this patch does is single thread all
> interrupt handling, which means there is a re-entrancy problem in some
> driver still which has yet to be resolved.

Well, I can even tell you the driver. However, the problem will persist
for all other drivers unless they are rewritten, and I'm sure that
nobody wants to do that. I am quite aware of what the patch does. Further,
I don't see why there should be such a problem with interrupts in the
first place if the interrupt handling were more "appropriate". The problem
is that many drivers have interrupt service routines that would have to
be rewritten.

It is possible to have multiple interrupt service routines (ISRs) executing
on multiple CPUs if the interrupt handling were modified. The modification
would involve:

(1) Every ISR must check its hardware status and do nothing if
there is nothing to do, i.e., not complain about "spurious"
interrupts. It will simply return to the caller. The ISR
for each of the hardware devices is called from a global
interrupt handler. It never manipulates the interrupt
controllers and it leaves the interrupt flags alone.

(2) A global interrupt handler would keep track of the interrupts
queued for each of the handlers. It would not allow reentry
but would "remember" pending interrupts. When an ISR returns
to the global interrupt handler, the global handler will
call it again if another interrupt on that level occurred
during the execution of the previous one. It neither knows nor
cares which CPU is actually executing at this instant.

Implementation is a simple counter for each of the IRQs. The
global handler will, upon entry, call the ISR for the respective
IRQ level if, and only if, that level's counter is zero. It will
increment the counter before calling, and decrement it upon return.

If a new interrupt occurs for the same level before the
previous one has returned, i.e., the count is non-zero, the count
will be incremented only and the ISR will not be entered.

Upon return from the ISR for each level, the respective counter
will be decremented. If it is not zero, the ISR will be called
again without incrementing the count. This way, no interrupts are
ever lost. The CPU that is actually executing is unknown and
nobody cares.

(3) The global interrupt handler is the only mechanism by which
the two interrupt controllers are manipulated. It is also the
only procedure that plays with the interrupt flags. Each called
ISR knows that its data will be stable during its execution
because it knows that the global interrupt handler will not
allow reentry.

There are exceptions when data may be modified by another ISR
which may be executing concurrently; however, these would be handled
on a case-by-case basis.
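
For illustration only, the counter scheme in (2), together with the
"nothing to do, just return" rule in (1), might be sketched in user-space
C roughly as follows. This is not kernel code. Every name here
(global_irq_handler, pending, handlers, demo_isr, demo_run, DEMO_IRQ) is
hypothetical, and the use of C11 atomics to merge the "test for zero" and
"increment" steps into one operation is my assumption about how the
counter would be kept consistent across CPUs. Interrupt controller and
interrupt flag handling from (3) are left out entirely.

```c
#include <assert.h>
#include <stdatomic.h>

#define NR_IRQS  16
#define DEMO_IRQ 5              /* arbitrary IRQ level for the demo */

typedef void (*isr_fn)(int irq);

static atomic_int pending[NR_IRQS];   /* one counter per IRQ level */
static isr_fn     handlers[NR_IRQS];  /* one ISR per IRQ level     */

/* Entered on whichever CPU happens to take the interrupt. */
static void global_irq_handler(int irq)
{
    assert(irq >= 0 && irq < NR_IRQS);

    /* Count this interrupt. A non-zero old value means the ISR for this
     * level is already running somewhere; it will be called again for
     * us when it returns, so there is nothing more to do here. */
    if (atomic_fetch_add(&pending[irq], 1) != 0)
        return;

    /* We own this level. Run the ISR, decrement on return, and run it
     * again (without incrementing) while interrupts remain counted. */
    do {
        handlers[irq](irq);
    } while (atomic_fetch_sub(&pending[irq], 1) != 1);
}

/* Demo state standing in for real hardware. */
static int isr_calls;    /* number of times the demo ISR was entered   */
static int device_work;  /* stand-in for a hardware status register    */
static int nested_irqs;  /* interrupts to inject while the ISR runs    */

static void demo_isr(int irq)
{
    isr_calls++;

    /* Simulate the same IRQ firing again while this ISR is still
     * executing. The nested call only bumps the counter and returns. */
    if (nested_irqs > 0) {
        nested_irqs--;
        device_work++;
        global_irq_handler(irq);
    }

    /* Point (1): no work is not an error, so return quietly. */
    if (device_work == 0)
        return;

    device_work--;       /* service exactly one event per call */
}

/* Deliver one interrupt, with two more arriving during service. */
static int demo_run(void)
{
    handlers[DEMO_IRQ] = demo_isr;
    isr_calls   = 0;
    device_work = 1;
    nested_irqs = 2;
    global_irq_handler(DEMO_IRQ);
    return isr_calls;
}
```

In this sketch, demo_run() delivers one interrupt and injects two more
while the ISR is running. The ISR is never re-entered; it is simply
called three times in a row, and the counter for that level is back at
zero afterwards, so no interrupt is lost.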

Cheers,
Dick Johnson
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Richard B. Johnson
Project Engineer
Analogic Corporation
Voice : (508) 977-3000 ext. 3754
Fax : (508) 532-6097
Modem : (508) 977-6870
Ftp : ftp@boneserver.analogic.com
Email : rjohnson@analogic.com, johnson@analogic.com
Penguin : Linux version 2.1.35 on an i586 machine (66.15 BogoMips).
Warning : I read unsolicited mail for $350.00 per hour. Supply billing address.
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-