Re: Linux 2.1.118 SMP problem

Alan Cox (alan@lxorguk.ukuu.org.uk)
Wed, 26 Aug 1998 23:36:15 +0100 (BST)


> wait_on_bh, CPU 0:
> irq: 1 [0 1]
> bh: 1 [0 1]
> <[c0113c4f]> <[c0175342]> <[c0175424]> <[c0148761]>
>
> repeating every few seconds.
>
> System.map says:
>
> del_timer __rpc_wake_up rpc_wake_up_task nfs_updatepage

OK, I've chased through lock_sock(). That appears to be merely
broken. Because it's not a spinlock it wouldn't actually do damage
here, and by luck rather than anything else it is actually going
to be safe for UDP.

Please try the following (untested) hack.

This should give you a similar dump for each CPU on the machine,
so we can find out where the deadlocked CPU is. Provided it
wants to play, anyway.

(Ingo, perhaps we should have a clean version of this in for real?)

--- linux.vanilla/arch/i386/kernel/irq.c Fri Aug 7 17:20:48 1998
+++ linux/arch/i386/kernel/irq.c Wed Aug 26 22:39:13 1998
@@ -342,7 +342,7 @@
 	}
 }
 
-static void show(char * str)
+void show(char * str)
 {
 	int i;
 	unsigned long *stack;
@@ -371,6 +371,7 @@
 	do {
 		if (!--count) {
 			show("wait_on_bh");
+			panic("wait_on_bh_debug");
 			count = ~0;
 		}
 		/* nothing .. wait for the other bh's to go away */
--- linux.vanilla/arch/i386/kernel/smp.c Wed Aug 19 14:23:08 1998
+++ linux/arch/i386/kernel/smp.c Wed Aug 26 22:38:42 1998
@@ -1541,6 +1541,8 @@
  */
 asmlinkage void smp_stop_cpu_interrupt(void)
 {
+	printk("Halting CPU %d\n", smp_processor_id());
+	show("CPU stop");
 	if (cpu_data[smp_processor_id()].hlt_works_ok)
 		for(;;) __asm__("hlt");
 	for (;;) ;
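
Each dump the patch produces ends in a row of bracketed addresses like
the one in the quoted report, and those have to be matched against
System.map to get function names. A minimal user-space sketch of that
lookup follows. It assumes the System.map entries are already sorted by
address and have the usual "address type name" layout. The names
struct ksym, parse_map_line and lookup_ksym are made up for
illustration and are not part of the kernel tree.

```c
#include <stddef.h>
#include <stdio.h>

/* One System.map entry: start address and symbol name (illustrative type). */
struct ksym {
	unsigned long addr;
	char name[64];
};

/*
 * Parse one System.map line of the form "c0113c10 T del_timer".
 * Returns 1 on success, 0 if the line does not match that layout.
 */
int parse_map_line(const char *line, struct ksym *out)
{
	char type;

	return sscanf(line, "%lx %c %63s", &out->addr, &type, out->name) == 3;
}

/*
 * Return the entry with the largest start address <= addr, or NULL if
 * addr lies below the first symbol. The table must be sorted by
 * ascending address (assumed here, not checked).
 */
const struct ksym *lookup_ksym(const struct ksym *tab, size_t n,
			       unsigned long addr)
{
	size_t lo = 0, hi = n;

	/* Binary search for the first entry whose address is > addr. */
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (tab[mid].addr <= addr)
			lo = mid + 1;
		else
			hi = mid;
	}
	return lo ? &tab[lo - 1] : NULL;
}
```

Feeding each line of System.map through parse_map_line() and then
calling lookup_ksym() with an address from the dump yields the
enclosing function. The result is only as good as the match between
System.map and the running kernel, so the map has to come from the
same build.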
