Re: Linux 2.2.16pre6

From: Zdenek Kabelac (kabi@fi.muni.cz)
Date: Thu Jun 01 2000 - 05:16:14 EST


James Sutherland wrote:
>
> > Interesting. Does 2.2.15pre17 work reliably ?
>
> I've had occasional network lockups with my 8139, as well, on an SMP box.
> (Yes, a BP6...) There are a few syslog references indicating that the IRQ
> handler has been reentered (which Donald refers to in the source as "an
> x86 bug") - related??
>
> (My other BP6 box just refuses to talk to that 8139 at all... OTOH, it
> also refused to talk to a PCI-NE2k until I flashed in a new BIOS. And
> using the HPT366 driver, hdparm kills the box hard enough I have to reset
> the CMOS RAM...)

I'm using this hack, which prevents deadlocks on my BP6 with RTLinux.
(For now the only deadlocks I'm encountering are the results of my
stupid bugs in RTL :))

Since the author of RTL says he is pretty sure that the IRQ handling in
RTL is correct, I suspect a bug somewhere else.
All I know is that this situation occurs when I try to start a few
huge applications over NFS - ispell with the Czech dictionary is my
favourite test.
Until yesterday netscape 4.72 was also quite reliable at triggering the
deadlock message; however, after the upgrade to 4.73 the number of locks
caused by running netscape dropped by 80%.
(So far I don't believe this is a hardware bug in the spinlock mechanism,
as it always happens when running something over NFS.)

Anyway for now I'm happy with this hack:

--- linux.orig/arch/i386/kernel/irq.h Wed May 31 11:21:04 2000
+++ linux/arch/i386/kernel/irq.h Wed May 31 11:21:47 2000
@@ -138,8 +138,22 @@
 static inline void irq_enter(int cpu, unsigned int irq)
 {
        hardirq_enter(cpu);
-	while (test_bit(0,&global_irq_lock)) {
-		/* nothing */;
+	if (global_irq_holder == cpu && test_bit(0, &global_irq_lock)) {
+		printk(KERN_WARNING "irq_enter - CPU:%d already holder (count:%d, %d)!!!\n",
+			cpu, global_irq_count, local_irq_count[cpu]);
+		/* avoid deadlock below */
+		clear_bit(0,&global_irq_lock);
+	} else {
+		unsigned long i = 1000000;
+		while (test_bit(0,&global_irq_lock) && i) {
+			i--;
+			/* nothing */;
+		}
+		if (!i) {
+			clear_bit(0,&global_irq_lock);
+			printk(KERN_WARNING "irq_enter - loop timeout CPU:%d Holder:%d!!!\n",
+				cpu, global_irq_holder);
+		}
        }
 }

-- this is what I get in my message log --
 
May 31 21:28:59 dual kernel: irq_enter - CPU:0 already holder (count:2, 1)!!!
May 31 21:28:59 dual kernel: irq_enter - CPU:0 already holder (count:1, 1)!!!
May 31 21:28:59 dual kernel: irq_enter - CPU:0 already holder (count:2, 1)!!!

A couple of times I've even got this message (about once every two days, I think):
May 29 13:04:39 dual kernel: irq_enter - loop timeout CPU:0 Holder:1!!!
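
The two messages correspond to the two branches of the hack: the waiter finds that it is itself recorded as the holder, or the spin counter runs out while another CPU holds the lock. As a rough illustration only, here is a userspace model of that bounded wait using C11 atomics. The names model_lock, model_holder and bounded_wait are made up for this sketch and are not kernel symbols; the real code uses test_bit()/clear_bit() on global_irq_lock.

```c
#include <assert.h>
#include <stdatomic.h>

#define SPIN_LIMIT 1000000UL            /* same bound as in the patch */

/* Hypothetical stand-ins for global_irq_lock and global_irq_holder. */
static atomic_int model_lock;           /* 0 = free, 1 = held */
static atomic_int model_holder = -1;    /* id of the holder, -1 = nobody */

enum wait_result { WAIT_OK, WAIT_SELF_HELD, WAIT_TIMEOUT };

/* Wait for the lock to be released, but never spin forever. */
static enum wait_result bounded_wait(int cpu)
{
    unsigned long i = SPIN_LIMIT;

    assert(cpu >= 0);

    if (atomic_load(&model_holder) == cpu && atomic_load(&model_lock)) {
        /* We would be waiting for ourselves: break the lock. */
        atomic_store(&model_lock, 0);
        return WAIT_SELF_HELD;          /* "already holder" case */
    }

    while (atomic_load(&model_lock) && i)
        i--;

    if (!i) {
        /* The holder never released it: force the lock open. */
        atomic_store(&model_lock, 0);
        return WAIT_TIMEOUT;            /* "loop timeout" case */
    }
    return WAIT_OK;
}
```

Note that forcing the lock open does not restore mutual exclusion; whoever really held the lock keeps running as if it still had it. That is acceptable for a diagnostic hack that turns a hard hang into a log message, but it is not a fix for the underlying problem.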

--

Also I would like to note that I'm running without the noapic option on my BP6.

-- 
There are three types of people in the world:
those who can count, and those who can't.
Zdenek Kabelac http://i.am/kabi/ kabi@i.am {debian.org; fi.muni.cz}



