2.0.36: General protection in interrupt handler - race condition

Ion Badulescu (ionut@moisil.cs.columbia.edu)
Sat, 20 Feb 1999 17:07:47 -0500 (EST)


Hi,

I'm getting quite frequent lock-ups on a box running 2.0.36, and I've
finally managed to get an Oops report using remote syslogging. Here it
comes:

kernel: general protection: 0000
kernel: CPU: 0
kernel: EIP: 0010:[kmalloc+194/516]
kernel: EFLAGS: 00010006
kernel: eax: 000007f8 ebx: 0351d000 ecx: 64782d78 edx: 00000001
kernel: esi: 001d9a5c edi: 001d9a5c ebp: 00000000 esp: 03c25ee8
kernel: ds: 0018 es: 0018 fs: 002b gs: 002b ss: 0018
kernel: Process rpc.nfsd (pid: 123, process nr: 20, stackpage=03c25000)
kernel: Stack: 0008a0cc 001e44f4 00010098 000005fc 00000006 03c2b810 0000008c 001d9a5c
kernel: ffaa0055 00000246 0013b1fc 00000694 00000001 0008a0cc 001e44f4 00010098
kernel: 001e44f4 03b5e618 0013b607 00000694 00000001 0019449d 000005ec 0008a0cc
kernel: Call Trace: [alloc_skb+100/332] [dev_alloc_skb+15/40] [hp100_build_rx_pdl+21/188]
[hp100_rx_bm+261/316] [hp100_interrupt+125/348] [do_IRQ+101/136] [IRQ11_interrupt+95/144]
kernel: Code: 81 39 aa ff 55 00 0f 85 26 01 00 00 8b 41 04 89 43 04 8b 43
kernel: Aiee, killing interrupt handler

ksymoops says:
Code: 00000000 <_EIP>:
Code: 0 <_EIP+0>: 81 39 aa ff 55 cmpl $0x55ffaa,(%ecx)
Code: 5 <_EIP+5>: 00
Code: 6 <_EIP+6>: 0f 85 26 01 00 jne 132 <_EIP+132>
Code: 11 <_EIP+11>: 00
Code: 12 <_EIP+12>: 8b 41 04 movl 0x4(%ecx),%eax
Code: 15 <_EIP+15>: 89 43 04 movl %eax,0x4(%ebx)

And finally, the C code that generated the above assembly (from kmalloc())
is:

	p = page->firstfree;
	if (p->bh_flags != MF_FREE)
		goto not_free_on_freelist;
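
To make the mapping between this C fragment and the disassembly explicit, here
is a minimal user-space sketch. The struct layouts are assumptions inferred
from the offsets in the disassembly, not copied from the kernel source: the
`_model` type names, the `bh_next` and `next` fields, and the helper
`block_is_free()` are hypothetical, and pointers are modeled as 32-bit integers
so the offsets match i386 on any host.

```c
#include <stddef.h>
#include <stdint.h>

/* Immediate operand of the cmpl instruction in the Code: line. */
#define MF_FREE 0x0055ffaaU

/* Assumed i386 layout of a free block header (hypothetical model). */
struct block_header_model {
	uint32_t bh_flags;	/* offset 0: cmpl $0x55ffaa,(%ecx) */
	uint32_t bh_next;	/* offset 4: movl 0x4(%ecx),%eax (assumed name) */
};

/* Assumed i386 layout of the start of a page descriptor (hypothetical). */
struct page_descriptor_model {
	uint32_t next;		/* offset 0: assumed */
	uint32_t firstfree;	/* offset 4: movl %eax,0x4(%ebx) */
};

/* Returns 1 if the block carries the free marker, 0 otherwise. */
static int block_is_free(const struct block_header_model *p)
{
	return p->bh_flags == MF_FREE;
}
```

Under these assumptions, the faulting instruction is the read of `bh_flags`
through %ecx, and the two movl instructions after it would unlink the block by
copying its next pointer into page->firstfree.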

So page->firstfree (%ecx = 0x64782d78) is total garbage -- ascii 'x-xd' in
memory order -- and of course the next dereferencing says 'boom'.
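
The ASCII reading of %ecx can be checked with a small helper. This is only a
sketch for decoding register values from an Oops; `reg_to_ascii()` is a
hypothetical name, and it assumes the value should be read in little-endian
(i386 memory) byte order.

```c
#include <stdint.h>

/*
 * Write the four bytes of a 32-bit value, lowest byte first (the order
 * they would occupy in i386 memory), into out[0..3]. Non-printable bytes
 * are shown as '.'. out needs room for 5 chars and is NUL-terminated.
 */
static void reg_to_ascii(uint32_t value, char out[5])
{
	int i;

	for (i = 0; i < 4; i++) {
		unsigned char byte = (value >> (8 * i)) & 0xff;

		out[i] = (byte >= 0x20 && byte < 0x7f) ? (char)byte : '.';
	}
	out[4] = '\0';
}
```

Calling reg_to_ascii(0x64782d78, buf) leaves "x-xd" in buf, which suggests the
freelist pointer was overwritten with text data rather than a stale pointer.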

The box is a P5/100 with 64MB RAM, an Intel VX chipset, an aic7xxx SCSI
adapter, an HP100VG network card, and one Cyclom-Y/ISA multi-serial card.
It is used as a terminal server (PPP access) and mail server. One
peculiarity of this box is its fairly high interrupt rate, over 1000
interrupts per second, due to the almost fully used Cyclades board.

The only card using IRQ 11 is the HP100.

Any ideas?

Thanks,
Ion

-- 
  It is better to keep your mouth shut and be thought a fool,
            than to open it and remove all doubt.
