Re: qla2xxx: frequent total lockups (2.6.8, 2.6.9-rc{1-mm5,2})

From: Oliver M. Bolzer
Date: Fri Sep 24 2004 - 09:29:06 EST



On Thu, Sep 16, 2004 at 01:16:59AM +0200, "Oliver M. Bolzer" <oliver@xxxxxxxxxxxx> wrote...

>As soon as there is I/O load on the HBA, I start seeing
>
>qla2300 0000:01:03.0: qla2xxx_eh_abort: cmd already done sp=0000000000000000
>
>messages every few seconds, eventually leading to complete system lockups after
>several minutes up to several hours, due to kernel NULL pointer dereferences
>in qla2x00_cmd_timeout().

> I've tested and reproduced the error on the following kernels, all
> compiled for x86_64.
> 2.6.8.1
> 2.6.9-rc1-mm5 (with dma_fixups patch posted by Andrew Vasquez on 13.9)
> 2.6.9-rc2

2.6.9-rc2-mm2 seems to have fixed the problem.
It is rather puzzling since the difference in the qla2xxx driver between
2.6.9-rc1-mm5+qlogic-oops-fix and 2.6.9-rc2-mm2 is rather minimal. Maybe
the problem was caused by some other part of the kernel that has now
been fixed.



--
Oliver M. Bolzer
oliver@xxxxxxx

GPG (PGP) Fingerprint = 621B 52F6 2AC1 36DB 8761 018F 8786 87AD EF50 D1FF
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/