[idiotic patch, RFH] qlogicfc error "this should not happen", again

From: Eric Weigle (ehw@lanl.gov)
Date: Mon Dec 23 2002 - 15:09:31 EST


Hi-

Back in August, a couple of people (at least rwhron@earthlink.net and myself)
were getting lock-ups with the qlogic driver in 2.5.31; a patch was posted
that apparently fixed rwhron's problems. While it reduced my crashes from
1-2 times a week to once a month, it didn't eliminate them. Over the past four
months I've still had three hard lock-ups. All occurred under high load,
during the weekly mirroring of the Debian archive for LANL. I've CCed rwhron
on this message; maybe he'll have comments...

Any advice from the kernel gurus on how to debug this? Somewhere we seem to
be leaking handles.
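
To make "leaking handles" concrete, here is a minimal user-space sketch of the
failure mode I suspect. It is not the driver code: the table size (NSLOTS), the
struct, and every function name are made up for illustration, and the only
thing taken from qlogicfc.c is the idea of a fixed array of per-command
pointers like hostdata->handle_ptrs[]. The assumption is that a slot is claimed
when a command is queued and freed when it completes, so each lost completion
pins one slot forever.

```c
#include <assert.h>
#include <stddef.h>

#define NSLOTS 8	/* hypothetical size; the real driver's table differs */

/* Hypothetical stand-in for the driver's per-host handle table. */
struct handle_table {
	void *slots[NSLOTS];	/* NULL means the slot is free */
};

/* Claim the first free slot for cmd; return its index, or -1 if full. */
int handle_alloc(struct handle_table *t, void *cmd)
{
	for (int i = 0; i < NSLOTS; i++) {
		if (t->slots[i] == NULL) {
			t->slots[i] = cmd;
			return i;
		}
	}
	return -1;	/* the "this should not happen" case */
}

/* Free a slot when its command completes. */
void handle_release(struct handle_table *t, int i)
{
	assert(i >= 0 && i < NSLOTS);
	t->slots[i] = NULL;
}

/* Count occupied slots; a debugging aid for watching the leak grow. */
int handle_in_use(const struct handle_table *t)
{
	int n = 0;

	for (int i = 0; i < NSLOTS; i++)
		if (t->slots[i] != NULL)
			n++;
	return n;
}

/*
 * Queue up to `total` commands one at a time. Every `leak_every`-th
 * completion is lost, so its slot is never released (0 = never leak).
 * Return how many commands were accepted before the table filled,
 * or -1 if it never filled.
 */
int commands_until_full(int total, int leak_every)
{
	struct handle_table t = { { NULL } };
	static int dummy_cmd;	/* placeholder for a real command */

	for (int n = 1; n <= total; n++) {
		int h = handle_alloc(&t, &dummy_cmd);

		if (h < 0)
			return n - 1;
		if (leak_every > 0 && n % leak_every == 0)
			continue;	/* completion lost: slot leaks */
		handle_release(&t, h);
	}
	return -1;
}
```

With 8 slots and one lost completion per 100 commands,
commands_until_full(1000, 100) returns 800, while a leak-free run never fills
the table. If the real problem looks like this, that would fit with the
lock-ups only showing up after a long stretch of heavy I/O; logging something
like handle_in_use() periodically might show the count creeping up well before
the table is exhausted.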

The kernel is 2.5.44, but there have been no significant changes to any
of the qlogic files between .44 and .52. The memory in the box is solid
(memtest86) and the disks have never given me any errors. Unfortunately,
I don't know how to do any diagnostics on the qlogic card...

Right now, I just added the following patch because otherwise I have to come
in and physically power cycle the box to bring it back up. Even with ext3,
that's a bad thing.

================================================================================
--- ./qlogicfc.c.20021223 Mon Dec 23 12:32:57 2002
+++ ./qlogicfc.c Mon Dec 23 12:38:11 2002
@@ -1135,6 +1135,7 @@
  * interrupt handler may call this routine as part of
  * request-completion handling).
  */
+extern int panic_timeout;
 int isp2x00_queuecommand(Scsi_Cmnd * Cmnd, void (*done) (Scsi_Cmnd *))
 {
         int i, sg_count, n, num_free;
@@ -1228,6 +1229,13 @@
                                 printk("slot %d has %p\n", i, hostdata->handle_ptrs[i]);
                         }
                 }
+                /* When we hit this, the machine locks up hard anyway.
+                 * Big red button is the only fix.
+                 * Panic & reboot; we've got nothing to lose. */
+                panic_timeout=10;
+                panic("Hard lock-up 'avoided' via percussive maintenance (big sledgehammer mode)\n");
+
+                /* never reached */
                 return 1;
         }
================================================================================

References:
    (The original thread)
    http://www.uwsg.iu.edu/hypermail/linux/kernel/0208.2/1264.html
    (The patch that doesn't work for me)
    http://www.ussg.iu.edu/hypermail/linux/kernel/0209.0/0467.html

Thanks,
-Eric

-- 
------------------------------------------------------------
        Eric H. Weigle -- http://public.lanl.gov/ehw/
"They that can give up essential liberty to obtain a little
temporary safety deserve neither" -- Benjamin Franklin
------------------------------------------------------------




