Re: test10 hangs on startup: NMI watchdog hits Adaptec driver

From: James Bottomley
Date: Mon Nov 24 2003 - 19:41:49 EST


On Mon, 2003-11-24 at 18:23, Peter Chubb wrote:
> I've been seeing random hangs on a dual 500MHz Celeron here, so I
> rebooted this morning with the NMI watchdog turned on.
>
> With the watchdog, the machine shows the attached. It looks to me as if
> the lock taken at aic7xxx_osm.c:1709, which is released *after*
> ahc_linux_initialize_scsi_bus(), should perhaps be released earlier.
> Otherwise the host lock is held for the duration.

There have been several threads on this.

The fix is attached.

James

# This is a BitKeeper generated patch for the following project:
# Project Name: Linux kernel tree
# This patch format is intended for GNU patch command version 2.5 or higher.
# This patch includes the following deltas:
# ChangeSet 1.1483 -> 1.1484
# drivers/scsi/scsi_error.c 1.65 -> 1.66
#
# The following is the BitKeeper ChangeSet Log
# --------------------------------------------
# 03/11/24 jejb@xxxxxxxxxxxxxxxxxxxxx 1.1484
# Fix locking problems in scsi_report_bus_reset() causing aic7xxx to hang
#
# All the users of this function in the SCSI tree call it with the host
# lock held. With the new list traversal code, it was trying to take
# the lock again to traverse the list.
#
# Fix it to use the unlocked version of list traversal and modify the
# header comments to make it clear that the lock is expected to be held
# on calling it.
# --------------------------------------------
#
diff -Nru a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
--- a/drivers/scsi/scsi_error.c Mon Nov 24 17:27:38 2003
+++ b/drivers/scsi/scsi_error.c Mon Nov 24 17:27:38 2003
@@ -911,7 +911,9 @@
 
 	if (rtn == SUCCESS) {
 		scsi_sleep(BUS_RESET_SETTLE_TIME);
+		spin_lock_irqsave(scmd->device->host->host_lock, flags);
 		scsi_report_bus_reset(scmd->device->host, scmd->device->channel);
+		spin_unlock_irqrestore(scmd->device->host->host_lock, flags);
 	}
 
 	return rtn;
@@ -940,7 +942,9 @@
 
 	if (rtn == SUCCESS) {
 		scsi_sleep(HOST_RESET_SETTLE_TIME);
+		spin_lock_irqsave(scmd->device->host->host_lock, flags);
 		scsi_report_bus_reset(scmd->device->host, scmd->device->channel);
+		spin_unlock_irqrestore(scmd->device->host->host_lock, flags);
 	}
 
 	return rtn;
@@ -1608,7 +1612,7 @@
  *
  * Returns: Nothing
  *
- * Lock status: No locks are assumed held.
+ * Lock status: Host lock must be held.
  *
  * Notes: This only needs to be called if the reset is one which
  * originates from an unknown location. Resets originated
@@ -1622,7 +1626,7 @@
 {
 	struct scsi_device *sdev;
 
-	shost_for_each_device(sdev, shost) {
+	__shost_for_each_device(sdev, shost) {
 		if (channel == sdev->channel) {
 			sdev->was_reset = 1;
 			sdev->expecting_cc_ua = 1;
@@ -1642,7 +1646,7 @@
  *
  * Returns: Nothing
  *
- * Lock status: No locks are assumed held.
+ * Lock status: Host lock must be held
  *
  * Notes: This only needs to be called if the reset is one which
  * originates from an unknown location. Resets originated
@@ -1656,7 +1660,7 @@
 {
 	struct scsi_device *sdev;
 
-	shost_for_each_device(sdev, shost) {
+	__shost_for_each_device(sdev, shost) {
 		if (channel == sdev->channel &&
 		    target == sdev->id) {
 			sdev->was_reset = 1;
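
For anyone who wants to see the failure mode outside the kernel, here is a
rough userspace sketch of the same locking rule. It is only an illustration
under stated assumptions: every demo_* name is hypothetical, a pthread
error-checking mutex stands in for the host lock, and a plain linked list
stands in for the host's device list. The error-checking mutex reports
EDEADLK where a kernel spinlock would simply spin forever, which is what the
NMI watchdog caught.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins, not the kernel's scsi_device/Scsi_Host. */
struct demo_device {
	int channel;
	int was_reset;
	struct demo_device *next;
};

struct demo_host {
	pthread_mutex_t lock;		/* plays the role of host_lock */
	struct demo_device *devices;
};

/*
 * Use an error-checking mutex so that relocking from the owning thread
 * returns EDEADLK instead of hanging the way a spinlock would.
 */
static int demo_host_init(struct demo_host *h, struct demo_device *devs)
{
	pthread_mutexattr_t attr;
	int rc = pthread_mutexattr_init(&attr);

	if (rc)
		return rc;
	rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
	if (rc == 0)
		rc = pthread_mutex_init(&h->lock, &attr);
	pthread_mutexattr_destroy(&attr);
	h->devices = devs;
	return rc;
}

/*
 * Unlocked traversal: the caller must already hold h->lock.
 * Returns the number of devices marked as reset.
 */
static int demo_report_reset_unlocked(struct demo_host *h, int channel)
{
	int n = 0;

	for (struct demo_device *d = h->devices; d; d = d->next) {
		if (d->channel == channel) {
			d->was_reset = 1;
			n++;
		}
	}
	return n;
}

/*
 * Locking traversal: takes h->lock itself. Calling this with the lock
 * already held is the bug; here it returns -EDEADLK rather than hanging.
 */
static int demo_report_reset_locking(struct demo_host *h, int channel)
{
	int rc = pthread_mutex_lock(&h->lock);
	int n;

	if (rc)
		return -rc;
	n = demo_report_reset_unlocked(h, channel);
	pthread_mutex_unlock(&h->lock);
	return n;
}
```

A caller that already holds the lock gets -EDEADLK from
demo_report_reset_locking() and must use demo_report_reset_unlocked()
instead. A caller that does not hold it, like the two error handler paths in
the patch, wraps the unlocked call in its own lock/unlock pair.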
