RE: sata_mv port lockup on hotplug (kernel 2.6.38.2)

From: Bruce Stenning
Date: Tue Sep 06 2011 - 08:19:57 EST


> Can you please add some debug printk's to scsi_schedule_eh() and see
> whether scsi_eh_wakeup() is invoked from there? It seems likely that
> the problem is caused by race conditions around
> SHOST_[CANCEL_]RECOVERY flags.

I managed to reproduce the lockup again yesterday with a slightly
different mix of tracing, this time including trace output in
scsi_eh_wakeup() and scsi_schedule_eh(). It looks like the EH is being
scheduled, but the EH thread immediately goes back to sleep and does
not wake up again:

ata4: EH complete
Waking error handler thread
scsi_eh_wakeup: succeeded
scsi_schedule_eh: succeeded
scsi_restart_operations: waking up host to restart
Error handler scsi_eh_3 sleeping

Could the wake-up of the scsi_eh_3 thread be arriving while
scsi_error_handler() is still processing an earlier EH, which then
calls scsi_restart_operations() and puts the scsi_eh_3 thread back to
sleep?
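
One way the sequence in the trace above could arise can be modelled in
user space. This is only a sketch under stated assumptions, not a
confirmed diagnosis: the two predicates are a simplified reading of the
wake-up test in scsi_eh_wakeup() and the sleep test in the
scsi_error_handler() loop, and may not match 2.6.38.2 exactly. The
struct and every model_* name are hypothetical and exist only for this
illustration.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the host counters the EH thread consults. */
struct host_model {
    unsigned int host_busy;         /* commands currently outstanding    */
    unsigned int host_failed;       /* failed commands waiting for EH    */
    unsigned int host_eh_scheduled; /* EH requests not tied to a command */
};

/* Assumed wake-up test: wake the EH thread only when every
 * outstanding command has failed. */
bool model_eh_wakeup_fires(const struct host_model *h)
{
    return h->host_busy == h->host_failed;
}

/* Assumed sleep test: sleep if there is nothing to do, or if some
 * outstanding commands have not failed yet. */
bool model_eh_thread_sleeps(const struct host_model *h)
{
    return (h->host_failed == 0 && h->host_eh_scheduled == 0) ||
           h->host_failed != h->host_busy;
}

/* Models a request for another EH pass (as from hotplug handling). */
void model_schedule_eh(struct host_model *h)
{
    h->host_eh_scheduled++;
}

/* Models the restart step releasing n previously blocked commands. */
void model_restart_issues_io(struct host_model *h, unsigned int n)
{
    h->host_busy += n;
}
```

In this model, calling model_schedule_eh() on an idle host makes the
wake-up fire and the thread stay awake. If model_restart_issues_io()
then runs before the thread re-evaluates its sleep test, host_busy no
longer equals host_failed and the thread sleeps even though
host_eh_scheduled is still non-zero.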

Some time after the lockup, the trace showed SCSI operations timing
out, but the port remained unresponsive. The unit is not entirely
stable in this state: our application software was no longer able to
strobe softdog, so the unit rebooted. Enough was still running for the
serial console to remain responsive until the reboot, however.

Thanks,

Bruce.

