Re: [PATCH 1/3] scsi: fix potential dead lock for host runtime pm

From: Alan Stern
Date: Wed Nov 02 2011 - 10:41:53 EST


On Wed, 2 Nov 2011, Lin Ming wrote:

> In later patch hooks will be added to do ata port runtime pm through scsi layer.
> libata schedules scsi EH to handle suspend, then dead lock happens
> because scsi EH in turn waits for the ongoing suspend, as below.
>
> <scsi host runtime suspend>
> scsi_autopm_put_host
> pm_runtime_put_sync
> <scsi_host runtime pm status updated to RPM_SUSPENDING>
> ......
> <call libata hook to do suspend>
> <wake up scsi EH to handle suspend>
> <wait for scsi EH ...>
>
> <scsi EH wake up>
> scsi_error_handler
> <resume scsi host>
> scsi_autopm_get_host
> pm_runtime_get_sync
> .....
> <sleep to wait for the ongoing scsi host suspend>
>
> This patch fixes the dead lock by checking if there is ongoing runtime PM request.
> If there is ongoing runtime PM request, scsi_autopm_get_host_noresume is called to
> increase the usage count, but don't resume the host.
>
> Signed-off-by: Lin Ming <ming.m.lin@xxxxxxxxx>
> ---
> drivers/scsi/scsi_error.c | 4 +++-
> drivers/scsi/scsi_pm.c | 11 +++++++++++
> drivers/scsi/scsi_priv.h | 2 ++
> 3 files changed, 16 insertions(+), 1 deletions(-)
>
> diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
> index a4b9cdb..d35d8f7 100644
> --- a/drivers/scsi/scsi_error.c
> +++ b/drivers/scsi/scsi_error.c
> @@ -1804,7 +1804,9 @@ int scsi_error_handler(void *data)
> * what we need to do to get it up and online again (if we can).
> * If we fail, we end up taking the thing offline.
> */
> - if (scsi_autopm_get_host(shost) != 0) {
> + if (scsi_autopm_host_busy(shost))
> + scsi_autopm_get_host_noresume(shost);
> + else if (scsi_autopm_get_host(shost) != 0) {
> SCSI_LOG_ERROR_RECOVERY(1,
> printk(KERN_ERR "Error handler scsi_eh_%d "
> "unable to autoresume\n",

Here's what I'm worried about: Suppose during normal operation, an
error occurs. When the command is completed, runtime PM might start to
suspend the host. But then the error-handler thread starts up, and
of course it needs the host to be powered-up in order to recover from
the error. The code you're adding could prevent this from working.

What we really need is a way to prevent host from going into runtime
suspend while the error-handler is pending or running. Do you have any
ideas on how to do this?

> diff --git a/drivers/scsi/scsi_pm.c b/drivers/scsi/scsi_pm.c
> index d82a023a..1be6c5a 100644
> --- a/drivers/scsi/scsi_pm.c
> +++ b/drivers/scsi/scsi_pm.c
> @@ -185,6 +185,17 @@ void scsi_autopm_put_host(struct Scsi_Host *shost)
> pm_runtime_put_sync(&shost->shost_gendev);
> }
>
> +bool scsi_autopm_host_busy(struct Scsi_Host *shost)
> +{
> + return (shost->shost_gendev.power.runtime_status == RPM_RESUMING
> + || shost->shost_gendev.power.runtime_status == RPM_SUSPENDING);
> +}
> +
> +void scsi_autopm_get_host_noresume(struct Scsi_Host *shost)
> +{
> + pm_runtime_get_noresume(&shost->shost_gendev);
> +}
> +
> #else
>
> #define scsi_runtime_suspend NULL
> diff --git a/drivers/scsi/scsi_priv.h b/drivers/scsi/scsi_priv.h
> index 2a58895..1750651 100644
> --- a/drivers/scsi/scsi_priv.h
> +++ b/drivers/scsi/scsi_priv.h
> @@ -156,6 +156,8 @@ extern void scsi_autopm_get_target(struct scsi_target *);
> extern void scsi_autopm_put_target(struct scsi_target *);
> extern int scsi_autopm_get_host(struct Scsi_Host *);
> extern void scsi_autopm_put_host(struct Scsi_Host *);
> +extern bool scsi_autopm_host_busy(struct Scsi_Host *shost);
> +extern void scsi_autopm_get_host_noresume(struct Scsi_Host *);
> #else
> static inline void scsi_autopm_get_target(struct scsi_target *t) {}
> static inline void scsi_autopm_put_target(struct scsi_target *t) {}

I bet you didn't try compiling this with CONFIG_PM_RUNTIME turned off.

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/