Re: [PATCH] scsi: mpt3sas: fix hang on ata passthru commands

From: Bart Van Assche
Date: Sun Jan 01 2017 - 09:38:09 EST


On Sat, 2016-12-31 at 15:19 -0800, James Bottomley wrote:
> On Thu, 2016-12-29 at 00:02 -0800, Christoph Hellwig wrote:
> > On Wed, Dec 28, 2016 at 11:30:24PM -0500, Jason Baron wrote:
> > > Add a new parameter to scsi_internal_device_block() to decide
> > > whether or not to invoke scsi_wait_for_queuecommand().
> >
> > We'll also need to deal with the blk-mq wait path that Bart has been
> > working on (I think it's already in the scsi tree, but I'd have to
> > check).
> >
> > Also adding a bool flag for the last call in a function is style
> > that's a little annoying.
> >
> > I'd prefer to add a scsi_internal_device_block_nowait that contains
> > all the code except for the waiting, and then make
> > scsi_internal_device_block_nowait a wrapper around it. Or drop the
> > annoying internal for both while we're at it :)
>
> OK, I know it's new year, but this is an unpatched regression in -rc1
> that's causing serious issues. I would like this fixed by -rc3 so we
> have 3 options
>
> 1. revert all the queuecommand wait stuff until it proves it's actually
> working without regressions
> 2. apply this patch and fix the style issues later
> 3. someone else supplies the correctly styled fix patch
>
> The conservative in me says that 1. is probably the most correct thing
> to do because it gives us time to get the queuecommand wait stuff
> right; that's what I'll probably do if there's no movement next week.
> However, since we're reasonably early in the -rc cycle, so either 2 or
> 3 are viable provided no further regressions caused by the queuecommand
> wait stuff pop up.

Hello James,

My recommendation is to revert commit 18f6084a989b ("scsi: mpt3sas: Fix
secure erase premature termination"). Since the mpt3sas driver uses the
single-queue approach and since the SCSI core unlocks the block layer
request queue lock before the .queuecommand callback function is called,
multiple threads can execute that callback function (scsih_qcmd() in this
case) simultaneously. This means that using scsi_internal_device_block()
from inside .queuecommand to serialize SCSI command execution is wrong.

Bart.