Re: [ANNOUNCE] Status of unlocked_qcmds=1 operation for .37

From: Nicholas A. Bellinger
Date: Wed Oct 27 2010 - 16:07:05 EST


On Wed, 2010-10-27 at 12:20 -0700, Mike Anderson wrote:
> Nicholas A. Bellinger <nab@xxxxxxxxxxxxxxx> wrote:
> > On Wed, 2010-10-27 at 09:27 -0500, James Bottomley wrote:
> > > On Wed, 2010-10-27 at 09:53 +0200, Andi Kleen wrote:
> > > > > This sounds like a pretty reasonable compromise that I think is slightly
> > > > > less risky for the LLDs with the ghosts and cob-webs hanging off of
> > > > > them.
> > > >
> > > > They won't get tested either next release cycle. Essentially
> > > > near nobody uses them.
> > > >
> > > > >
> > > > > What do you think..?
> > > >
> > > > Standard linux practice is to simply push the locks down. That's a pretty
> > > > mechanical operation and shouldn't be too risky
> > > >
> > > > With some luck you could even do it with coccinelle.
> > >
> > > Precisely ... if we can do the push down now as a mechanical
> > > transformation we can put it in the current merge window as a low risk
> > > API change.
> >
> > I disagree that touching every single legacy LLD's SHT->queuecommand()
> > and failure paths in that code is a low risk change.
> >
> > > This gives us optimal exposure to the rc sequence to sort
> > > out any problems that arise (or drivers that got missed) with the lowest
> > > risk of such problems actually arising.
> >
> > Yes,
> >
> > > Given the corner cases and the
> > > late arrival of fixes, the serial number changes are just too risky for
> > > the current merge window.
> >
> > I think with andmike's testing and ACKs for the necessary scsi_error.c
> > changes this would be an acceptable risk.
> >
>
> Adding SCSI_EH_SOFTIRQ_DONE in scsi_softirq_done is not going to provide
> value in scsi_try_to_abort_cmd. scsi_softirq_done calls scsi_eh_scmd_add
> without the SCSI_EH_CANCEL_CMD flag set which will stop
> scsi_try_to_abort_cmd from being called.
>
> Removing the serial_number check in scsi_try_to_abort_cmd and not
> replacing it may be the correct action as we should be relying on the
> block complete checking. That said what James has indicated about
> splitting the serial number change out seems like the lower risk approach
> at this time.
>

Hmm, that is unfortunate..

So in this case it would make sense to drop the explicit LLD usage of
scsi_cmd_get_serial(), re-include it in scsi_dispatch_cmd() for all
LLDs, and deal with a per scsi_host atomic_t serial_number counter.
Anyways, I will go ahead and respin another series to follow this logic
shortly.
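
As a rough illustration of the per-host atomic counter idea, here is a
user-space C11 sketch. It is not the kernel patch: struct demo_host,
struct demo_cmd and the demo_* helpers are hypothetical stand-ins for
struct Scsi_Host, struct scsi_cmnd and scsi_cmd_get_serial(), and C11
<stdatomic.h> stands in for the kernel's atomic_t API. It also assumes
that 0 stays reserved to mean "no serial number assigned".

```c
#include <stdatomic.h>

/* Hypothetical stand-in for struct Scsi_Host (illustration only). */
struct demo_host {
	atomic_ulong cmd_serial_number;	/* per-host serial counter */
};

/* Hypothetical stand-in for struct scsi_cmnd (illustration only). */
struct demo_cmd {
	unsigned long serial_number;	/* 0 means "no serial assigned" */
};

static void demo_host_init(struct demo_host *host)
{
	atomic_init(&host->cmd_serial_number, 0);
}

/*
 * Assign the next non-zero serial number to cmd without taking a
 * host-wide lock. atomic_fetch_add() returns the old counter value,
 * so old + 1 is the value this caller owns. If the counter wraps and
 * the result is 0, take another one so 0 is never handed out.
 */
static void demo_cmd_get_serial(struct demo_host *host, struct demo_cmd *cmd)
{
	unsigned long serial;

	do {
		serial = atomic_fetch_add(&host->cmd_serial_number, 1) + 1;
	} while (serial == 0);

	cmd->serial_number = serial;
}
```

Calling demo_cmd_get_serial() once per command from the dispatch path
would give every command on a host a distinct non-zero serial number,
at the cost of one atomic read-modify-write per dispatch.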

The other question that was mentioned in my email yesterday would be if
the clearing of a non atomic_t cmd->serial_number from
scsi_softirq_done() -> scsi_try_to_abort_cmd() is safe to begin with..?
Does this need to be converted to an atomic_t as well to prevent a
subtle race outside of any of the host_lock-less series of patches..?
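
To make the question concrete, here is a user-space C11 sketch of what
an atomic per-command serial number could look like. It does not answer
whether the race is real. struct demo_acmd and both helpers are
hypothetical names, and C11 atomics stand in for the kernel's atomic_t.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical command with an atomic serial number (illustration only). */
struct demo_acmd {
	atomic_ulong serial_number;	/* 0 means "already completed" */
};

/*
 * Completion side: clear the serial number in one atomic step and
 * return the value that was there before the clear.
 */
static unsigned long demo_acmd_clear_serial(struct demo_acmd *cmd)
{
	return atomic_exchange(&cmd->serial_number, 0);
}

/*
 * Abort side: only treat the command as abortable while its serial
 * number is still non-zero.
 */
static bool demo_acmd_may_abort(struct demo_acmd *cmd)
{
	return atomic_load(&cmd->serial_number) != 0;
}
```

With this shape the abort side reads either the old serial number or 0,
never a torn value. Whether that is enough depends on the ordering the
error handler needs, which this sketch does not model.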

--nab


