Re: [PATCH] mmc_spi: do propper retry managment in the block layer

From: Pierre Ossman
Date: Tue Apr 28 2009 - 15:57:46 EST

Next message: Greg KH: "Re: [RFC PATCH 00/21] amd64_edac: EDAC module for AMD64"
Previous message: Andrew Morton: "Re: mmotm 2009-02-17-12-33 uploaded"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Tue, 14 Apr 2009 16:27:26 +0200
Wolfgang MÃes <wolfgang.mues@xxxxxxxxxxxx> wrote:

> Hello Pierre,
>
> Am Samstag, 11. April 2009 schrieb Pierre Ossman:
> > NAK. Writes cannot be retried safely as upper layers rely on the fact
> > that writes fail in a linear manner (a stupid assumption IMO, but
> > that's the way things are).
>
> Hmmm.... so this patch has to be improved. I think the (original) code in
> mmc_blk_issue_rq() is somewhat questionable. As far as I understand the code,
> in the case of an error, brq.data.bytes_xfered is not examined, so the code
> starts over from the very first sector of the request, and rereads all
> sectors (one after the another).
>

That would be a bug, yes.

> So if there is a non-recoverable write error, this code must stop and do not
> continue to write, right?

Yes.

> But if there is a read error, it is OK and advisable to try to read the rest
> of the sectors?

We're working on the assumption that reads do not modify any state in
the card, so those should be safe to retry any number of times. :)

> If there is a write error in the middle of a request, is it OK to do
> __blk_end_request(req, 0, brq.data.bytes_xfered);
> and try to retry from this point, and stop if there is a non-recoverable write
> error?

Afraid not. The reason is that bytes_xfered is not guaranteed to be
exact. It gives you a lower bound, but not an upper one.

> > > + /* Invalid response. This is most likely a transmission
> > > + * error from card to host.
> > > + */
> > > + case -EINVAL:
> >
> > EINVAL is actually "host controller driver/hardware does not support
> > this type of request".
>
> "Hardware" is including the hardware of the SD card. This should NEVER happen,
> because block.c issues only valid requests, which are translated to mandatory
> commands for the mmc/sd card.
>
> So if there is an EINVAL result from the lower layers, it comes from
> transmission errors (card has not understand the command, or host has not
> understand the response).
>
> If EINVAL is coming from the HOST CONTROLER, a retry will not harm anyone.
>
> I have seen EINVAL results due to spikes on the SD lines.
>
> Please correct me if I am wrong.
>

You're wrong in that EINVAL should not be sent for any kind of runtime
hardware errors like that. But I might have missed some instances during
reviews, so there might be drivers that give that error message for
things they shouldn't.

From include/mmc/core.h:

* EINVAL Request cannot be performed because of restrictions
* in hardware and/or the driver

I can't really see a case where EINVAL should be a transient condition,
so there shouldn't be any point retrying a request that gives that
error.

Rgds
--
-- Pierre Ossman

WARNING: This correspondence is being monitored by the
Swedish government. Make sure your server uses encryption
for SMTP traffic and consider using PGP for end-to-end
encryption.

Attachment: signature.asc
Description: PGP signature

Next message: Greg KH: "Re: [RFC PATCH 00/21] amd64_edac: EDAC module for AMD64"
Previous message: Andrew Morton: "Re: mmotm 2009-02-17-12-33 uploaded"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]