Re: [PATCH] mmc_spi: do propper retry managment in the block layer- 3rd try

From: Pierre Ossman
Date: Tue Jun 30 2009 - 10:14:38 EST


(I haven't compared to your latest versions some some of my comments
here might not be relevant for the existing patch)

On Thu, 28 May 2009 10:28:29 +0200
Wolfgang MÃes <wolfgang.mues@xxxxxxxxxxxx> wrote:

> Am Mittwoch, 27. Mai 2009 schrieb Pierre Ossman:
> > FYI, the situation that we must avoid to not screw up the assumptions
> > of the filesystems is to end up with a situation like this on the card:
> >
> > NNNNBNNNOOOO
> >
> > N = block with new data
> > O = block with previous data
> > B = block with damaged data
> >
> > This could in theory happen when the card has some internal problem
> > (e.g. bad flash cell) as well go through these states:
> >
> > NNNNNNNNOOOO (first write, response to stop command is lost)
> > NNNNBNNNOOOO (second write, card has internal problem halfway through)
>
> For SPI and for any controller which gives you a correct bytes_xfered, this
> will only be NNNNNNNNNNNNBOOOOOOOOOOO, because after each transfered sector,
> you get a response from the card. So you will not start from the beginning
> any more.
>
> If a host controller does not give a correct bytes_xfered, you might end in
> the scenario you have pointed out, only after exhausting all retries. This is
> equivalent to a bad of multiple sectors. NBNBNBNB000 == NBBBBBBBOOO.
>

Hmm... unfortunately this is the norm as I don't think any of the other
drivers provide a proper bytes_xfered. That also means that the verdict
for this patch is NAK for now.

> I think we should code according to the expectations of the user.
> A nonrecoverable write results in a data loss for the user. And I think that
> the wish of the user to get the newly aquired data down to the card has
> preceedence against the possible loss of some old data on the card. Remember
> that the "old data" will be unused sectors most of the time. In embedded
> systems, the user often has no chance to repeat the storage process or choose
> another medium, and the data ist lost if the write to SD card fails.
>

The argument was that this method probably worked fine for classical
file systems, but it wrecked havoc if it was done in the log area of
journaling file systems.

I'd like to reiterate that I like the idea and I really think the
kernel should be able to do this. If you can get the VFS and block layer
people to look at this and sign off on it, then we can move forward and
include this. I don't hive time for that particular crusade myself though.

> >
> > I think we can keep the logic where it uses single blocks for the
> > remainder of the request.
>
> But this will increase wear leveling problems for the rest of the request.
> And this is for a mean value of 64 sectors!
>
> Why on earth should we do this if it is so easy to avoid?
>
> As transmission errors are seldom, we get a request splitted out in 3 blocks:
> n blocks before the error, one single-block transfer, m blocks after the
> error. And you almost will not notice a slowdown in transmission speed.
>

That code was not written for transmission errors but for media errors.
That means that if block 8 is the bad one, your method (provided
bytex_xfered is useless, which is the norm) means that it will transfer
n blocks, 1 block, n-1 blocks, 1 block, etc. failing every second
attempt until it gets to the actually damaged sector.

> >
> > Perhaps this fix can be moved into its own patch? I think it might keep
> > things cleaner.
>
> Which fix? The code was only moved and restructured. No new behaviour.
>

Never mind, I wasn't paying proper attention. :)

> > > +
> > > + /* Wait until card is ready for new data. This information is
> > > + * only available in non-SPI mode. In SPI mode, the busy
> > > + * handling is done in the SPI driver.
> > > + */
> >
> > Strictly speaking it should be handled in all drivers. Some hardware
> > doesn't do it properly though so we have this safeguard.
>
> IMHO no driver and framework is doing this correct. It should not be done
> after a write request, but before the _next_ request. All I have done here
> was to add some documentation about SPI, and leave the previous code intact.

Why on the next request? For performance reasons? The downside to that
is increased complexity. The current system means that the card is
always in an idle state after each request.

(the hardware handling also gets trickier as the host controller also
is kept in some "active" state until busy ends)

> > > + /* Do retries for all sort of transmission errors */
> > > + switch (error) {
> > >
> > > - /*
> > > - * A block was successfully transferred.
> > > + case 0: /* no error: continue, reset error variables */
> > > + disable_multi = 0;
> > > + retries = 3;
> > > + break;
> > > +
> > > + /* Card has not understand command. As we do only send
> > > + * valid commands, this must be a transmission error. */
> > > + case -EPROTO: /* fall through */
> >
> > This indicates a layering problem. The host driver should not be aware
> > of anything but pure bit errors.
> >
> > Also, this special meaning should be documented in core.h should we
> > decide to keep it.
>
> This is not MY error code. EPROTO is send from mmc_spi_writeblock(), and I
> have listed it here because it is a transmission error.
>
> So there seems to be contrary objections: Matt Flemming has requested that the
> exact cause of error is reported by the driver (because otherwise the caller
> will loose information), and you requested to distinguish only between
> transmission errors and the rest.
>
> Matt Flemming has pointed out that further changes in the code will request to
> get the exact cause of error in the block layer, and that error codes might
> have interpreted different according to the command class, so IMHO it is
> better to transport the exact error cause into block.c and do the error
> handling here, according to the type of request.
>

Right. As much as possible of the result should be parsed by the same
layer that assembles the request. So I think EPROTO should be left out
here and we can look at getting better integration with mmc_spi
responses in another patch.

> >
> > This looks wrong. What happened to the original logic of retrying reads
> > on any error?
>
> You mean error codes like EINVAL, ENOMEDIUM, ENODEV?
> I there any sense in retrying non-recoverable errors? Or should I list the
> non-recoverable errors inside the switch (and do nothing), and do a retry for
> read calls on any other error code, so a new coded driver with a new error
> code will get a read retry (but no write retry!) for this new code regardless
> of making sense?
>

Yes. E.g. some limited hardware might not be able to report a timeout
properly so the driver will report EIO for many types of errors. We
should be robust enough to handle that.

> > And chunk seems unrelated to the rest of the patch.
>
> Ah... no. This is part of sending the STOP command in case of an error.
> And sending the STOP command in case of an error is needed for a working retry
> of write commands in SPI mode. Without a STOP command, you can not retry a
> broken write sequence because the card missed the STOP token.
>

Ok, "unrelated" might be a bit strong. But it isn't necessary to commit
them together. For the sake of keeping the patch small, I think this
should be a separate patch which will be committed before the retry
changes.

> > Could you also move those modifications to its own patch?
>
> Why? It is meaningless without a retry logic for block writes.
>

See above. Reduces the size of this patch.

Rgds
--
-- Pierre Ossman

WARNING: This correspondence is being monitored by the
Swedish government. Make sure your server uses encryption
for SMTP traffic and consider using PGP for end-to-end
encryption.

Attachment: signature.asc
Description: PGP signature