Re: [PATCH 2/4] mtd: nand: implement two pairing scheme

From: Boris Brezillon
Date: Sun Jun 12 2016 - 17:13:38 EST


On 12 Jun 2016 16:24:53 -0400
"George Spelvin" <linux@xxxxxxxxxxxxxxxxxxx> wrote:

> Boris Brezillon wrote:
> > On 12 Jun 2016 08:25:49 -0400
> > "George Spelvin" <linux@xxxxxxxxxxxxxxxxxxx> wrote:
> >> (In fact, an interesting
> >> question is whether bad pages should be skipped or not!)
> >
> > There's no such thing. We have bad blocks, but when a block is bad all
> > the pages inside this block are considered bad. If one of the pages in a
> > valid block shows uncorrectable errors, UBI/UBIFS will just refuse to
> > attach the partition/mount the FS.
>
> Ah, okay. I guess dealing with inconsistently-sized blocks is too much
> hassle. And a block has a single program/erase cycle count, so if one
> part is close to wearing out, the rest is, too.
>
> P.S. interesting NASA study of (SLC) flash disturb effects:
> http://nepp.nasa.gov/DocUploads/9CCA546D-E7E6-4D96-880459A831EEA852/07-100%20Sheldon_JPL%20Distrub%20Testing%20in%20Flash%20Mem.pdf?q=disturb-testing-in-flash-memories

Thanks for the link.

>
> One thing they noted was that manufacturers' bad-block testing sucked,
> and quite a few "bad" blocks became good and stayed good over time.
>
> >> Given that very predictable write ordering, it would make sense to
> >> precompensate for write disturb.
> >
> > Yes, that's what I assumed, but this is not clearly documented.
> > Actually, I discovered that while trying to solve the paired pages
> > problem (when I was partially programming a block, it was showing
> > uncorrectable errors sooner than the fully written ones).
>
> Were the errors in a predictable direction? My understanding is that
> write disturb tends to add a little extra charge to the disturbed
> floating gates (i.e. write them more toward 0), so you'd expect
> to see extra 1s if the chip was underprogramming in anticipation.
>
> I'm also having a hard time figuring out the bit assignment.
> In general, "1" means uncharged floating gate and "0" means charged,
> but different sources show different encodings for MLC.
>
> Some (e.g. the NASA report above) show the progression from erased to
> programmed as
>
> 11 - 10 - 01 - 00
>
> so the msbit is a "big jump" and the lsbit is a "small jump", and to
> program it in SLC mode you'd program both pages identically, then read
> back the msbit.
>
>
> Others, e.g.
> http://users.ece.cmu.edu/~omutlu/pub/flash-programming-interference_iccd13.pdf
> suggest the order is
>
> 11 - 10 - 00 - 01
>
> This has the advantage that a 1-level mis-read only produces a 1-bit
> error.
>
> But in this case, to get SLC programming, you program the lsbit as
> all-ones.
>
> My problem is that I don't really understand MLC programming.

I came to the same conclusion: we really have these 2 cases in the
wild, which makes it even more complicated to define a standard
behavior.
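
To make the difference concrete, here is a minimal C sketch of the two
orderings quoted above. It assumes level 0 is the erased state and that
each value is written msbit first, as in the diagrams. The array and
function names are made up for illustration; they come neither from the
cited documents nor from this patch series.

```c
/*
 * Illustration only: hypothetical names, not part of the MTD code.
 * Index 0 is the erased level, index 3 the most programmed one.
 */
static const unsigned char binary_order[4] = { 0x3, 0x2, 0x1, 0x0 }; /* 11 10 01 00 */
static const unsigned char gray_order[4]   = { 0x3, 0x2, 0x0, 0x1 }; /* 11 10 00 01 */

/* Number of bits that differ between two 2-bit cell values. */
static int bit_flips(unsigned char a, unsigned char b)
{
	unsigned char x = (a ^ b) & 0x3;

	return (x & 1) + (x >> 1);
}

/* Worst-case bit errors when a cell is read one level off. */
static int worst_adjacent_flips(const unsigned char order[4])
{
	int worst = 0;

	for (int i = 0; i < 3; i++) {
		int flips = bit_flips(order[i], order[i + 1]);

		if (flips > worst)
			worst = flips;
	}

	return worst;
}
```

With the first ordering, mistaking 10 for 01 flips both bits, so
worst_adjacent_flips(binary_order) is 2. With the second ordering every
neighbouring pair differs by a single bit, so the result is 1, which is
the 1-bit-error advantage you mention.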

>
>
> >>> [2]http://www.szyuda88.com/uploadfile/cfile/201061714220663.pdf
> >>
> >> Did you see the footnote at the bottom of p. 64 of the latter?
> >> Does that affect your pair/group addressing scheme?
> >>
> >> It seems they are not just grouping 8K pages into even/odd double-pages;
> >> those 16K double-pages are also being addressed with a stride of 3.
> >>
> >> But in particular, an interrupted write is likely to corrupt both
> >> double-pages, 32K of data!
> >
> > Yes, that's yet another problem I decided to ignore for now :).
> >
> > I guess a solution would be to consider that all 4 pages are 'paired'
> > together, but this also implies considering that the NAND has 4-level
> > cells, which will make us lose even more space when operating in 'SLC
> > mode' where we only write the lower page (page attached to group 0) of
> > each pair.
>
> It's more considering it to have 16K pages that can be accessed in half-pages.

Yes, I know, but it's not really easy to fake that at the NAND level,
because programming 2 pages still requires 2 page program operations.
The MTD user could detect that the pairing scheme always exposes 2
consecutive non-paired pages, but as you've seen, this condition does
not necessarily imply the 'pair coupling' constraint, and we don't want
to increase the min_io_size value if it's not really necessary.


--
Boris Brezillon, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com