Re: [PATCH 2/4] mtd: nand: implement two pairing scheme

From: Boris Brezillon
Date: Sun Jun 12 2016 - 07:11:55 EST


On 12 Jun 2016 05:23:13 -0400
"George Spelvin" <linux@xxxxxxxxxxxxxxxxxxx> wrote:

> >> It also applies an offset of +1, to avoid negative numbers and the
> >> problems of signed divides.
>
> > It seems to cover all cases.
>
> I wasn't sure why you used a signed int for the interface.

No real reason other than consistency with other prototypes where page
is always expressed as an integer.

>
> (Another thing I thought of, but am less sure of, is packing the group
> and pair numbers into a register-passable int rather than a structure.
> Even 2 bits for the group is probably the most that will ever be needed,
> but it's easy to say the low 4 bits are the group and the high 28 are
> the pair. Just create a few access macros to pull them apart.

We could indeed do that, but again, do we really need to optimize
things like that?
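
For the record, a minimal sketch of the packing you describe, assuming a
32-bit value with the group in the low 4 bits and the pair in the high 28.
The macro and function names below are made up for illustration; nothing
like them exists in the MTD code:

```c
#include <stdint.h>

/* Hypothetical layout: low 4 bits = group, high 28 bits = pair. */
#define PAIRING_GROUP_BITS	4u
#define PAIRING_GROUP_MASK	((1u << PAIRING_GROUP_BITS) - 1u)

/* Pack a (pair, group) couple into one register-passable value. */
static inline uint32_t pairing_pack(uint32_t pair, uint32_t group)
{
	return (pair << PAIRING_GROUP_BITS) | (group & PAIRING_GROUP_MASK);
}

/* Extract the group number from a packed value. */
static inline uint32_t pairing_group(uint32_t packed)
{
	return packed & PAIRING_GROUP_MASK;
}

/* Extract the pair number from a packed value. */
static inline uint32_t pairing_pair(uint32_t packed)
{
	return packed >> PAIRING_GROUP_BITS;
}
```

The cost is that every user has to go through the accessors instead of
reading two struct fields.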

>
> This was inspired by Linus's hash_len abstraction, recently moved to
> <linux/stringhash.h>)
>
> >> (or you could add an mtd->write_per_erase field).
>
> > Okay. Actually I'd like to avoid adding new 'conversion' fields to the
> > mtd_info struct. Not sure we are really improving perfs when doing that,
> > since what takes long is the I/O ops between the flash and the
> > controller not the conversion operations.
>
> Well, yes, but you may need to do conversion ops for in-memory cache
> lookups or searching for free blocks, or wear-levelling computations,
> all of which may involve a great many conversions per actual I/O.

That's true, even if I don't think it makes such a big difference (you
don't have many paired-page manipulations that are not followed by
read/write accesses, and the I/O is where the contention is).

>
> (In hindsight, I'd wish for writesize and write_per_erase, and not
> store erasesize explicitly. Not only is the multiply more efficient,
> but it abolishes the error of an erase size which is not a multiple of
> the write size by making it impossible.)

That's also true. Actually I was thinking about adding inline functions
to retrieve the eraseblock and page size instead of letting people
manipulate the ->writesize/erasesize fields. This way we would be able
to rework the internal representation.
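
Something along these lines, as a rough sketch only. The struct and the
helper names are hypothetical stand-ins (this is not struct mtd_info), and
the sketch assumes the eraseblock size fits in 32 bits:

```c
#include <stdint.h>

/* Hypothetical stand-in for the relevant part of struct mtd_info. */
struct flash_geometry {
	uint32_t writesize;		/* page size in bytes */
	uint32_t write_per_erase;	/* pages per eraseblock */
};

static inline uint32_t flash_page_size(const struct flash_geometry *geo)
{
	return geo->writesize;
}

static inline uint32_t flash_pages_per_eraseblock(const struct flash_geometry *geo)
{
	return geo->write_per_erase;
}

/* Derived value: a multiple of the page size by construction. */
static inline uint32_t flash_eraseblock_size(const struct flash_geometry *geo)
{
	return geo->writesize * geo->write_per_erase;
}
```

Callers that only use the helpers would not notice whether erasesize is
stored or computed.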

>
> > Can we have a boolean to make it clearer?
> >
> > bool lastpage = ((page + 1) * mtd->writesize) == mtd->erasesize;
>
> An improvement IMHO. You can use the same name in all four functions
> to make the equivalence clear.
>
> > Also, the page update is quite obscure for people that did not have the
> > explanation you gave above. Can we make it
>
> > /*
> > * The first and last pages are not surrounded by other pages,
> > * and are thus less sensitive to read/write disturbance.
> > * That's why NAND vendors decided to use a different distance
> > * for these 2 specific cases, which complicates the pairing
> > * scheme logic a bit.
>
> Um... this is, as far as I can tell, complete nonsense.

Actually this was pure guesswork, because I never had a real explanation
for these weird pairing schemes.

>
> I realize you know this about a thousand times better than I do, so
> I'm hesitant to make such a strong statement, but one thing that I do
> know is that paired pages are stored in the exact same transistors.
> The pairing is purely a logical addressing distance. The physical
> distance is exactly zero.
>
> The question is why they chose this particular *logical* addressing
> scheme, and I believe the reason is write bandwidth for the common case
> of streaming writes to consecutive pages.
>
> The obvious thing to do is pair consecutive even and odd pages (pages 0 and 1,
> then 2 and 3, then...), but that makes it hard to pipeline programming of the
> two pages. You can't start programming page 1 until page 0 is finished.
>
> The next obvious thing is stride-2: 0<->2, 1<->3, 4<->6, 5<->7, etc.

Yes I understand that one.
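
To make sure we are talking about the same thing, here is how I read a
fixed-stride scheme, written as a small sketch. The struct and function
names are hypothetical, and it assumes the number of pages per block is a
multiple of twice the stride:

```c
/* Hypothetical result type: which pair a page belongs to, and its group. */
struct fixed_pairing {
	unsigned int pair;
	unsigned int group;
};

/*
 * Pages are laid out in chunks of 2 * stride pages: the first 'stride'
 * pages of a chunk are group 0, the next 'stride' pages are group 1.
 * With stride == 2 this gives 0<->2, 1<->3, 4<->6, 5<->7, ...
 */
static void fixed_stride_get_info(unsigned int page, unsigned int stride,
				  struct fixed_pairing *info)
{
	info->group = (page / stride) & 1;
	info->pair = (page / (2 * stride)) * stride + (page % stride);
}

/* Inverse mapping: (pair, group) back to a page number. */
static unsigned int fixed_stride_get_page(const struct fixed_pairing *info,
					  unsigned int stride)
{
	return (info->pair / stride) * 2 * stride +
	       info->group * stride + (info->pair % stride);
}
```

With a power-of-two stride the divides and modulos reduce to shifts and
masks.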

>
> This is done in some MLC chips. See p. 98 of this Micron data sheet:
> http://pdf.datasheet.directory/datasheets-0/micron_technology/MT29F32G08CBACAWP_C.pdf
> which has a stride-4 pairing. 0..3 pair with 4..7, then 8..11 with
> 12..15, and so on.
>
> However, it's desirable to alternate group-0 and group-1 pages, since
> the write operations are rather different and even take different amounts
> of time. Alternating them makes it possible to:
> 1) Possibly overlap parts of the writes that use different on-chip resources,
> 2) Average the non-overlapping times for minimum jitter.

Okay, that's actually a good reason, and probably the part I was
missing to explain these non-power-of-2 distance schemes, which lead to
heterogeneous distances (the first and last sets of pages don't have
the same stride).

>
> This leads naturally to the stride-3 solution. You want to minimize the
> stride because you can read both pages in a pair with one read disturbance,
> and the shorter the distance, the more likely you'll want both pages
> (and the less buffering you'll need to make both available).
>
> Stride-3 does have those two awkward edge cases, and changing the
> stride is simply the simplest way to special-case them.

Yep.
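
Just to check we agree on the layout, here is a sketch of stride-3 as I
understand it: 0<->2, 1<->4, 3<->6, 5<->8, ..., with the first and last
pairs at distance 2. This is not the code from the patch; the names are
hypothetical, and it assumes an even number of pages per block (at least 4):

```c
/* Hypothetical result type: which pair a page belongs to, and its group. */
struct dist3_pairing {
	unsigned int pair;
	unsigned int group;
};

/* Page number -> (pair, group). Returns 0 on success, -1 on bad input. */
static int stride3_get_info(unsigned int page, unsigned int npages,
			    struct dist3_pairing *info)
{
	if (npages < 4 || (npages & 1) || page >= npages)
		return -1;

	if (page == npages - 1) {
		/* Last page: group 1, paired with page npages - 3. */
		info->group = 1;
		info->pair = npages / 2 - 1;
	} else if (page == 0 || (page & 1)) {
		/* Pages 0, 1, 3, 5, ...: group 0. */
		info->group = 0;
		info->pair = (page + 1) / 2;
	} else {
		/* Pages 2, 4, 6, ...: group 1. */
		info->group = 1;
		info->pair = page / 2 - 1;
	}

	return 0;
}

/* (pair, group) -> page number, or -1 on bad input. */
static int stride3_get_page(const struct dist3_pairing *info,
			    unsigned int npages)
{
	unsigned int last_pair;

	if (npages < 4 || (npages & 1))
		return -1;

	last_pair = npages / 2 - 1;
	if (info->group > 1 || info->pair > last_pair)
		return -1;

	if (info->group == 0)
		return info->pair ? (int)(2 * info->pair - 1) : 0;

	return info->pair == last_pair ? (int)(npages - 1)
				       : (int)(2 * info->pair + 2);
}
```

Taking unsigned arguments sidesteps the signed-divide question, at the
cost of not matching the other prototypes.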

Still, I've seen weird things while working on modern MLC NANDs which
make me think the pairing scheme is also there to help mitigate the
write-disturb effect, but I might be wrong. The behavior I'm
describing here has been observed on Hynix (H27QCG8T2E5R-BCF) and
Toshiba (TC58TEG5DCLTA00) NANDs so far. When I write the 2 pages in a
pair, but not the following page, I see a high number of bitflips in
the last programmed page until the next page is programmed.

Let's take a real example. My NAND exposes a stride-3 pairing scheme.
When I only program pages 0, 1 and 2, page 2 shows a high number of
bitflips until page 3 is programmed. Actually, I don't remember if the
number decreases after programming page 3 or page 4, but my guess is
that the NAND is accounting for future write-disturb when programming
a page in group 1, which makes this page unreliable until the
subsequent page(s) have been programmed.

What's your opinion on that?

>
> > Thanks for your valuable review/suggestions.
> >
> > Just out of curiosity, why are you interested in the pairing scheme
> > concept? Are you working with NANDs?
>
> Not at present, but I do embedded hardware and might some day.

Okay. You seem pretty well aware of MLC/TLC NAND constraints, and you
already have a good idea of how things work.
Good to have someone like you reviewing this stuff.

>
> Also, the data sheets are a real PITA to find. I have yet to
> see an actual data sheet that documents the stride-3 pairing scheme.

Yes, that's a real problem. Here is a Samsung NAND data sheet
describing stride-3 [1], and a Hynix one describing stride-6 [2].

[1] http://dl.btc.pl/kamami_wa/k9gbg08u0a_ds.pdf
[2] http://www.szyuda88.com/uploadfile/cfile/201061714220663.pdf

--
Boris Brezillon, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com