RE: [LINUX PATCH v12 3/3] mtd: rawnand: arasan: Add support for Arasan NAND Flash Controller

From: Naga Sureshkumar Relli
Date: Tue Nov 20 2018 - 02:04:12 EST


Hi Boris,

> -----Original Message-----
> From: Boris Brezillon [mailto:boris.brezillon@xxxxxxxxxxx]
> Sent: Monday, November 19, 2018 1:33 PM
> To: Naga Sureshkumar Relli <nagasure@xxxxxxxxxx>
> Cc: miquel.raynal@xxxxxxxxxxx; richard@xxxxxx; dwmw2@xxxxxxxxxxxxx;
> computersforpeace@xxxxxxxxx; marek.vasut@xxxxxxxxx; linux-mtd@xxxxxxxxxxxxxxxxxxx;
> linux-kernel@xxxxxxxxxxxxxxx; nagasuresh12@xxxxxxxxx; robh@xxxxxxxxxx; Michal Simek
> <michals@xxxxxxxxxx>
> Subject: Re: [LINUX PATCH v12 3/3] mtd: rawnand: arasan: Add support for Arasan NAND
> Flash Controller
>
> On Mon, 19 Nov 2018 06:20:28 +0000
> Naga Sureshkumar Relli <nagasure@xxxxxxxxxx> wrote:
>
> > Hi Boris,
> >
> > > -----Original Message-----
> > > From: Boris Brezillon [mailto:boris.brezillon@xxxxxxxxxxx]
> > > Sent: Monday, November 19, 2018 1:13 AM
> > > To: Naga Sureshkumar Relli <nagasure@xxxxxxxxxx>
> > > Cc: miquel.raynal@xxxxxxxxxxx; richard@xxxxxx; dwmw2@xxxxxxxxxxxxx;
> > > computersforpeace@xxxxxxxxx; marek.vasut@xxxxxxxxx;
> > > > linux-mtd@xxxxxxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx;
> > > nagasuresh12@xxxxxxxxx; robh@xxxxxxxxxx; Michal Simek
> > > <michals@xxxxxxxxxx>
> > > Subject: Re: [LINUX PATCH v12 3/3] mtd: rawnand: arasan: Add support
> > > for Arasan NAND Flash Controller
> > >
> > > On Thu, 15 Nov 2018 09:34:16 +0000
> > > Naga Sureshkumar Relli <nagasure@xxxxxxxxxx> wrote:
> > >
> > > > Hi Boris & Miquel,
> > > >
> > > > > I am updating the driver to address your comments, and I have
> > > > > one concern about anfc_read_page_hwecc(), where I check for
> > > > > bitflips in erased pages. The Arasan NAND controller has no
> > > > > multi-bit error detection beyond 24 bits (it can correct up to
> > > > > 24 bits), i.e. the controller gives no indication of an
> > > > > uncorrectable error beyond 24 bits.
> > >
> > > Do you mean that you can't detect uncorrectable errors, or just that
> > > it's not 100% sure to detect errors above max_strength?
> > > Yes, in the Arasan NAND controller there is no way to detect uncorrectable errors
> > > beyond 24 bits.
>
> So how do you detect uncorrectable errors when the strength is less than
> 24bits?
At or below the configured ECC strength, the controller will definitely correct the errors.
Beyond the ECC strength, it won't even detect them.
>
> > >
> > > > > So I took an error count as a default value (MULTI_BIT_ERR_CNT 16;
> > > > > I chose this based on the error count I got while reading an erased
> > > > > page on a Micron device). During a page read, I just read the error
> > > > > count register and compare its value with the default error count
> > > > > (16); if it is more than the default, I check for erased-page
> > > > > bitflips.
> > >
> > > Hm, that's wrong, especially if you set ecc_strength to something > 16.
> > Ok
> > >
> > > > > I doubt that this will work in all cases.
> > >
> > > It definitely doesn't.
> > Ok
> > >
> > > > > In my case it only works because the error count reported for an erased page is 16.
> > > > > Could you please suggest a way to detect erased-page bitflips when reading a page
> > > > > with HW-ECC?
> > >
> > > > I'm a bit lost. Is the problem only about bitflips in erased pages, or is it also
> > > > impacting reads of written pages that lead to uncorrectable errors?
> > > Yes, it is for both. But for read errors beyond 24 bits that we can't detect, the
> > > answer from the HW design team is that the flash part is bad.
> > > Unfortunately, we haven't run into that situation so far (read errors of written
> > > pages beyond 24 bits).
>
> Can you please run nandbiterrs (available in mtd-utils)? I fear your
> device won't pass the test.
Yes, the nandbiterrs test passes up to 24 bits; beyond that it fails.
>
> > But we are hitting this when reading erased pages (needed for UBIFS).
> >
> > >
> > > Don't you have a bit (or several bits) reporting when the ECC engine was not able to
> > > correct data? If you do, you should base the "detect bitflips in erased pages" logic
> > > on this information.
> > Multi-bit error reporting exists only for Hamming (1-bit correction and 2-bit
> > detection), but not for BCH.
> >
>
> Then I tend to agree with Miquel: your ECC engine is broken, and I'm
> not even sure how to deal with that yet.
So, as per Miquel's suggestion, can I proceed to add the below?
"you should re-read the page in raw mode and check for the number of bitflips manually (thanks to the helpers in the core). Again, if the number of BF is above 16, we can assume the page is bad and increment ->ecc.failed accordingly."

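To make sure I understand the suggestion, here is a minimal standalone sketch of the manual check (plain C, not driver code). It assumes the page has already been re-read in raw mode into the data and ECC buffers. The names count_zero_bits() and check_erased_chunk() are hypothetical, and I am assuming the core helper meant above is nand_check_erased_ecc_chunk(), which I understand does the equivalent work in the kernel. An erased chunk should read back as all 0xFF, so every 0 bit counts as a bitflip:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Count the bits that read back as 0. An erased NAND page is all 0xFF,
 * so in a raw read of an erased chunk every 0 bit is a bitflip. */
static int count_zero_bits(const uint8_t *buf, size_t len)
{
	int zeros = 0;

	for (size_t i = 0; i < len; i++) {
		uint8_t flipped = (uint8_t)~buf[i];	/* 1 bits mark the flips */

		while (flipped) {
			flipped &= (uint8_t)(flipped - 1);	/* clear lowest set bit */
			zeros++;
		}
	}
	return zeros;
}

/*
 * Hypothetical helper: check one ECC chunk of a raw-read page.
 * Returns the number of bitflips when the chunk can be treated as erased
 * (count <= strength) and rewrites both buffers to 0xFF. Returns -1 when
 * there are too many flips, i.e. the chunk must be reported as
 * uncorrectable.
 */
static int check_erased_chunk(uint8_t *data, size_t data_len,
			      uint8_t *ecc, size_t ecc_len, int strength)
{
	int flips;

	assert(strength >= 0);

	flips = count_zero_bits(data, data_len) + count_zero_bits(ecc, ecc_len);
	if (flips > strength)
		return -1;	/* not an erased chunk */

	memset(data, 0xff, data_len);
	memset(ecc, 0xff, ecc_len);
	return flips;
}
```

In the driver I would expect to call the core helper once per ECC step, increment ecc.failed on a negative return, and otherwise add the returned count to ecc.corrected. Which threshold to pass (the fixed 16 from the quote or the configured ECC strength) is the part I would like you to confirm.
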
Thanks,
Naga Sureshkumar Relli