RE: [LINUX PATCH v12 3/3] mtd: rawnand: arasan: Add support for Arasan NAND Flash Controller

From: Naga Sureshkumar Relli
Date: Wed Dec 12 2018 - 00:27:14 EST


Hi Boris & Miquel,

This is an update to my comments on the thread https://lkml.org/lkml/2018/11/15/656.
There I said that I would take a default error count value of 16 and, during page read, compare the error count
register value against it; if the value is equal to or greater than the default count (16), I check for
erased pages.
But bit[7:0] of ECC_Error_Count_Register (0x38) is updated for each error that occurs.
Link: https://www.xilinx.com/html_docs/registers/ug1087/ug1087-zynq-ultrascale-registers.html
(see the NAND module, ECC_Error_Count_Register).

I mean that previously I depended on the total error count value (bit[16:8]), but we can simply check bit[7:0]
to see whether an error occurred or not.
I tried this approach and I don't see any issues with it.
I ran ubifs with this and I can see that the bit[7:0] count is updated on an erased page read; I will then
use nand_check_erased_ecc_chunk() to count the bitflips.
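
A minimal user-space sketch of the check described above, only to illustrate the idea and not the
driver code. The bit positions (bit[7:0] and bit[16:8]) are taken from this mail; the macro and
function names are hypothetical, and erased_chunk_bitflips() is a simplified stand-in for what
nand_check_erased_ecc_chunk() does in the core (it counts zero bits in a buffer that should read
as all 0xFF and gives up once a threshold is exceeded).

```c
#include <stddef.h>
#include <stdint.h>

/* Field layout as described in this mail; the names are hypothetical. */
#define ECC_ERR_CNT_PKT_MASK     0xFFu   /* bit[7:0]: per-packet error count */
#define ECC_ERR_CNT_TOTAL_SHIFT  8
#define ECC_ERR_CNT_TOTAL_MASK   0x1FFu  /* bit[16:8]: total error count */

/* Extract the per-packet error count, bit[7:0]. */
static inline unsigned int ecc_pkt_err_cnt(uint32_t reg)
{
	return reg & ECC_ERR_CNT_PKT_MASK;
}

/* Extract the total error count, bit[16:8]. */
static inline unsigned int ecc_total_err_cnt(uint32_t reg)
{
	return (reg >> ECC_ERR_CNT_TOTAL_SHIFT) & ECC_ERR_CNT_TOTAL_MASK;
}

/*
 * Simplified stand-in for the erased-chunk check: an erased NAND page
 * reads back as 0xFF, so every zero bit is a bitflip. Returns the number
 * of bitflips, or -1 (standing in for the kernel helper's error return)
 * once the count exceeds the threshold.
 */
static int erased_chunk_bitflips(const uint8_t *buf, size_t len, int threshold)
{
	int flips = 0;

	for (size_t i = 0; i < len; i++) {
		uint8_t zeros = (uint8_t)~buf[i];

		/* Clear the lowest set bit until none remain. */
		while (zeros) {
			zeros &= (uint8_t)(zeros - 1);
			flips++;
		}
		if (flips > threshold)
			return -1;
	}
	return flips;
}
```

In this sketch, a read path would call ecc_pkt_err_cnt() on the register value and, when it is
non-zero, run the erased-chunk check before deciding whether to report bitflips or an ECC failure.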

If this is OK, I will update the driver and send a new patch. Do you have any other comments on v12?

Thanks,
Naga Sureshkumar Relli

> -----Original Message-----
> From: linux-mtd [mailto:linux-mtd-bounces@xxxxxxxxxxxxxxxxxxx] On Behalf Of Naga
> Sureshkumar Relli
> Sent: Friday, November 23, 2018 7:24 PM
> To: Miquel Raynal <miquel.raynal@xxxxxxxxxxx>; Boris Brezillon
> <boris.brezillon@xxxxxxxxxxx>
> Cc: robh@xxxxxxxxxx; richard@xxxxxx; linux-kernel@xxxxxxxxxxxxxxx; marek.vasut@xxxxxxxxx;
> linux-mtd@xxxxxxxxxxxxxxxxxxx; nagasuresh12@xxxxxxxxx; Michal Simek
> <michals@xxxxxxxxxx>; computersforpeace@xxxxxxxxx; dwmw2@xxxxxxxxxxxxx
> Subject: RE: [LINUX PATCH v12 3/3] mtd: rawnand: arasan: Add support for Arasan
> NAND Flash Controller
>
> Hi Boris & Miquel,
>
> > -----Original Message-----
> > From: Miquel Raynal [mailto:miquel.raynal@xxxxxxxxxxx]
> > Sent: Tuesday, November 20, 2018 6:06 PM
> > To: Boris Brezillon <boris.brezillon@xxxxxxxxxxx>
> > Cc: Naga Sureshkumar Relli <nagasure@xxxxxxxxxx>; richard@xxxxxx;
> > dwmw2@xxxxxxxxxxxxx; computersforpeace@xxxxxxxxx; marek.vasut@xxxxxxxxx; linux-
> > mtd@xxxxxxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; nagasuresh12@xxxxxxxxx;
> > robh@xxxxxxxxxx; Michal Simek <michals@xxxxxxxxxx>
> > Subject: Re: [LINUX PATCH v12 3/3] mtd: rawnand: arasan: Add support for Arasan
> > NAND Flash Controller
> >
> > Hi Naga,
> >
> > Boris Brezillon <boris.brezillon@xxxxxxxxxxx> wrote on Tue, 20 Nov 2018
> > 12:02:44 +0100:
> >
> > > On Tue, 20 Nov 2018 07:02:08 +0000
> > > Naga Sureshkumar Relli <nagasure@xxxxxxxxxx> wrote:
> > >
> > >
> > > > >
> > > > > > Can you please run nandbiterrs (available in mtd-utils). I fear your
> > > > > device won't pass the test.
> > > > Yes, nandbiterror test is passing till 24bit, after that it is failing.
> > >
> > > Can you paste the output of nandbiterrs please?
> >
> > Apparently 'nandbiterrs -i' just crashes the kernel because of a segmentation fault. Please run
> > this test (from the mtd-utils package) and fix this issue. Then we would like to see the output.
> Here is the output of mtd_nandbiterrs,
> [ 1830.546807] mtd_nandbiterrs: verify_page
> [ 1830.551924] mtd_nandbiterrs: Successfully corrected 8 bit errors per subpage
> [ 1830.558961] mtd_nandbiterrs: Inserted biterror @ 2/5
> [ 1830.563917] mtd_nandbiterrs: rewrite page
> [ 1830.568216] mtd_nandbiterrs: read_page
> [ 1830.572155] mtd_nandbiterrs: verify_page
> [ 1830.576531] mtd_nandbiterrs: Successfully corrected 9 bit errors per subpage
> [ 1830.583568] mtd_nandbiterrs: Inserted biterror @ 2/2
> [ 1830.588527] mtd_nandbiterrs: rewrite page
> [ 1830.592881] mtd_nandbiterrs: read_page
> [ 1830.596825] mtd_nandbiterrs: verify_page
> [ 1830.601197] mtd_nandbiterrs: Successfully corrected 10 bit errors per subpage
> [ 1830.608326] mtd_nandbiterrs: Inserted biterror @ 2/0
> [ 1830.613279] mtd_nandbiterrs: rewrite page
> [ 1830.617585] mtd_nandbiterrs: read_page
> [ 1830.621524] mtd_nandbiterrs: verify_page
> [ 1830.625900] mtd_nandbiterrs: Successfully corrected 11 bit errors per subpage
> [ 1830.633027] mtd_nandbiterrs: Inserted biterror @ 3/7
> [ 1830.637984] mtd_nandbiterrs: rewrite page
> [ 1830.642281] mtd_nandbiterrs: read_page
> [ 1830.646223] mtd_nandbiterrs: verify_page
> [ 1830.650595] mtd_nandbiterrs: Successfully corrected 12 bit errors per subpage
> [ 1830.657724] mtd_nandbiterrs: Inserted biterror @ 3/6
> [ 1830.662677] mtd_nandbiterrs: rewrite page
> [ 1830.666983] mtd_nandbiterrs: read_page
> [ 1830.670922] mtd_nandbiterrs: verify_page
> [ 1830.675296] mtd_nandbiterrs: Successfully corrected 13 bit errors per subpage
> [ 1830.682417] mtd_nandbiterrs: Inserted biterror @ 3/5
> [ 1830.687373] mtd_nandbiterrs: rewrite page
> [ 1830.691671] mtd_nandbiterrs: read_page
> [ 1830.695610] mtd_nandbiterrs: verify_page
> [ 1830.699983] mtd_nandbiterrs: Successfully corrected 14 bit errors per subpage
> [ 1830.707113] mtd_nandbiterrs: Inserted biterror @ 3/2
> [ 1830.712067] mtd_nandbiterrs: rewrite page
> [ 1830.716494] mtd_nandbiterrs: read_page
> [ 1830.720459] mtd_nandbiterrs: verify_page
> [ 1830.724841] mtd_nandbiterrs: Successfully corrected 15 bit errors per subpage
> [ 1830.731963] mtd_nandbiterrs: Inserted biterror @ 3/0
> [ 1830.736920] mtd_nandbiterrs: rewrite page
> [ 1830.741161] mtd_nandbiterrs: read_page
> [ 1830.745107] mtd_nandbiterrs: verify_page
> [ 1830.749478] mtd_nandbiterrs: Successfully corrected 16 bit errors per subpage
> [ 1830.756607] mtd_nandbiterrs: Inserted biterror @ 4/2
> [ 1830.761564] mtd_nandbiterrs: rewrite page
> [ 1830.765924] mtd_nandbiterrs: read_page
> [ 1830.769858] mtd_nandbiterrs: verify_page
> [ 1830.774232] mtd_nandbiterrs: Successfully corrected 17 bit errors per subpage
> [ 1830.781362] mtd_nandbiterrs: Inserted biterror @ 4/0
> [ 1830.786318] mtd_nandbiterrs: rewrite page
> [ 1830.790558] mtd_nandbiterrs: read_page
> [ 1830.794496] mtd_nandbiterrs: verify_page
> [ 1830.798867] mtd_nandbiterrs: Successfully corrected 18 bit errors per subpage
> [ 1830.805997] mtd_nandbiterrs: Inserted biterror @ 5/7
> [ 1830.810949] mtd_nandbiterrs: rewrite page
> [ 1830.815249] mtd_nandbiterrs: read_page
> [ 1830.819189] mtd_nandbiterrs: verify_page
> [ 1830.823561] mtd_nandbiterrs: Successfully corrected 19 bit errors per subpage
> [ 1830.830690] mtd_nandbiterrs: Inserted biterror @ 5/2
> [ 1830.835646] mtd_nandbiterrs: rewrite page
> [ 1830.839943] mtd_nandbiterrs: read_page
> [ 1830.843886] mtd_nandbiterrs: verify_page
> [ 1830.848252] mtd_nandbiterrs: Successfully corrected 20 bit errors per subpage
> [ 1830.855378] mtd_nandbiterrs: Inserted biterror @ 5/0
> [ 1830.860331] mtd_nandbiterrs: rewrite page
> [ 1830.864580] mtd_nandbiterrs: read_page
> [ 1830.868522] mtd_nandbiterrs: verify_page
> [ 1830.872890] mtd_nandbiterrs: Successfully corrected 21 bit errors per subpage
> [ 1830.880023] mtd_nandbiterrs: Inserted biterror @ 6/6
> [ 1830.884975] mtd_nandbiterrs: rewrite page
> [ 1830.889224] mtd_nandbiterrs: read_page
> [ 1830.893158] mtd_nandbiterrs: verify_page
> [ 1830.897536] mtd_nandbiterrs: Successfully corrected 22 bit errors per subpage
> [ 1830.904663] mtd_nandbiterrs: Inserted biterror @ 6/2
> [ 1830.909619] mtd_nandbiterrs: rewrite page
> [ 1830.913950] mtd_nandbiterrs: read_page
> [ 1830.917893] mtd_nandbiterrs: verify_page
> [ 1830.922261] mtd_nandbiterrs: Successfully corrected 23 bit errors per subpage
> [ 1830.929384] mtd_nandbiterrs: Inserted biterror @ 6/0
> [ 1830.934340] mtd_nandbiterrs: rewrite page
> [ 1830.938579] mtd_nandbiterrs: read_page
> [ 1830.942519] mtd_nandbiterrs: verify_page
> [ 1830.946884] mtd_nandbiterrs: Successfully corrected 24 bit errors per subpage
> [ 1830.954010] mtd_nandbiterrs: Inserted biterror @ 7/7
> [ 1830.958963] mtd_nandbiterrs: rewrite page
> [ 1830.963264] mtd_nandbiterrs: read_page
> [ 1830.967143] mtd_nandbiterrs: verify_page
> [ 1830.971061] mtd_nandbiterrs: Error: page offset 0, expected 25, got 00
> [ 1830.977584] mtd_nandbiterrs: Error: page offset 1, expected a5, got 00
> [ 1830.984103] mtd_nandbiterrs: Error: page offset 2, expected 65, got 00
> [ 1830.990621] mtd_nandbiterrs: Error: page offset 3, expected e5, got 00
> [ 1830.997141] mtd_nandbiterrs: Error: page offset 4, expected 05, got 00
> [ 1831.003659] mtd_nandbiterrs: Error: page offset 5, expected 85, got 00
> [ 1831.010179] mtd_nandbiterrs: Error: page offset 6, expected 45, got 00
> [ 1831.016695] mtd_nandbiterrs: Error: page offset 7, expected c5, got 45
> [ 1831.023665] mtd_nandbiterrs: ECC failure, read data is incorrect despite read success
> modprobe: can't load module mtd_nandbiterrs
> (kernel/drivers/mtd/tests/mtd_nandbiterrs.ko): Input/output error
> ---> Test fail, unable to start nand_mtd_nandbiterrs client on the target
> I ran this on the v12 series, but it didn't work straight away; I had to change the code to make it
> work for this test.
> I found one problem: the driver works only if the page programming sequence is 0x80
> followed by 0x10.
> i.e.
> [1]: nand_prog_page_op(chip, page, 0, buf, mtd->writesize) -> this op sequence works
> with this driver.
> [2]: nand_prog_page_begin_op(chip, page, 0, NULL, 0) -> this op sequence does not work
> with this driver.
> The Arasan NAND controller expects 0x80 as the first opcode and 0x10 as the second opcode in
> the command register (off: 0xFF10000C).
> Hence in the v11 series, I added a check such that if the command is 0x80, the second command
> is hardcoded as 0x10.
> But as per Boris's comments, I removed that hardcoding, and it now works only if the write
> sequence is [1] as mentioned above.
>
> >
> > >
> > > > >
> > > > > > But we are hitting this because of erased page reading(needed in case of ubifs).
> > > > > >
> > > > > > >
> > > > > > > > Don't you have a bit (or several bits) reporting when the ECC engine was not able to correct
> > > > > > > > data? If you do, you should base the "detect bitflips in erase pages" logic on this information.
> > > > > > > Bit reporting for several bit errors is there only for Hamming (1bit correction and 2bit
> > > > > > > detection) but not in BCH.
> > > > > >
> > > > >
> > > > > Then I tend to agree with Miquel: your ECC engine is broken, and I'm
> > > > > not even sure how to deal with that yet.
> > > > > So as per Miquel's suggestion, can I proceed to add the below one?
> > > > > "you should re-read the page in raw mode and check for the number of bitflips manually
> > > > > (thanks to the helpers in the core). Again, if the number of BF is above 16, we can assume the
> > > > > page is bad and increment ->ecc.failed accordingly."
> > >
> > > But that's just partially fixing the problem. And you didn't answer my
> > > previous question: what happens when you configure the ECC engine in,
> > > say 12bit/1024 and you end up with uncorrectable errors (more than 12
> > > bitflips in a 1k block). What's the number reported ECC_ERR_CNT? Is it
> > > set to 13?
> >
> > Please dump this register, and eventually what's the value of the Packet_bound_Err_count
> > field ([0:7]) for each iteration of nandbiterrs -i.
> > If there is no way, when the status bit is set, to discriminate if the data is reliable or was not
> > corrected at all, it is gonna be a real issue and I don't think we want to support such engine.
> On each iteration, the error count value that I got during this test is equal to the number of
> error bits introduced,
> i.e. for a 1-bit error, the error count is 1
> .......
> for 24-bit errors, the error count is 24.
> But after that the error count is 0.
>
> Thanks,
> Naga Sureshkumar Relli
> >
> >
> > Thanks,
> > Miquèl
> ______________________________________________________
> Linux MTD discussion mailing list
> http://lists.infradead.org/mailman/listinfo/linux-mtd/
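
The programming-sequence limitation quoted above (the controller expects 0x80 as the first opcode
and 0x10 as the second) can be sketched as a simple check on the opcodes of a requested operation.
This is only an illustration under assumptions: prog_seq_supported() and the flat opcode array are
hypothetical and are not how the driver or the NAND core represent an operation. The opcode values
are the standard NAND_CMD_SEQIN (0x80) and NAND_CMD_PAGEPROG (0x10).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Standard NAND opcodes for page programming. */
#define NAND_CMD_PAGEPROG 0x10
#define NAND_CMD_SEQIN    0x80

/*
 * Hypothetical helper: returns true only when the operation carries both
 * opcodes, 0x80 first and 0x10 second, which is the only programming
 * sequence reported to work in the quoted mail.
 */
static bool prog_seq_supported(const uint8_t *opcodes, size_t nops)
{
	return nops == 2 &&
	       opcodes[0] == NAND_CMD_SEQIN &&
	       opcodes[1] == NAND_CMD_PAGEPROG;
}
```

With this check, a sequence like [1] (0x80 then 0x10 in one operation) is accepted, while a
sequence like [2], which issues only 0x80 and leaves 0x10 for a later operation, is rejected.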