RE: [LINUX PATCH v12 3/3] mtd: rawnand: arasan: Add support for Arasan NAND Flash Controller

From: Naga Sureshkumar Relli
Date: Fri Nov 23 2018 - 08:53:47 EST


Hi Boris & Miquel,

> -----Original Message-----
> From: Miquel Raynal [mailto:miquel.raynal@xxxxxxxxxxx]
> Sent: Tuesday, November 20, 2018 6:06 PM
> To: Boris Brezillon <boris.brezillon@xxxxxxxxxxx>
> Cc: Naga Sureshkumar Relli <nagasure@xxxxxxxxxx>; richard@xxxxxx;
> dwmw2@xxxxxxxxxxxxx; computersforpeace@xxxxxxxxx; marek.vasut@xxxxxxxxx; linux-
> mtd@xxxxxxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; nagasuresh12@xxxxxxxxx;
> robh@xxxxxxxxxx; Michal Simek <michals@xxxxxxxxxx>
> Subject: Re: [LINUX PATCH v12 3/3] mtd: rawnand: arasan: Add support for Arasan
> NAND Flash Controller
>
> Hi Naga,
>
> Boris Brezillon <boris.brezillon@xxxxxxxxxxx> wrote on Tue, 20 Nov 2018
> 12:02:44 +0100:
>
> > On Tue, 20 Nov 2018 07:02:08 +0000
> > Naga Sureshkumar Relli <nagasure@xxxxxxxxxx> wrote:
> >
> >
> > > >
> > > > Can you please run nandbiterrs (available in mtd-utils). I fear your
> > > > device won't pass the test.
> > > Yes, nandbiterror test is passing till 24bit, after that it is failing.
> >
> > Can you paste the output of nandbiterrs please?
>
> Apparently 'nandbiterrs -i' just crashes the kernel because of a segmentation fault. Please run
> this test (from the mtd-utils package) and fix this issue. Then we would like to see the output.
Here is the output of mtd_nandbiterrs:
[ 1830.546807] mtd_nandbiterrs: verify_page
[ 1830.551924] mtd_nandbiterrs: Successfully corrected 8 bit errors per subpage
[ 1830.558961] mtd_nandbiterrs: Inserted biterror @ 2/5
[ 1830.563917] mtd_nandbiterrs: rewrite page
[ 1830.568216] mtd_nandbiterrs: read_page
[ 1830.572155] mtd_nandbiterrs: verify_page
[ 1830.576531] mtd_nandbiterrs: Successfully corrected 9 bit errors per subpage
[ 1830.583568] mtd_nandbiterrs: Inserted biterror @ 2/2
[ 1830.588527] mtd_nandbiterrs: rewrite page
[ 1830.592881] mtd_nandbiterrs: read_page
[ 1830.596825] mtd_nandbiterrs: verify_page
[ 1830.601197] mtd_nandbiterrs: Successfully corrected 10 bit errors per subpage
[ 1830.608326] mtd_nandbiterrs: Inserted biterror @ 2/0
[ 1830.613279] mtd_nandbiterrs: rewrite page
[ 1830.617585] mtd_nandbiterrs: read_page
[ 1830.621524] mtd_nandbiterrs: verify_page
[ 1830.625900] mtd_nandbiterrs: Successfully corrected 11 bit errors per subpage
[ 1830.633027] mtd_nandbiterrs: Inserted biterror @ 3/7
[ 1830.637984] mtd_nandbiterrs: rewrite page
[ 1830.642281] mtd_nandbiterrs: read_page
[ 1830.646223] mtd_nandbiterrs: verify_page
[ 1830.650595] mtd_nandbiterrs: Successfully corrected 12 bit errors per subpage
[ 1830.657724] mtd_nandbiterrs: Inserted biterror @ 3/6
[ 1830.662677] mtd_nandbiterrs: rewrite page
[ 1830.666983] mtd_nandbiterrs: read_page
[ 1830.670922] mtd_nandbiterrs: verify_page
[ 1830.675296] mtd_nandbiterrs: Successfully corrected 13 bit errors per subpage
[ 1830.682417] mtd_nandbiterrs: Inserted biterror @ 3/5
[ 1830.687373] mtd_nandbiterrs: rewrite page
[ 1830.691671] mtd_nandbiterrs: read_page
[ 1830.695610] mtd_nandbiterrs: verify_page
[ 1830.699983] mtd_nandbiterrs: Successfully corrected 14 bit errors per subpage
[ 1830.707113] mtd_nandbiterrs: Inserted biterror @ 3/2
[ 1830.712067] mtd_nandbiterrs: rewrite page
[ 1830.716494] mtd_nandbiterrs: read_page
[ 1830.720459] mtd_nandbiterrs: verify_page
[ 1830.724841] mtd_nandbiterrs: Successfully corrected 15 bit errors per subpage
[ 1830.731963] mtd_nandbiterrs: Inserted biterror @ 3/0
[ 1830.736920] mtd_nandbiterrs: rewrite page
[ 1830.741161] mtd_nandbiterrs: read_page
[ 1830.745107] mtd_nandbiterrs: verify_page
[ 1830.749478] mtd_nandbiterrs: Successfully corrected 16 bit errors per subpage
[ 1830.756607] mtd_nandbiterrs: Inserted biterror @ 4/2
[ 1830.761564] mtd_nandbiterrs: rewrite page
[ 1830.765924] mtd_nandbiterrs: read_page
[ 1830.769858] mtd_nandbiterrs: verify_page
[ 1830.774232] mtd_nandbiterrs: Successfully corrected 17 bit errors per subpage
[ 1830.781362] mtd_nandbiterrs: Inserted biterror @ 4/0
[ 1830.786318] mtd_nandbiterrs: rewrite page
[ 1830.790558] mtd_nandbiterrs: read_page
[ 1830.794496] mtd_nandbiterrs: verify_page
[ 1830.798867] mtd_nandbiterrs: Successfully corrected 18 bit errors per subpage
[ 1830.805997] mtd_nandbiterrs: Inserted biterror @ 5/7
[ 1830.810949] mtd_nandbiterrs: rewrite page
[ 1830.815249] mtd_nandbiterrs: read_page
[ 1830.819189] mtd_nandbiterrs: verify_page
[ 1830.823561] mtd_nandbiterrs: Successfully corrected 19 bit errors per subpage
[ 1830.830690] mtd_nandbiterrs: Inserted biterror @ 5/2
[ 1830.835646] mtd_nandbiterrs: rewrite page
[ 1830.839943] mtd_nandbiterrs: read_page
[ 1830.843886] mtd_nandbiterrs: verify_page
[ 1830.848252] mtd_nandbiterrs: Successfully corrected 20 bit errors per subpage
[ 1830.855378] mtd_nandbiterrs: Inserted biterror @ 5/0
[ 1830.860331] mtd_nandbiterrs: rewrite page
[ 1830.864580] mtd_nandbiterrs: read_page
[ 1830.868522] mtd_nandbiterrs: verify_page
[ 1830.872890] mtd_nandbiterrs: Successfully corrected 21 bit errors per subpage
[ 1830.880023] mtd_nandbiterrs: Inserted biterror @ 6/6
[ 1830.884975] mtd_nandbiterrs: rewrite page
[ 1830.889224] mtd_nandbiterrs: read_page
[ 1830.893158] mtd_nandbiterrs: verify_page
[ 1830.897536] mtd_nandbiterrs: Successfully corrected 22 bit errors per subpage
[ 1830.904663] mtd_nandbiterrs: Inserted biterror @ 6/2
[ 1830.909619] mtd_nandbiterrs: rewrite page
[ 1830.913950] mtd_nandbiterrs: read_page
[ 1830.917893] mtd_nandbiterrs: verify_page
[ 1830.922261] mtd_nandbiterrs: Successfully corrected 23 bit errors per subpage
[ 1830.929384] mtd_nandbiterrs: Inserted biterror @ 6/0
[ 1830.934340] mtd_nandbiterrs: rewrite page
[ 1830.938579] mtd_nandbiterrs: read_page
[ 1830.942519] mtd_nandbiterrs: verify_page
[ 1830.946884] mtd_nandbiterrs: Successfully corrected 24 bit errors per subpage
[ 1830.954010] mtd_nandbiterrs: Inserted biterror @ 7/7
[ 1830.958963] mtd_nandbiterrs: rewrite page
[ 1830.963264] mtd_nandbiterrs: read_page
[ 1830.967143] mtd_nandbiterrs: verify_page
[ 1830.971061] mtd_nandbiterrs: Error: page offset 0, expected 25, got 00
[ 1830.977584] mtd_nandbiterrs: Error: page offset 1, expected a5, got 00
[ 1830.984103] mtd_nandbiterrs: Error: page offset 2, expected 65, got 00
[ 1830.990621] mtd_nandbiterrs: Error: page offset 3, expected e5, got 00
[ 1830.997141] mtd_nandbiterrs: Error: page offset 4, expected 05, got 00
[ 1831.003659] mtd_nandbiterrs: Error: page offset 5, expected 85, got 00
[ 1831.010179] mtd_nandbiterrs: Error: page offset 6, expected 45, got 00
[ 1831.016695] mtd_nandbiterrs: Error: page offset 7, expected c5, got 45
[ 1831.023665] mtd_nandbiterrs: ECC failure, read data is incorrect despite read success
modprobe: can't load module mtd_nandbiterrs (kernel/drivers/mtd/tests/mtd_nandbiterrs.ko): Input/output error
---> Test fail, unable to start nand_mtd_nandbiterrs client on the target
I ran this on the v12 series, but it did not work straight away; I had to change the code to make it work for this test.
I found one problem: the driver only works when the page programming sequence is 0x80 followed by 0x10, i.e.
[1] nand_prog_page_op(chip, page, 0, buf, mtd->writesize) -> this op sequence works with this driver.
[2] nand_prog_page_begin_op(chip, page, 0, NULL, 0) -> this op sequence does not work with this driver.
The Arasan NAND controller expects 0x80 as the first opcode and 0x10 as the second opcode in the command register (off: 0xFF10000C).
Hence, in the v11 series, I added a check so that if the command is 0x80, the second command is hardcoded to 0x10.
But as per Boris's comments, I removed that hardcoding, so the driver now works only if the write sequence is [1] above.
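
To illustrate the constraint, here is a minimal userspace sketch of why a split program sequence cannot be mapped onto a two-opcode command register. The register layout (first opcode in bits [7:0], second opcode in bits [15:8]) and the helper name build_prog_cmd() are assumptions for illustration only, not the driver's actual code. Only the opcode values 0x80 (SEQIN) and 0x10 (PAGEPROG) come from the standard NAND command set.

```c
#include <stddef.h>
#include <stdint.h>

#define NAND_CMD_SEQIN    0x80	/* first opcode of a page program */
#define NAND_CMD_PAGEPROG 0x10	/* second (confirm) opcode */

/* Assumed field positions, for illustration only. */
#define CMD1_SHIFT 0
#define CMD2_SHIFT 8

/*
 * Build the command register value for a page program.
 * Returns 0 and fills *reg when both opcodes are present in one
 * operation (case [1]); returns -1 when only part of the sequence
 * is available (case [2], where 0x10 arrives in a later operation).
 */
static int build_prog_cmd(const uint8_t *opcodes, size_t n, uint32_t *reg)
{
	if (n != 2 || opcodes[0] != NAND_CMD_SEQIN ||
	    opcodes[1] != NAND_CMD_PAGEPROG)
		return -1;

	*reg = ((uint32_t)opcodes[0] << CMD1_SHIFT) |
	       ((uint32_t)opcodes[1] << CMD2_SHIFT);
	return 0;
}
```

With sequence [1] both opcodes reach the controller in a single operation, so the register can be filled. With sequence [2] only 0x80 is known at that point, and the sketch rejects it.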

>
> >
> > > >
> > > > > But we are hitting this because of erased page reading(needed in case of ubifs).
> > > > >
> > > > > >
> > > > > > > Don't you have a bit (or several bits) reporting when the ECC engine was not able to
> > > > > > > correct data? If you do, you should base the "detect bitflips in erased pages" logic on
> > > > > > > this information.
> > > > > > Bit reporting for several bit errors is there only for Hamming (1-bit correction and 2-bit
> > > > > > detection) but not in BCH.
> > > > >
> > > >
> > > > Then I tend to agree with Miquel: your ECC engine is broken, and I'm
> > > > not even sure how to deal with that yet.
> > > So as per Miquel's suggestion, can I proceed to add the below one?
> > > "you should re-read the page in raw mode and check for the number of bitflips manually
> > > (thanks to the helpers in the core). Again, if the number of BF is above 16, we can assume the
> > > page is bad and increment ->ecc.failed accordingly."
> >
> > But that's just partially fixing the problem. And you didn't answer my
> > previous question: what happens when you configure the ECC engine in,
> > say 12bit/1024 and you end up with uncorrectable errors (more than 12
> > bitflips in a 1k block). What's the number reported ECC_ERR_CNT? Is it
> > set to 13?
>
> Please dump this register, and eventually what's the value of the Packet_bound_Err_count
> field ([0:7]) for each iteration of nandbiterrs -i.
> If there is no way, when the status bit is set, to discriminate if the data is reliable or was not
> corrected at all, it is gonna be a real issue and I don't think we want to support such engine.
On each iteration of this test, the error count value I read is equal to the number of bit errors introduced, i.e.
for a 1-bit error, the error count is 1
.......
for 24-bit errors, the error count is 24
But after that, the error count is 0.
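
Since a count of 0 cannot be told apart from a clean read, the raw-mode re-read suggested earlier in this thread would have to count bitflips by hand. Below is a minimal userspace sketch of that check for an erased (all 0xFF) chunk. The function names and the way the threshold is passed are assumptions for illustration. In the kernel, the core helper nand_check_erased_ecc_chunk() should be used instead of open-coding this.

```c
#include <stddef.h>
#include <stdint.h>

/* Count the bits that read back as 0 in a chunk expected to be all 0xFF. */
static int count_erased_bitflips(const uint8_t *buf, size_t len)
{
	int flips = 0;

	for (size_t i = 0; i < len; i++) {
		uint8_t v = (uint8_t)~buf[i];	/* flipped bits become 1 */

		while (v) {
			v &= (uint8_t)(v - 1);	/* clear the lowest set bit */
			flips++;
		}
	}
	return flips;
}

/*
 * Return the number of bitflips if the chunk can be treated as erased
 * (flips <= strength), or -1 if it must be reported as uncorrectable,
 * in which case the caller would increment its ECC failure counter.
 */
static int check_erased_chunk(const uint8_t *buf, size_t len, int strength)
{
	int flips = count_erased_bitflips(buf, len);

	return flips <= strength ? flips : -1;
}
```

The strength argument would be the configured ECC strength of the chunk (24 in the run above, or 16 as in the quoted suggestion).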

Thanks,
Naga Sureshkumar Relli
>
>
> Thanks,
> Miquèl