Re: [BUG] Nand support broken with v2.6.36-rc1

From: Brian Norris
Date: Tue Aug 17 2010 - 13:01:29 EST


Hello,

On 08/17/2010 01:52 AM, Michael Guntsche wrote:
> The only thing that might be special with the nand driver that is being
> used is that a different oob layout is being used.
>
> static struct nand_ecclayout rbppc_nand_oob_16 = {
> .eccbytes = 6,
> .eccpos = { 8, 9, 10, 13, 14, 15 },
> .oobavail = 9,
> .oobfree = { { 0, 4 }, { 6, 2 }, { 11, 2 }, { 4, 1 } }
> };

On 08/17/2010 04:36 AM, Michael Guntsche wrote:
> I added this to the nand driver itself.
>
> static uint8_t scan_ff_pattern[] = { 0xff, 0xff };
> static struct nand_bbt_descr rbppc_nand_smallpage = {
> .options = NAND_BBT_SCAN2NDPAGE,
> .offs = NAND_SMALL_BADBLOCK_POS,
> .len = 1,
> .pattern = scan_ff_pattern
> };
>
> and the driver is working again. But shouldn't this be supported by the stock level code as well?

Why yes, it should! Somebody (probably me) goofed. Your nand_ecclayout is conflicting with the kernel's choice of bad block position. Recent changes must have affected which position is chosen automatically by the kernel.

One of the following two cases is likely the problem:
(1) Your chip is supposed to use offset 0, not 5, for the BBM (i.e., NAND_LARGE_BADBLOCK_POS, not NAND_SMALL_BADBLOCK_POS), and so your ecclayout should not be leaving byte 0 in the "oobfree" array (a design flaw since you first began using this chip)
(2) I made the commit that you mentioned (c7b28e25cb9beb943aead770ff14551b55fa8c79) too restrictive in allowing chips to use NAND_SMALL_BADBLOCK_POS.

Option 2 is likely the case, and in fact, I realized a stupid mistake I made in refactoring the detection here.

I have been studying data from hundreds of flash chips to find where the factory-determined markers should be stored. Unfortunately, I can't cover all of them, and so your Hynix chip is likely one that was overlooked. Could you send the full NAND ID string (8 bytes, not just the manufacturer and chip ID), an exact part number for the flash, and a datasheet? Any one of those could help (the datasheet being the most important), but whatever you can provide is helpful. More data on your chip would allow me to determine the problem for sure; I will send a patch ASAP once I get your information.

Sorry for the trouble!

On another note, it may be wise to have the kernel check for such a conflict between the bad-block marker and the ECC layout. If a position needed by the bad-block marker is listed in "oobfree" or "eccpos", then we have a problem. Does that sound like a good idea to anybody? If so, what would be the best approach:
* print an error and quit detection
* try to modify the ecclayout, the BBM info, or both
* try to modify, then fall back to an error message and quit if necessary
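
To make the idea concrete, here is a rough userspace sketch of the conflict check itself (the "print an error" variant would just act on its return value). The structures are simplified stand-ins I made up for illustration, not the kernel's real struct nand_ecclayout, and bbm_conflicts() is a hypothetical helper, not an existing MTD function. The only real data is the layout quoted above.

```c
/* Hypothetical, simplified stand-ins for the kernel's OOB layout types. */
#define MAX_ECCPOS   64
#define MAX_OOBFREE  8

struct oob_free_region {
	unsigned int offset;
	unsigned int length;
};

struct ecc_layout {
	unsigned int eccbytes;
	unsigned int eccpos[MAX_ECCPOS];
	struct oob_free_region oobfree[MAX_OOBFREE];
};

/* The layout quoted earlier in this thread. */
static const struct ecc_layout rbppc_layout = {
	.eccbytes = 6,
	.eccpos = { 8, 9, 10, 13, 14, 15 },
	.oobfree = { { 0, 4 }, { 6, 2 }, { 11, 2 }, { 4, 1 } },
};

/*
 * Return 1 if any OOB byte in [bbm_offs, bbm_offs + bbm_len) is claimed
 * by the layout, either as an ECC byte or as free OOB space; 0 otherwise.
 */
static int bbm_conflicts(const struct ecc_layout *l,
			 unsigned int bbm_offs, unsigned int bbm_len)
{
	unsigned int i, b;

	for (b = bbm_offs; b < bbm_offs + bbm_len; b++) {
		/* Does an ECC byte live at this position? */
		for (i = 0; i < l->eccbytes && i < MAX_ECCPOS; i++)
			if (l->eccpos[i] == b)
				return 1;

		/* A zero-length entry terminates the oobfree list. */
		for (i = 0; i < MAX_OOBFREE && l->oobfree[i].length; i++)
			if (b >= l->oobfree[i].offset &&
			    b < l->oobfree[i].offset + l->oobfree[i].length)
				return 1;
	}
	return 0;
}
```

With this layout, bbm_conflicts(&rbppc_layout, 0, 1) reports a conflict, because byte 0 falls inside the { 0, 4 } free region, while bbm_conflicts(&rbppc_layout, 5, 1) does not, since byte 5 is neither an ECC byte nor listed as free. That matches the symptom: the layout is only safe when the marker sits at offset 5.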

Thanks,
Brian
