[BUG] re-modprobe a nand controller driver module will cause system crash.

From: Bryan Wu
Date: Thu Oct 16 2008 - 06:57:05 EST


Hi folks,

I recently found a subtle bug that appears to be related to the mtdcore layer.
The detailed story is at
https://blackfin.uclinux.org/gf/project/uclinux-dist/tracker/?action=TrackerItemEdit&tracker_id=141&tracker_item_id=4463.

Briefly speaking,
1) modprobe a nand controller driver, which calls add_mtd_partition().
2) add_mtd_partition->add_devices->blktrans_notify_add->mtdblock_add_mtd->add_mtd_blktrans_dev
3) in add_mtd_blktrans_dev, alloc_disk will be called to create a new
gendisk structure according to the partition setting.
4) "gd->queue = tr->blkcore_priv->rq;"
However many partitions there are (2 in my test), there will be
the same number of gendisk structures but just 1 queue.
They all use the same request_queue, which is created in
register_mtd_blktrans.
5) mtdblockd kthread handles this request_queue for mtdblock layer.
6) There is one backing_dev_info structure member (not a pointer) in
request_queue, so for several mtd partitions (several gendisks) there
is only one bdi structure instance.
7) So the problem is in add_disk(),
bdi_register_dev(bdi, MKDEV(disk->major, disk->first_minor));
For the 1st partition mtdblock0, it will create /sys/class/bdi/31:0
and register information in the bdi structure instance.
Then for the 2nd partition mtdblock1, because the bdi structure
instance is the same as for the 1st partition, it will overwrite the bdi
structure and create /sys/class/bdi/31:1.
So the bdi info of the 1st partition is totally lost.
8) When we rmmod the nand controller driver, del_mtd_partition will
only remove /sys/class/bdi/31:1 and leave the 1st partition's
/sys/class/bdi/31:0 behind.
9) Running modprobe again makes the bug show up.
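
To make steps 4) to 9) easier to follow, here is a small userspace toy
model of the sequence. It is only a sketch and not kernel code: every
toy_* name is made up for illustration, the "sysfs" is just a table of
strings, and I am assuming that the shared bdi remembers only the most
recently registered entry and that re-registering an existing name fails.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_ENTRIES 8
#define NAME_LEN 32

/* Stand-in for /sys/class/bdi: a flat table of entry names. */
struct toy_sysfs {
    char names[MAX_ENTRIES][NAME_LEN];
    int used[MAX_ENTRIES];
};

/* Stand-in for backing_dev_info: remembers only ONE registered entry. */
struct toy_bdi {
    int slot; /* index into toy_sysfs, or -1 when nothing is registered */
};

/* Stand-in for request_queue: the bdi is embedded, not pointed to. */
struct toy_queue {
    struct toy_bdi bdi;
};

/* Stand-in for gendisk: every partition's disk points at the same queue. */
struct toy_disk {
    int major, minor;
    struct toy_queue *queue;
};

/* Returns the table index of 'name', or -1 if there is no such entry. */
int toy_sysfs_find(const struct toy_sysfs *fs, const char *name)
{
    for (int i = 0; i < MAX_ENTRIES; i++)
        if (fs->used[i] && strcmp(fs->names[i], name) == 0)
            return i;
    return -1;
}

/* Returns 0 on success, -1 if the name already exists or the table is full. */
int toy_bdi_register(struct toy_sysfs *fs, struct toy_disk *d)
{
    char name[NAME_LEN];

    snprintf(name, sizeof(name), "%d:%d", d->major, d->minor);
    if (toy_sysfs_find(fs, name) >= 0)
        return -1;
    for (int i = 0; i < MAX_ENTRIES; i++) {
        if (!fs->used[i]) {
            fs->used[i] = 1;
            strcpy(fs->names[i], name);
            d->queue->bdi.slot = i; /* overwrites any earlier registration */
            return 0;
        }
    }
    return -1;
}

/* Removes only the entry the shared bdi currently remembers, if any. */
void toy_bdi_unregister(struct toy_sysfs *fs, struct toy_disk *d)
{
    struct toy_bdi *bdi = &d->queue->bdi;

    if (bdi->slot >= 0) {
        fs->used[bdi->slot] = 0;
        bdi->slot = -1;
    }
}

/*
 * Load with 'nparts' partitions on one shared queue, unload, then try to
 * register the first partition again. Returns that last result, or -2 if
 * nparts is outside 1..MAX_ENTRIES.
 */
int toy_reload_scenario(struct toy_sysfs *fs, int nparts)
{
    struct toy_queue q = { .bdi = { .slot = -1 } };
    struct toy_disk disks[MAX_ENTRIES];

    if (nparts < 1 || nparts > MAX_ENTRIES)
        return -2;
    memset(fs, 0, sizeof(*fs));
    for (int i = 0; i < nparts; i++) {
        disks[i] = (struct toy_disk){ 31, i, &q };
        toy_bdi_register(fs, &disks[i]);
    }
    for (int i = 0; i < nparts; i++)
        toy_bdi_unregister(fs, &disks[i]);
    return toy_bdi_register(fs, &disks[0]);
}

#ifdef TOY_DEMO_MAIN
int main(void)
{
    struct toy_sysfs fs;
    int rc = toy_reload_scenario(&fs, 2);

    printf("re-register 31:0 with 2 partitions -> %d\n", rc);
    assert(rc == -1);
    return 0;
}
#endif
```

In this model toy_reload_scenario(&fs, 2) returns -1 because the stale
"31:0" entry is still in the table, while toy_reload_scenario(&fs, 1)
returns 0. Building with -DTOY_DEMO_MAIN runs the two-partition case.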

I found that this bug is not related to my nand flash controller driver
and should be fixed in the mtdblock layer.
If we add only one partition, the bug does not occur at all. I
tried to fix it, but the fix touches
mtdblock/mtd_blktrans/block/bdi. It is difficult for me to find a way
to satisfy all of these parts with minimal changes.

IMHO, could we simply remove the bdi_register_dev (in add_disk) and
bdi_unregister_dev (in unlink_disk) calls?

P.S. I also found this bug in the latest 2.6.27 mainline kernel.

Thanks
-Bryan