Spurious disk change detections interferes with disk access

From: Jörg Henne
Date: Fri Apr 29 2005 - 15:34:19 EST

Hi all,

using a CompactFlash based system, I am experiencing a very weird behaviour I just don't understand.

The system is based on
- 2.4.28
- devfs
- a CompactFlash IDE drive attached to a real IDE controller (in contrast to a PC-Card adapter)
- the root filesystem resides on the CF

While booting, there are irregular but frequent errors which manifest in messages like
hda1: bad access: block=<something> count=<something>
which are followed by error messages from the filesystem and finally errors in the boot process itself.

I added some more debug output to drivers/ide/ide-io.c and found that while the block offset was well within the nominal bounds of the device, the message was actually caused by the fact that while the error occurred, the partition table was all zeroed-out, i.e. drive->part[minor&PARTN_MASK].nr_sects == 0.
The reason why the drive's partition table is zeroed-out temporarily seems to be that there is an ide_revalidate_disk() going on at that time, which in turn is explained by messages like
VFS: Disk change detected on device 03:00
/dev/ide/host0/bus0/target0/lun0: p1 p2
which precede the "bad access" message.

As I see it, the core of the problem is:
- something accesses the devfs directory associated with the drive (ls /dev/ide/host0/bus0/target0/lun0 will do)
- defvs runs a check_media_change which ends up at ide-disk.c:idedisk_media_change() which is implemented like this:
/* if removable, always assume it was changed */
return drive->removable;
- devfs decides that a revalidate is appropriate since the media hash changed, which trashes the partition table for a brief moment
- processes accessing the drive at that time see lots of errors

Is it really true, that for removable devices, a simple directory access on the devfs can interfere with accesses to the drive? Sounds like a great start for DOSing the system.
I see two possible solutions for the problem:
- A lock is put in place which guards access to the drive while the partition table is re-built. Seems fairly hacky to me.
- The media change detection mechanism is improved, so that it can detect when the media REALLY changed.

Thanks in advance for any input on this.
Joerg Henne

To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/