2.4.0-test2-ac2 with RAID lockup

From: Russell Coker (russell@coker.com.au)
Date: Sun Jul 02 2000 - 16:48:55 EST


I am using RAID-1 with two 46G IBM ATA-66 hard drives, the RAID-1 is on
/dev/hda4 and /dev/hda4 and is 43510604 blocks in size.

Here's the fdisk output from /dev/hdc, /dev/hda is exactly the same (I used
dd to copy the partition table).

Disk /dev/hdc: 255 heads, 63 sectors, 5606 cylinders
Units = cylinders of 16065 * 512 bytes
 
   Device Boot Start End Blocks Id System
/dev/hdc1 1 33 265041 82 Linux swap
/dev/hdc2 34 36 24097+ 83 Linux
/dev/hdc3 37 189 1228972+ 83 Linux
/dev/hdc4 190 5606 43512052+ 83 Linux

Here is my current /etc/raidtab:
raiddev /dev/md0
  raid-level 1
  nr-raid-disks 2
  persistent-superblock 1
  chunk-size 4k
  device /dev/ide/host0/bus0/target0/lun0/part4
  raid-disk 0
  device /dev/ide/host0/bus1/target0/lun0/part4
  raid-disk 1

Here is my df output:
Filesystem 1k-blocks Used Available Use% Mounted on
/dev/ide/host0/bus0/target0/lun0/part3
                       1228916 952728 276188 78% /
/dev/ide/host0/bus0/target0/lun0/part2
                         23333 4172 17957 19% /boot
/dev/md/0 43510604 557652 42952952 1% /mnt

I am migrating to devfs as you will notice. I started playing with devfs
AFTER the first time RAID locked up on me so I know it's not caused by devfs.
I have had RAID lockup with and without devfs.

The first time it locked up I could run ps/top/etc but any attempt to sync
etc would hang (D state).
The second time it was more serious. The command:
ls -al /proc/247
would go D state, so ps, top, etc would all hang. Pid 247 was named.

At the time of the hang the only process using the RAID device was a bonnie++
test run on a ReiserFS file system. I doubt that ReiserFS or the drive
hardware was at fault because I spent an entire week running various bonnie++
tests on /dev/hda4 with ReiserFS and ext2 without any problems at all. All
the evidence I have suggests that ReiserFS is not at fault, however I have
CC'd the ReiserFS list (just in case) and I am continuing to run tests.

The bonnie++ command was:
bonnie++ -s 512

It seems that running the bonnie++ command on it's own does not cause a hang,
it's only when I redirect stdout and stderr to files on the same file system
that problems occur. I've just run another test and had it hang badly enough
that I couldn't login via telnet or ssh (the machine has no monitor).

This time it got through the first run but on the second run (with parameters
"-s 1024 -n 60") it hung.

The machine in question is a test machine. So if anyone has any experimental
patches etc they want me to run then please let me know.

Russell Coker

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Fri Jul 07 2000 - 21:00:11 EST