Re: could someone plz explain those ext3/hard disk errors

From: JG
Date: Wed Feb 18 2004 - 10:59:11 EST


hi,

> I.E. Even though there is every chance that the drive is faulty, the
> posted error message doesn't indicate a drive failure in itself, and
> you should look elsewhere.

i recently got the new disks and was able to back up nearly everything (after a reboot the old disks were accessible again, though i've lost some data).

i tried to zero out the disk with 'dd if=/dev/zero of=/dev/hdX' which led to a complete system lockup after some time.
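A plain dd run gives no hint of how far it got before the machine locked up. A minimal sketch of a chunked alternative follows, assuming it helps to know the last offset that was written and synced. The function name `zero_fill`, its parameters, and the block sizes are hypothetical, not something that was run on this server. It is destructive: it overwrites whatever path it is given.

```python
import os


def zero_fill(path, total_bytes, block_size=1 << 20, sync_every=16, report=print):
    """Overwrite the first total_bytes of path with zeros, reporting progress.

    path may be a regular file or a block device node (hypothetical usage;
    on a real disk this destroys the data on it).
    """
    block = bytes(block_size)  # one block of zero bytes
    written = 0
    fd = os.open(path, os.O_WRONLY)
    try:
        while written < total_bytes:
            chunk = block[: min(block_size, total_bytes - written)]
            written += os.write(fd, chunk)
            # Sync every few blocks so the reported offset has really
            # reached the device, not just the page cache.
            if (written // block_size) % sync_every == 0 or written == total_bytes:
                os.fsync(fd)
                report(written)
    finally:
        os.close(fd)
    return written
```

If the progress lines are logged to another machine (for example over ssh), the last offset printed before a lockup narrows down where on the disk the trouble starts, if it is position-dependent at all.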

after a reboot i wanted to run the long S.M.A.R.T. test (smartctl -t long /dev/hdX, smartctl v5.26). it said the test would run in the background for about 80 minutes, but again, after some time => complete lockup.
i couldn't do anything on the server anymore; only the sysrq keys were working. killing the processes gave me some error messages (i can't remember the exact wording, but they were something like "DMA lost" on nearly every disk, plus some weird interrupt errors related to the NIC).

$ cat /proc/interrupts
           CPU0
  0:  435370782          XT-PIC  timer
  1:        315          XT-PIC  i8042
  2:          0          XT-PIC  cascade
  5:    6144225          XT-PIC  ide2, ide3, ide4, ide5
  8:          2          XT-PIC  rtc
 10:          0          XT-PIC  ohci_hcd
 11:   68722839          XT-PIC  eth1
 12:  227629214          XT-PIC  ohci_hcd, eth0
 14:    4515100          XT-PIC  ide0
 15:     643567          XT-PIC  ide1
NMI:          0
LOC:  435357136
ERR:     680356
MIS:          0

i don't know if the ERR count is too high; this is with an uptime of 5 days. i usually have much higher ERR numbers.
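To put a number on that ERR count, here is a rough sketch that turns the listing above into a rate. It assumes the timer interrupt (IRQ 0) fires at HZ = 1000 ticks per second, which is only a guess, though it matches the stated 5 days of uptime. The helper name `parse_interrupts` is made up for this example, and the sketch says nothing about what rate would count as "too high".

```python
def parse_interrupts(text):
    """Return {label: first-CPU count} for every 'label: count ...' line."""
    counts = {}
    for line in text.splitlines():
        label, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields and fields[0].isdigit():
            counts[label.strip()] = int(fields[0])
    return counts


# The /proc/interrupts output quoted in this mail.
SAMPLE = """\
CPU0
0: 435370782 XT-PIC timer
1: 315 XT-PIC i8042
2: 0 XT-PIC cascade
5: 6144225 XT-PIC ide2, ide3, ide4, ide5
8: 2 XT-PIC rtc
10: 0 XT-PIC ohci_hcd
11: 68722839 XT-PIC eth1
12: 227629214 XT-PIC ohci_hcd, eth0
14: 4515100 XT-PIC ide0
15: 643567 XT-PIC ide1
NMI: 0
LOC: 435357136
ERR: 680356
MIS: 0
"""

HZ = 1000  # assumption: timer ticks per second, not stated in the mail

counts = parse_interrupts(SAMPLE)
uptime_s = counts["0"] / HZ              # uptime estimated from timer ticks
err_per_s = counts["ERR"] / uptime_s     # average ERR events per second

print(f"estimated uptime: {uptime_s / 86400:.2f} days")
print(f"ERR rate: {err_per_s:.2f} per second")
```

With the numbers above this works out to an estimated uptime of about 5.04 days and an average of about 1.56 ERR events per second. Running it on a live system would mean feeding it the contents of /proc/interrupts instead of SAMPLE.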

JG
