Mylex AcceleRAID 250 dropped dead [2.2.14]

From: Christian Robottom Reis (kiko@radiumsystems.com.br)
Date: Fri Apr 07 2000 - 10:27:39 EST


I'm running the Mylex driver 2.2.5 on a two-disk RAID1 production box
(racked here at an ISP). I've updated the firmware to 4.07-07, the version
listed on the page for the driver, and stress-tested it for a couple of
days before shipping.

The setup was a bit flaky to start off with, but the firmware switch and
the driver upgrade seemed to solve the stability problems, and I've run it
up to 100% for hours per day on benchmarks, kernel compiles, and
apachebenches.

The RAID went dead this morning. The system had been idle for a while,
though it was rebooted twice about an hour before syslogd died and
nothing worked anymore.

One of the drives shows up as Dead on bootup - the other as online, but
with FS corruption galore. Trying to rebuild redundancy data or bring the
Dead drive online using the firmware configuration tool just plain failed
and I'm being strongly led to believe that this is just a HW failure.
However, someone might just have had the same problem and could point it
out. I can boot up with the online drive (though it's killed a couple of
binaries, I do have a full tape backup from last night).

Reformatting the Dead drive fails with a "Notice - Background Task
(Controller 0, command 04) - command failed" message.

I'm trying to reflash the controller ROMs ATM and wondering what went
wrong. I've been sitting here for a couple of hours with no sleep. :-) If
this works, I'll start to believe the firmware had been corrupted. Is
this heard of at all?

The box is a dual P3-550 with 1 GB of memory on an Intel i440NX board. We're
running 2.2.14 (patched for NFSv3 and including the 2.2.5 Mylex driver).
This is a simple webserver running apache, slapd and sshd. Nothing else.

Does this smell more like hardware or software? Can the NFSv3 patches be
so evil? Is driver 2.2.5 silently corrupting board firmware?

Any help is dearly appreciated. This should have been over a week ago, but
I'm still here!

Cheers,

--
_/\ Christian Reis is sometimes kiko@radiumsystems.com.br 
\/~ suicide architect | free software advocate | mountain biker 
