Re: Reading a bad sector does not report failure as 'read error' but hangs PC with 'Machine Check Exception'

From: Robert Hancock
Date: Sun Jul 29 2007 - 11:34:06 EST


Hendrik . wrote:
Last night I discovered a problem in my RAID5 array
and finally after a lot of tests I narrowed it down to
a bad sector on one of the hard disks and some goofy
kernels.

Just yesterday I built a new PC using an existing
array of 5 disks in RAID 5. I built the array with
only 4 out of 5 disks in the system, but the rebuild
process stopped over and over again, apparently at
the same position. At last I found out that the
hard disk on the first SATA port had developed some bad
sectors, which made the kernel stop completely when it
tried to read one of them, with the following error on
the screen:

HARDWARE ERROR
CPU 0: Machine Check Exception: 4 Bank 4: b200000000070f0f
TSC b7d4a144d0
This is not a software problem!
Run through mcelog --ascii to decode and contact your hardware vendor
Kernel panic - not syncing: Machine check

You should run this through mcelog as it suggests and see what it shows. The kernel should be handling a bad sector properly, unless the drive problem is causing the controller to do something bad. Note that kernels 2.6.20 and later use ADMA mode on the nForce4 SATA controller, whereas previous versions used it essentially like a PATA controller, so it is not surprising that the behavior is different.
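
As a rough illustration of the kind of information mcelog extracts, the sketch below splits the status value from the panic message into the architectural flag bits of an x86 machine-check status register. The function name decode_mci_status and the returned dictionary layout are made up for this example; only the flag bit positions (63 down to 57) are architectural. The meaning of the lower 32 bits depends on the CPU vendor and the bank, so they are only extracted here, not interpreted.

```python
# Architectural flag bits in the top byte of an x86 MCi_STATUS register.
STATUS_FLAGS = {
    63: "VAL",    # register contains a valid error
    62: "OVER",   # an earlier error was overwritten
    61: "UC",     # error was not corrected by hardware
    60: "EN",     # reporting of this error was enabled
    59: "MISCV",  # the bank's MISC register holds valid data
    58: "ADDRV",  # the bank's ADDR register holds a valid address
    57: "PCC",    # processor context may be corrupt
}


def decode_mci_status(status):
    """Split a 64-bit machine-check status value into its generic fields.

    Hypothetical helper for illustration; it does not replace mcelog.
    """
    flags = [
        name
        for bit, name in sorted(STATUS_FLAGS.items(), reverse=True)
        if (status >> bit) & 1
    ]
    return {
        "flags": flags,
        # Bits 31:16, vendor- and bank-specific, left undecoded here.
        "model_specific_code": (status >> 16) & 0xFFFF,
        # Bits 15:0, the MCA error code, also left undecoded here.
        "mca_error_code": status & 0xFFFF,
    }


decoded = decode_mci_status(0xB200000000070F0F)
print(decoded["flags"])                      # ['VAL', 'UC', 'EN', 'PCC']
print(hex(decoded["model_specific_code"]))   # 0x7
print(hex(decoded["mca_error_code"]))        # 0xf0f
```

UC and PCC both being set marks the error as uncorrected with possibly corrupt processor context, which is consistent with the kernel panicking instead of just logging the event. For what bank 4 and the two code fields actually mean on this CPU, the mcelog output is still the thing to look at.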

--
Robert Hancock Saskatoon, SK, Canada
To email, remove "nospam" from hancockr@xxxxxxxxxxxxx
Home Page: http://www.roberthancock.com/
