Re: FS corruption still in 2.2.13

Dag Bakke (dagb@oslo.sgi.com)
Fri, 22 Oct 1999 14:02:15 +0200


Tommy,

I looked up your previous message and found that at least question #2
doesn't apply to you. I'll include the question anyway, in case others
have similar problems. I won't have much more to offer once you have
answered the questions below, but maybe others will.

1. What kind of hard drive have you got?
2. Is that drive a well-known "good" model? (I.e. the drive reports that
it is UDMA capable and UDMA actually works...) Kernels 2.3.20 and later,
at least, maintain a list of "broken" drives.
3. Does the drive manufacturer have a more recent firmware for that
particular model?
4. Do all drives (in a striped set) have the same firmware?
5. Is your drive a known good instance of that particular model? (Does
the fs-corruption always happen on that particular drive and not on
other drives?)
6. Can you try disabling write cache on your drive? (I assume there is a
hdparm option for this...)
7. Have you checked and verified SCSI termination?
8. Does the driver for your particular brand/model of controller have a
debug option?
9. Can you disable CTQ?
10. Can you run your drives at lower bus bandwidth? (I.e. Fast SCSI-2
instead of Ultra, plain DMA or PIO mode [1-4] instead of UDMA.)
11. Is there a more recent firmware for your scsi-controller? (Yes,
there are buggy scsi-firmwares out there.)
12. Have you tried remaking your filesystem? And stripe set?
13. Are there several partitions on the drive? Can you verify they don't
overlap in any way?
14. What version of mkfs/fsck/raidtools are you using?
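For question 13, here is a minimal sketch of an overlap check, assuming
you have copied each partition's start and end sector from the partition
table by hand. The partition names and sector numbers in the example are
made up for illustration; they are not from Tommy's system.

```python
def find_overlaps(partitions):
    """Return pairs of partition names whose sector ranges overlap.

    partitions: list of (name, start_sector, end_sector), end inclusive.
    """
    ordered = sorted(partitions, key=lambda p: p[1])
    overlaps = []
    for i, (name_a, start_a, end_a) in enumerate(ordered):
        for name_b, start_b, end_b in ordered[i + 1:]:
            if start_b > end_a:
                # Sorted by start sector, so nothing later can overlap A.
                break
            overlaps.append((name_a, name_b))
    return overlaps


# Hypothetical partition table: hda3 starts before hda2 ends.
example = [
    ("hda1", 63, 208844),
    ("hda2", 208845, 4417874),
    ("hda3", 4417000, 8835749),
]
print(find_overlaps(example))
```

An empty list means no two ranges share a sector; any pair that is
reported deserves a closer look before blaming the kernel.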

I'd also like to point out three other things which can help people find
out if their problems with the linux-kernel are hardware related:

a.
http://reality.sgi.com/cbrady_denver/memtest86/
"Memtest86 is thorough, stand alone memory test for x86 architecture
computers. BIOS based memory tests are only a quick check and often miss
many of the failures that are detected by Memtest86."
Of course, this applies to x86 type systems only...

b.
http://www.BitWizard.nl/sig11/
Flaky hardware can give you almost *any* error. Please visit the website
and read the answer to the question "What are other possibilities?"

c.
Disabling various options in the BIOS or choosing less aggressive options
is also a useful way of debugging (possible) hardware errors.

-- 
Dag Bakke

Tommy van Leeuwen uttered:
> Hi,
>
> This week i posted a report of fs corruption in 2.2.12. People asked me
> if i also got the error with the 2.2.13pre patches. I switched to
> 2.2.13 yesterday and the fs corruption is reoccuring.
>
> This is what i get now:
>
> Oct 21 23:13:04 news2 kernel: EXT2-fs error (device md(9,3)):
> ext2_add_entry: bad entry in directory #10022921: directory entry across
> blocks - offset=7152, inode=843004500, rec_len=17248, name_len=45
> Oct 21 23:13:04 news2 kernel: EXT2-fs error (device md(9,3)):
> ext2_add_entry: bad entry in directory #10022921: directory entry across
> blocks - offset=7152, inode=843004500, rec_len=17248, name_len=45
>
> If anyone wants more details, please let me know.
>
> Regards,
>
> Tommy
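
To see why the entry in the quoted log is rejected, here is a simplified
restatement of the "directory entry across blocks" condition. It is not
the kernel's code. It assumes that offset is a byte offset into the
directory and that the filesystem uses one of the usual ext2 block sizes
(1024, 2048 or 4096 bytes); the block size actually used on md(9,3) is
not stated in the report.

```python
def entry_crosses_block(offset, rec_len, block_size):
    """True if a directory entry starting at `offset` (bytes into the
    directory) with length `rec_len` would run past the end of its block."""
    offset_in_block = offset % block_size
    return offset_in_block + rec_len > block_size


# offset and rec_len are taken from the log; block sizes are assumptions.
for block_size in (1024, 2048, 4096):
    print(block_size, entry_crosses_block(7152, 17248, block_size))
```

Since rec_len=17248 is larger than any of these block sizes, the entry
cannot fit whatever the offset is, which suggests the directory block
holds garbage rather than a slightly damaged entry.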
