Re: more NCR53c8xx ext2 problems.

Linus Torvalds (torvalds@cs.helsinki.fi)
Thu, 22 Aug 1996 09:40:31 +0300 (EET DST)


On Wed, 21 Aug 1996, Todd J Derr wrote:
>
> > The previous report I'd seen of this problem was a bad directory entry at
> > offset 5632 = 5x1024+512, name_len = 1541=1024+512+5, rec_len=16.
>
> the messages I get vary, they look like some sort of multiple bit
> errors (it is hard to say since I don't know what the filename in
> question is, but I do know that the filenames used are typically ~20-30
> bytes long, valid range is probably 12-263 bytes though (the files
> are called 'Dsome.internet.hostname.00PID').

Looks like a cabling or termination problem. Especially things like two-bit
errors won't even show up in a simple parity check, so if you have a long (or
low quality) cable or bad termination and get occasional bit rot, this is
what you'd see. (And you said it looks like the problems went away or at
least were less severe after disabling fast mode - that would match).
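
As a rough illustration of the parity point above (a sketch only, not code
from any driver): a single parity bit per byte catches any odd number of
flipped bits but is blind to an even number. The helper names and the sample
byte below are made up for the example, and odd parity is assumed, as used on
the parallel SCSI bus.

```python
def parity_bit(byte: int) -> int:
    """Odd-parity bit: chosen so that data bits plus parity hold an odd number of 1s."""
    ones = bin(byte & 0xFF).count("1")
    return 0 if ones % 2 == 1 else 1


def parity_ok(byte: int, parity: int) -> bool:
    """True if the received byte is consistent with the received parity bit."""
    return parity_bit(byte) == parity


sent = 0x44                 # hypothetical data byte (ASCII 'D')
p = parity_bit(sent)        # parity bit that travels alongside it

one_bit = sent ^ 0x01       # one bit flipped on the cable
two_bit = sent ^ 0x03       # two bits flipped on the cable

print(parity_ok(one_bit, p))   # False: the single-bit error is caught
print(parity_ok(two_bit, p))   # True: the two-bit error passes unnoticed
```

A two-bit flip leaves the count of 1 bits with the same odd/even status, so
the corrupted byte reaches the filesystem as if it were good data.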

Danny ter Haar reported exactly the same kinds of problems, and they were fixed
by new cables. It's at least worth checking out (even good quality cabling
shouldn't be more than $10 or something). The reason the problems show up for
directory accesses is probably because (a) that's one of the few places where
the kernel can do data sanity checking and (b) there's a lot of IO going on
to them especially if you're doing things like news serving.
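
To make point (a) concrete, here is a simplified sketch of the kind of
consistency checks that can be applied to an ext2 directory entry. It is
loosely modelled on the checks ext2 performs, not copied from the kernel; the
function names are hypothetical, the 1024-byte block size is an assumption
taken from the offsets quoted above, and the 8-byte entry header is the
standard ext2 layout (inode, rec_len, name_len).

```python
BLOCK_SIZE = 1024  # assumed block size; the real value depends on the filesystem


def dir_rec_len(name_len: int) -> int:
    """Smallest record that can hold an 8-byte header plus the name, rounded up to 4 bytes."""
    return (name_len + 8 + 3) & ~3


def check_dir_entry(offset: int, rec_len: int, name_len: int,
                    block_size: int = BLOCK_SIZE):
    """Return a description of the first inconsistency found, or None if the entry looks sane."""
    if rec_len < dir_rec_len(1):
        return "rec_len is smaller than minimal"
    if rec_len % 4 != 0:
        return "rec_len % 4 != 0"
    if rec_len < dir_rec_len(name_len):
        return "rec_len is too small for name_len"
    if (offset % block_size) + rec_len > block_size:
        return "directory entry across blocks"
    return None


# The entry from the earlier report: offset 5632, name_len 1541, rec_len 16.
print(check_dir_entry(5632, 16, 1541))

# A plausible healthy entry with a 25-byte name (values made up for illustration).
print(check_dir_entry(512, 36, 25))
```

With the reported values the name alone would need far more room than the
16-byte record provides, so the entry is rejected. Ordinary file data carries
no such internal structure, so the same corruption there would pass silently.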

Linus