Re: ext2 corruption in 2.0.33

Michael Bedy (mjbedy@mail.portup.com)
Mon, 26 Jan 1998 20:23:53 -0500


Well, I wasn't going to report this unless it happened again, but
since we are on the topic, here goes:

About two weeks ago, while running 2.0.33 (standard; i.e. no patches)
I ran into a disk corruption problem.

I was generating a lot of disk activity (Netscape running, plus the
Gimp, plus compiling something - on a 32MB machine. Lots of swapping.)
And all of a sudden it got quiet... Too quiet...

I tried to find out what was going on, but nothing (including the
reboot command) worked. After a reboot the kernel would freeze when it
tried to mount root. I eventually managed to fsck the drive from the
RedHat 'rescue' disks, only to find that my lib directory was torn up. I
ended up just reinstalling RedHat instead of trying to clean that mess
up.

The machine is about a year old: a PPro-200 with a 2.0GB NEC drive. I use
it heavily (Windows and Linux), as I am a CS student (lots of homework),
and have NEVER had any problem with it in the past. The only interesting
and different thing that was going on was the sustained heavy disk
activity.

If you want any more information, just let me know.

Thanks,
-- Mike

Theodore Y. Ts'o wrote:
>
> Date: Sat, 24 Jan 1998 18:06:04 +0100 (MET)
> From: Andries.Brouwer@cwi.nl
>
> Below a report on ext2 corruption I got a moment ago.
> If no one else reports such things[*] then maybe my hardware
> is not 100%, but this is a 1-month-old machine that has never
> given any cause for suspicion. For the time being I suspect
> the kernel, a vanilla 2.0.33.
>
> So, I am afraid I have no idea whether this is an isolated occurrence.
>
> There has been exactly one other such report (although unfortunately I
> don't have the kernel version number) from Nick Holloway
> (Nick.Holloway@alfie.demon.co.uk) on September 30, 1997.
>
> My guess then, and it remains the same, since if there was a problem
> with the kernel we should have seen a lot more complaints than just
> these two isolated reports in over six months, was a hardware or DMA
> glitch which caused data to get written to the wrong place.
>
> It's possible that it could be a kernel problem, though, which is why I
> do keep track and file such reports when people raise them, in order to
> find patterns if they exist. So I'll file your report and see if anyone
> else reports these problems (I track linux-kernel and
> comp.os.linux.development.system looking for such bug reports.)
>
> - Ted