Re: Massive e2fs corruption with 2.2.9/10?

Ville Herva (vherva@niksula.hut.fi)
Fri, 18 Jun 1999 23:02:15 +0300


Harald Koenig <koenig@tat.physik.uni-tuebingen.de> wrote:
> two times I got strange ext2fs errors -- there was a duplicate block
> both times. for the 2nd error I kept the fsck log:
>
> Duplicate blocks found... invoking duplicate block passes.
> Pass 1B: Rescan for duplicate/bad blocks
> Duplicate/bad block(s) in inode 196693: 787271
> Duplicate/bad block(s) in inode 196854: 787271
> Duplicate/bad block(s) in inode 418198: 787271
> Pass 1C: Scan directories for inodes with dup blocks.
> Pass 1D: Reconciling duplicate blocks
> (There are 3 inodes containing duplicate/bad blocks.)

I had the same here. I had just finished compiling linux-2.2.10 on 2.2.9,
and when I booted another Linux machine on the network (to update it to
2.2.10 as well), the machine locked solid. (The one that had finished
compiling, not the booted one.) No pinging, no ctrl-alt-del, no alt-sysrq.
After I pressed reset, fsck said something was wrong and I had to go to
maintenance mode and run fsck manually. It gave a large pile of
"Duplicate/bad block(s)" messages.
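
For anyone sorting through a similar pile of fsck output, here is a minimal
sketch (not part of e2fsprogs) that groups the Pass 1B lines by block number
so it is easy to see which inodes claim the same block. The function name
shared_blocks is made up for illustration, and the line format is assumed to
match the log quoted above, with several block numbers separated by spaces
when an inode has more than one.

```python
import re
from collections import defaultdict

# Matches Pass 1B lines such as:
#   Duplicate/bad block(s) in inode 196693: 787271
# Assumption: multiple block numbers on one line are space separated.
DUP_LINE = re.compile(
    r"Duplicate/bad block\(s\) in inode (\d+):((?:[ \t]+\d+)+)"
)


def shared_blocks(log_text):
    """Map each block number to the sorted list of inodes that claim it.

    Only blocks claimed by more than one inode are returned.
    """
    owners = defaultdict(set)
    for match in DUP_LINE.finditer(log_text):
        inode = int(match.group(1))
        for block in match.group(2).split():
            owners[int(block)].add(inode)
    return {blk: sorted(inodes) for blk, inodes in owners.items()
            if len(inodes) > 1}


if __name__ == "__main__":
    import sys
    # Usage: python dupblocks.py < fsck.log
    for blk, inodes in sorted(shared_blocks(sys.stdin.read()).items()):
        print(f"block {blk}: inodes {', '.join(map(str, inodes))}")
```

Because the pattern is not anchored to the start of the line, it also works
on a log that has been quoted in a mail (lines prefixed with "> ").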

Anyway, I rebooted to the newly compiled 2.2.10, and saw that many of the
files that I had created during the last 24h were missing. Trying to
figure out what had happened, I ran cat /dev/hdc > /dev/null and
cat /dev/hdc[12345] > /dev/null. That gave an I/O error at about 186MB
(the disk is 6.2 GB), but that offset lies on a FAT partition. The
partition is only half full and I could read all the files on it. None of
the other partitions (the now corrupted linux root, /dev/hdc5, included)
gave any errors.
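
A read test like the one above can also be sketched in a few lines of
Python. As far as I know, cat gives up on a device at the first read error,
whereas this version records the offset of each chunk that fails and keeps
going. The function name find_unreadable_chunks and the 1 MB chunk size are
arbitrary choices for illustration, and reading a device such as /dev/hdc
normally requires root.

```python
import os


def find_unreadable_chunks(path, chunk_size=1024 * 1024):
    """Read `path` front to back and return a list of (offset, length)
    pairs for every chunk whose read raised an I/O error."""
    bad = []
    fd = os.open(path, os.O_RDONLY)
    try:
        # Seeking to the end gives the size for regular files and,
        # on Linux, for block devices as well.
        size = os.lseek(fd, 0, os.SEEK_END)
        offset = 0
        while offset < size:
            length = min(chunk_size, size - offset)
            try:
                data = os.pread(fd, length, offset)
            except OSError:
                # Note the failing region and skip past it.
                bad.append((offset, length))
                offset += length
                continue
            if not data:
                break  # unexpected end of file/device
            offset += len(data)
    finally:
        os.close(fd)
    return bad


if __name__ == "__main__":
    import sys
    # Usage (as root): python readtest.py /dev/hdc
    for offset, length in find_unreadable_chunks(sys.argv[1]):
        print(f"I/O error in {length} bytes at about {offset // 2**20} MB")
```

A clean pass only shows that every sector could be read at that moment. It
says nothing about whether the data that came back is correct.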

At this point, I was fairly sure the disk was failing, although the lock
up seemed a bit odd. I backed up my data and continued using the disk.
Now, after one day of uptime, 2.2.10 locked up as well. The symptoms were
the same, although the machine had been idle for several hours at the time.
(One odd thing was that although ctrl-alt-del, alt-sysrq, ctrl-alt-f1 etc.
did nothing, pressing a key woke up my monitor, which was in power saving
state.) After a reboot, I got the same file system corruption symptoms,
manual fsck gave the same "Duplicate/bad block(s)" messages, etc.

Before the first lock up, the machine had been up for 31 days, so I
wouldn't call it unstable. Also, with 2.0.34 I never had lock ups
(~100 day uptimes etc.)

The hw I have: Seagate IDE 6.2 GB ST36451A, no UDMA in use, P166.

All the kernels (the ever so stable 2.0.34 included) have been compiled
with pgcc-1.0.2, which, of course, makes one suspicious. On the other
hand, before 2.2.{9,10}, I've had absolutely no problems.

This may still be a disk fault, but the symptoms sound quite similar to
the other fs corruption cases...

One strange thing was that when I ran "du -k" on directories on an ntfs
partition, while backing up after the lockup, it gave completely wrong
results. Large directories were reported as containing only a few KB.
Afterwards, when I cd'd into the dir, ls showed nothing either. (The files
had shown up perfectly before du.) Finally, ls showed nothing in the root
of the ntfs partition either. After an unmount and remount, I could see
the files again, and was able to ls -R them and even scp -r them all.

-- v --

v@iki.fi
