Re: [PATCH, v5] ext3: validate directory entry data before use

From: Duane Griffin
Date: Thu Jul 03 2008 - 08:22:00 EST


2008/7/3 <Valdis.Kletnieks@xxxxxx>:
> This may or may not be related, but I've managed to hit another interesting
> piece of ext3 damage while running 26-rc8-mmotd-0701:
>
> % /bin/ls -l /usr/share/man/man5 | grep lvm
> /bin/ls: cannot access /usr/share/man/man5/lvm.conf.5.gz: Stale NFS file handle
> -????????? ? ? ? ? ? lvm.conf.5.gz
>
> Yes, that *is* on an ext3 filesystem.

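That is happening because links == 0: ext3 treats an inode with a zero
link count as deleted and fails the lookup with ESTALE, which ls
reports as "Stale NFS file handle"...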

> debugfs on /usr/share is interesting:
>
> debugfs: stat /man/man5/lvm.conf.5.gz
> Inode: 59918 Type: regular Mode: 0644 Flags: 0x0
> Generation: 4228691378 Version: 0x00000000
> User: 0 Group: 0 Size: 0
> File ACL: 239201 Directory ACL: 0
> Links: 0 Blockcount: 0
> Fragment: Address: 0 Number: 0 Size: 0
> ctime: 0x486c6c0b -- Thu Jul 3 02:04:59 2008
> atime: 0x47efcad7 -- Sun Mar 30 13:16:07 2008
> mtime: 0x486c6c0b -- Thu Jul 3 02:04:59 2008
> dtime: 0x486c6c0b -- Thu Jul 3 02:04:59 2008
> BLOCKS:
>
> Zero links, even though man/man5 references it. and the ctime/mtime/dtime
> are suspicious as well - that file belongs to an RPM that was last updated
> back on June 20, and there's no obvious culprit processes in lastcomm that
> were running at 2:04AM, and none of the current ones look obvious either.

Size and blockcount are zero as well, and the delete time matches ctime
and mtime (atime is months older). It looks like something deleted the
inode from underneath the directory entry. The question, of course, is
why...
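
For anyone who wants to double-check which of the timestamps coincide, here
is a small sketch that decodes the raw hex values from the debugfs output
quoted above. The helper name `matching` is made up for illustration. Times
are printed in UTC, whereas debugfs printed local time (apparently UTC-4
here), so the hours differ from the quoted output.

```python
from datetime import datetime, timezone

# Raw values copied from the debugfs "stat" output quoted above.
stamps = {
    "ctime": 0x486c6c0b,
    "atime": 0x47efcad7,
    "mtime": 0x486c6c0b,
    "dtime": 0x486c6c0b,
}


def matching(stamps, name):
    """Return the other fields whose raw value equals stamps[name]."""
    return sorted(k for k, v in stamps.items()
                  if k != name and v == stamps[name])


if __name__ == "__main__":
    for name, raw in stamps.items():
        # Interpret the raw value as seconds since the Unix epoch, in UTC.
        print(name, datetime.fromtimestamp(raw, tz=timezone.utc).isoformat())
    print("dtime matches:", matching(stamps, "dtime"))
```

The deletion time lines up with the last inode change and the last
modification, which is what you would expect if the inode was truncated and
released in a single operation.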

> (system was booted at 00:21, so the failure happened about 1 hour 40 mins
> after the current kernel launched).
>
> Nothing in dmesg from around 2:04AM, and nothing around when the /bin/ls is run.
>
> An 'ls -lR /usr/share' shows that the *other* 127,619 files on the filesystem
> are all OK, it's just this one.
>
> Any brilliant ideas on how to track this down further?

Is it possible that the filesystem still had lingering corruption from
my earlier bad patch? I take it you ran fsck over the filesystem and
it didn't report any errors, but did you run it with -f to force the
check? Deletion of a spurious link to the inode (that wasn't properly
accounted for in the link count) would cause the problem you see.
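
To make that hypothesis concrete, here is a toy model of the failure mode. It
is only a sketch: `ToyFS`, `Inode`, `demo` and the path
"/hypothetical/spurious-link" are invented for illustration and are not how
ext3 is implemented. The assumption is that corruption left two directory
entries pointing at inode 59918 while its link count stayed at 1.

```python
import errno


class Inode:
    def __init__(self, links):
        self.links = links   # recorded link count
        self.dtime = 0       # deletion time; 0 while the inode is live


class ToyFS:
    """Toy model only; names and structure are hypothetical."""

    def __init__(self):
        self.inodes = {}     # inode number -> Inode
        self.dirents = {}    # path -> inode number

    def lookup(self, path):
        inode = self.inodes[self.dirents[path]]
        if inode.links == 0:
            # A directory entry pointing at a deleted inode: report ESTALE,
            # as in the "Stale NFS file handle" error from the report.
            raise OSError(errno.ESTALE, "Stale file handle", path)
        return inode

    def unlink(self, path, now):
        inode = self.inodes[self.dirents.pop(path)]
        inode.links -= 1
        if inode.links == 0:
            # Count hit zero, so the inode is released even though
            # another (uncounted) directory entry still refers to it.
            inode.dtime = now


def demo():
    fs = ToyFS()
    fs.inodes[59918] = Inode(links=1)
    fs.dirents["/man/man5/lvm.conf.5.gz"] = 59918
    # Assumed corruption: a second entry that the link count never saw.
    fs.dirents["/hypothetical/spurious-link"] = 59918
    fs.unlink("/hypothetical/spurious-link", now=0x486c6c0b)
    try:
        fs.lookup("/man/man5/lvm.conf.5.gz")
    except OSError as e:
        inode = fs.inodes[59918]
        return e.errno, inode.links, inode.dtime
    return None
```

In this model, removing the uncounted entry drops the count to zero and
stamps dtime, leaving the legitimate entry dangling. That is the same shape
as the debugfs output: Links: 0 with a directory entry still present.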

BTW, apologies for that bad patch, and thanks for identifying it so quickly.

Cheers,
Duane.

--
"I never could learn to drink that blood and call it wine" - Bob Dylan