Re: Finding hardlinks

From: Mikulas Patocka
Date: Mon Jan 01 2007 - 19:04:29 EST


On Mon, 1 Jan 2007, Jan Harkes wrote:

On Mon, Jan 01, 2007 at 11:47:06PM +0100, Mikulas Patocka wrote:
Anyway, cp -a is not the only application that wants to do hardlink
detection.

I tested programs for ino_t collision (I intentionally injected it) and
found that cp from coreutils 6.7 fails to copy directories but displays
error messages (coreutils 5 works fine). MC and ARJ skip directories with
colliding ino_t and pretend that the operation completed successfully. The
FTS library fails to walk directories, returning an FTS_DC error. Diffutils,
find and grep fail to search directories with colliding inode numbers. Tar
seems tolerant except for incremental backup (which I didn't try). All
programs except diff were tolerant of colliding ino_t on files.

Thanks for testing so many programs, but... did the files/symlinks with
colliding inode numbers have i_nlink > 1? Or did you also have directories
with colliding inode numbers? It looks like you've introduced hardlinked
directories in your test, which are definitely not supported. In fact, that
will probably cause not only issues for userspace programs, but also
locking and garbage collection issues in the kernel's dcache.

I tested it only on files without hardlinks (with i_nlink == 1). Most programs (except diff) are tolerant of collisions there; they don't store st_ino in memory unless i_nlink > 1.
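
The behaviour described above can be sketched roughly as follows. This is only an illustration, not the code of cp, tar or any other real tool; the function name find_hardlink_groups is made up for the example. The point is that a file's (st_dev, st_ino) pair is only remembered when st_nlink > 1, so a colliding st_ino on a file with a single link is never compared against anything.

```python
import os

def find_hardlink_groups(root):
    """Return lists of paths under root that share one (st_dev, st_ino).

    Hypothetical helper for illustration. Files with st_nlink == 1 are
    never recorded, so inode collisions among them go unnoticed.
    """
    seen = {}  # (st_dev, st_ino) -> list of paths
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            if st.st_nlink > 1:
                # Only multiply-linked files cost memory and get compared.
                seen.setdefault((st.st_dev, st.st_ino), []).append(path)
    return [sorted(paths) for paths in seen.values() if len(paths) > 1]
```

With this approach a collision only matters when two unrelated files both report i_nlink > 1, in which case they would wrongly be treated as one file.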

I didn't hardlink directories; I just patched stat, lstat and fstat to always return st_ino == 0, and I've seen those failures. These failures are going to happen on non-POSIX filesystems in the real world too, though very rarely.

BTW, POSIX (optionally) supports hardlinked directories but doesn't support colliding st_ino, so programs act according to POSIX. The problem is that this POSIX requirement no longer represents the real-world situation.

I'm surprised you're seeing so many problems. The only find problem that
I am aware of is the one where it assumes that there will be only
i_nlink-2 subdirectories in a given directory, this optimization can be
disabled with -noleaf.

This is not a bug but a feature. If a filesystem doesn't count subdirectories, it should set the directory's st_nlink to 1 and find will be OK.
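
A rough sketch of the leaf optimization being discussed, under simplifying assumptions. It is not find's actual implementation; find_subdirs and its parameters are invented for the example, and is_dir stands in for an expensive lstat() call. A link count below 2 is taken to mean "subdirectories are not counted", which switches the optimization off.

```python
def find_subdirs(nlink, entries, is_dir, noleaf=False):
    """Return the entries that are subdirectories, calling is_dir rarely.

    nlink   -- st_nlink of the directory being scanned
    entries -- names in that directory (without "." and "..")
    is_dir  -- callable standing in for an lstat() check (hypothetical)
    noleaf  -- if True, never trust the link count
    """
    # A count below 2 cannot be "." plus the parent's entry, so treat it
    # as "not counted" and check every entry.
    use_leaf = not noleaf and nlink >= 2
    remaining = nlink - 2  # subdirectories still expected when use_leaf
    subdirs = []
    for name in entries:
        if use_leaf and remaining == 0:
            break  # every counted subdirectory has already been found
        if is_dir(name):
            subdirs.append(name)
            remaining -= 1
    return subdirs
```

If a filesystem reports a link count of 2 for a directory that does contain subdirectories, this sketch returns nothing for it, which is the failure that noleaf=True (or a link count of 1) avoids.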

The only problems I've encountered with ino_t collisions are with archivers and other programs that recursively try to copy a tree while preserving hardlinks. And in all cases these seem to have no problem with such collisions as long as i_nlink == 1.

Yes, but they have big problems with directory ino_t collisions. They think that directories are hardlinked and skip processing them.
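
To illustrate why directories are hit harder, here is a simplified sketch of a tree walker that remembers the (st_dev, st_ino) of every directory it has entered and refuses to enter the same one twice. It is not how FTS or any particular tool is written; walk_dirs, FakeStat and colliding_lstat are hypothetical names, and colliding_lstat only imitates the "st_ino is always 0" patch described earlier.

```python
import os

def walk_dirs(root, lstat=os.lstat):
    """Return (visited, skipped) directory paths under root.

    A directory whose (st_dev, st_ino) was already seen is assumed to be
    a hard link or a loop and is not descended into.
    """
    seen = set()
    visited, skipped = [], []
    stack = [root]
    while stack:
        path = stack.pop()
        st = lstat(path)
        key = (st.st_dev, st.st_ino)
        if key in seen:
            skipped.append(path)  # looks like a directory already walked
            continue
        seen.add(key)
        visited.append(path)
        with os.scandir(path) as it:
            for entry in it:
                if entry.is_dir(follow_symlinks=False):
                    stack.append(entry.path)
    return sorted(visited), sorted(skipped)


class FakeStat:
    """Stand-in for a patched stat result with st_ino forced to 0."""
    st_dev = 0
    st_ino = 0


def colliding_lstat(path):
    # Hypothetical replacement for os.lstat: every path collides.
    return FakeStat()
```

With the real os.lstat every directory is visited; with colliding_lstat only the root is visited and its immediate subdirectories are skipped, so nothing below them is ever seen.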

Mikulas

Jan
