Re: Finding hardlinks

From: Mikulas Patocka
Date: Fri Jan 05 2007 - 09:08:43 EST


Well, sort of. Samefile without keeping fds open doesn't have any
protection against the tree changing underneath between first
registering a file and later opening it. The inode number is more

You only need to keep one-file-per-hardlink-group open during final
verification, checking that inode hashing produced reasonable results.
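
A minimal sketch of that idea, not the code tar actually uses: group regular files by the (st_dev, st_ino) pair, then verify each group while holding only one file descriptor open for it. The function names hardlink_groups and verify_group are hypothetical, and the comparison uses Python's os.path.samestat, which is a userspace stat comparison and not the samefile() interface being discussed in this thread.

```python
import os
import stat
import tempfile
from collections import defaultdict


def hardlink_groups(root):
    """Group regular files under root by (st_dev, st_ino).

    Only files with a link count above 1 are recorded, and only groups
    with at least two paths found under root are returned.
    """
    groups = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            if stat.S_ISREG(st.st_mode) and st.st_nlink > 1:
                groups[(st.st_dev, st.st_ino)].append(path)
    return [sorted(paths) for paths in groups.values() if len(paths) > 1]


def verify_group(paths):
    """Final verification with one open file per hardlink group.

    The first path is opened and kept open, so its inode cannot be freed
    and reused while the other members are compared against it.
    """
    fd = os.open(paths[0], os.O_RDONLY)
    try:
        ref = os.fstat(fd)
        return all(os.path.samestat(ref, os.lstat(p)) for p in paths)
    finally:
        os.close(fd)


if __name__ == "__main__":
    root = tempfile.mkdtemp()
    a = os.path.join(root, "a")
    open(a, "w").close()
    os.link(a, os.path.join(root, "b"))
    for group in hardlink_groups(root):
        print(group, verify_group(group))
```

The number of descriptors open at any moment is one, regardless of how many hardlink groups the tree contains.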

What final verification? I wasn't just talking about 'tar' but all
cases where st_ino might be used to check the identity of two files at
possibly different points in time.

Time A: remember identity of file X
Time B: check if identity of file Y matches that of file X

With samefile() if you open X at A, and keep it open till B, you can
accumulate large numbers of open files and the application can fail.

If you don't keep an open file, just remember the path, then renaming
X will foil the later identity check. Changing the file at this path
between A and B can even give you a false positive. This applies to
'tar' as well as the other uses.
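
The path-only case can be illustrated with a short sketch. It assumes a scratch directory and hypothetical helper names (remember_by_inode, same_by_inode, same_by_path); os.path.samefile here is Python's stat-based comparison, standing in for the proposed interface only for illustration.

```python
import os
import tempfile


def remember_by_inode(path):
    """Time A: remember the file's identity as (st_dev, st_ino)."""
    st = os.stat(path)
    return (st.st_dev, st.st_ino)


def same_by_inode(key, other_path):
    """Time B: compare the remembered key with another path's identity."""
    st = os.stat(other_path)
    return key == (st.st_dev, st.st_ino)


def same_by_path(remembered_path, other_path):
    """Time B: compare using only the remembered path, with no open file."""
    try:
        return os.path.samefile(remembered_path, other_path)
    except FileNotFoundError:
        return False


if __name__ == "__main__":
    d = tempfile.mkdtemp()
    x, y, z = (os.path.join(d, n) for n in ("X", "Y", "Z"))

    # Time A: X and Y are hardlinks to the same file.
    open(x, "w").close()
    os.link(x, y)
    key = remember_by_inode(x)

    # Between A and B: X is renamed away.
    os.rename(x, x + ".moved")
    print("path check after rename:", same_by_path(x, y))
    print("inode check after rename:", same_by_inode(key, y))

    # Between A and B: a different file appears at path X, linked to Z.
    open(x, "w").close()
    os.link(x, z)
    print("path check against Z:", same_by_path(x, z))
    print("inode check against Z:", same_by_inode(key, z))
```

In this sketch the original inode stays allocated because Y still links to it, so the inode comparison is stable; if every link were removed, the number could be reused and the comparison would become unreliable as well.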

And does it matter? If you rename a file, tar might skip it regardless of hardlink detection: if readdir races with rename, you can read none of the file's names, one of them, or both. All of these are possible.

If you have "dir1/a" hardlinked to "dir1/b" and while tar runs you delete both "a" and "b" and create totally new files "dir2/c" linked to "dir2/d", tar might hardlink both "c" and "d" to "a" and "b".

No one guarantees a sane result from tar or cp -a while the tree is changing. I don't see how is_samefile() could make it worse.

Mikulas

Miklos
