Re: Finding hardlinks

From: Mikulas Patocka
Date: Mon Jan 08 2007 - 00:57:32 EST


And does it matter? If you rename a file, tar might skip it no matter of
hardlink detection (if readdir races with rename, you can read none of the
names of file, one or both --- all these are possible).

If you have "dir1/a" hardlinked to "dir1/b" and while tar runs you delete
both "a" and "b" and create totally new files "dir2/c" linked to "dir2/d",
tar might hardlink both "c" and "d" to "a" and "b".

No one guarantees you sane result of tar or cp -a while changing the tree.
I don't see how is_samefile() could make it worse.

There are several cases where changing the tree doesn't affect the
correctness of the tar or cp -a result. In some of these cases using
samefile() instead of st_ino _will_ result in a corrupted result.

... and those are what? If you create hardlinks while copying, you may have files duplicated instead of hardlinked in the backup. If you unlink hardlinks, cp will miss hardlinks too and create two copies of the same file (it searches the hash only for files with i_nlink > 1). If you rename files, the archive will be completely fscked up (either missing or duplicate files).

Generally samefile() is _weaker_ than the st_ino interface in
comparing the identity of two files without using massive amounts of
memory. You're searching for a better solution, not one that is
broken in a different way, aren't you?

What is the relevant case where st_ino/st_dev works and samefile(char *path1, char *path2) doesn't?

Miklos

Mikulas
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/