Re: Finding hardlinks

From: Denis Vlasenko
Date: Thu Jan 11 2007 - 18:37:39 EST


On Thursday 28 December 2006 10:06, Benny Halevy wrote:
> Mikulas Patocka wrote:
> >>> If user (or script) doesn't specify that flag, it doesn't help. I think
> >>> the best solution for these filesystems would be either to add new syscall
> >>> int is_hardlink(char *filename1, char *filename2)
> >>> (but I know adding syscall bloat may be objectionable)
> >> it's also the wrong api; the filenames may have been changed under you
> >> just as you return from this call, so it really is a
> >> "was_hardlink_at_some_point()" as you specify it.
> >> If you make it work on fd's.. it has a chance at least.
> >
> > Yes, but it doesn't matter --- if the tree changes under "cp -a" command,
> > no one guarantees you what you get.
> > int fis_hardlink(int handle1, int handle 2);
> > Is another possibility but it can't detect hardlinked symlinks.

It also suffers from combinatorial explosion.
cp -a on 10^6 files will require ~0.5 * 10^12 compares...

> It seems like the posix idea of unique <st_dev, st_ino> doesn't
> hold water for modern file systems and that creates real problems for
> backup apps which rely on that to detect hard links.

Yes, and it should have been obvious at 32->64bit inode# transition.
Unfortunately people tend to think "ok, NOW this new shiny BIGNUM-bit
field is big enough for everybody". Then cycle repeats in five years...

I think the solution is that inode "numbers" should become
opaque _variable-length_ hashes. They are already just hash values,
this is nothing new. All problems stem from fixed width of inode# only.

--
vda
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/