Re: [GIT PULL] ocfs2 changes for 2.6.32

From: Linus Torvalds
Date: Tue Sep 15 2009 - 12:31:46 EST




On Mon, 14 Sep 2009, Joel Becker wrote:
> >
> > If you're talking about falling back to manually just copying the data,
> > then nobody is interested in that. User space can do that better with a
> > simple read-write loop or with splice, or whatever. There's no reason
> > whatsoever to do that.
>
> I'm talking about any facility for copying that isn't just a
> userspace loop. Like your discussion of network filesystems.

HOW?

We need to have a per-filesystem interface to that.

Having a '->copyfile()' function would be great.

But don't you see how _idiotic_ it is to then also have a '->reflink()'
function that does _conceptually_ the exact same thing, except it does it
by incrementing a usage count instead?

Do you see why I'm so unhappy to add a ->reflink() function?

> Hence I brought this to the filesystem summit and then fsdevel
> rather than just implementing it in ocfs2. I know NFS folks were in the
> room in April, and they said the call definition was workable. Can't
> remember if CIFS folks were there, but I think so.

It's not workable if you define the 'reflink()' function to not use any
disk space on the filesystem. Because SMB _will_ do a copy (and I presume
the NFS thing will too). So it would not in general be what you call
reflink, it will not be a "snapshot".

So if you _define_ the semantics of "reflink" to be that it's atomic and
doesn't use any new diskspace (apart from the new inode/directory entry,
of course), then it will be almost totally useless to other filesystems.

In fact, it's entirely possible to have filesystems that can avoid copying
the _data_ blocks, but would need to copy the indirect blocks - maybe the
data blocks are ref-counted, but the metadata needs to be per-file (I can
see many reasons to do it that way, even if it's organized as a tree -
it's how we do page table COW, for example, and it makes some things much
simpler).
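
A minimal sketch of that kind of layout, in plain userspace C rather than
kernel code. Every name here (data_block, indirect, clone_indirect) is
hypothetical and made up for illustration; it assumes a toy file with a
single indirect block of four slots. The clone copies the per-file metadata
and only bumps a reference count on each data block:

```c
#include <assert.h>
#include <stddef.h>

#define NBLOCKS 4

/* Hypothetical data block, shared between files through a refcount. */
struct data_block {
    int refcount;
    char payload[16];
};

/* Hypothetical per-file indirect block: each file owns its own copy. */
struct indirect {
    struct data_block *slot[NBLOCKS];
};

/* Clone a file: the metadata is copied, the data blocks are only shared. */
static void clone_indirect(const struct indirect *src, struct indirect *dst)
{
    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < NBLOCKS; i++) {
        dst->slot[i] = src->slot[i];      /* new metadata entry */
        if (dst->slot[i] != NULL)
            dst->slot[i]->refcount++;     /* data is not copied */
    }
}

/* Clone a one-block file and report the refcount of its data block. */
static int demo_shared_blocks(void)
{
    struct data_block blk = { .refcount = 1, .payload = "data" };
    struct indirect a = { { &blk } };     /* remaining slots are NULL */
    struct indirect b;

    clone_indirect(&a, &b);
    return blk.refcount;
}
```

The clone here does consume new space for the second indirect block, so it
would fail a strict "uses no new disk space" definition even though no data
is copied.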

Would that be a 'reflink()' or not? I have no way of knowing, because you
have decided on reflink on a purely ocfs2-specific implementation basis.
But I do know that such a filesystem would be perfectly happy to have a
'copyfile' function.

This is why I want the VFS pointers to be about _semantics_, not about
some random implementation detail.
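
As a rough illustration of the difference, here is a userspace C sketch, not
the actual VFS interface. The op table and all names (toy_fs_ops, copyfile,
share_copyfile, bytes_copyfile) are assumptions invented for this example.
The single operation promises only that the destination ends up with the
same contents as the source; whether that happens by bumping a usage count
or by copying bytes is left to each filesystem:

```c
#include <assert.h>
#include <string.h>

#define TOY_MAX 32

/* Toy storage extent; 'users' counts how many files point at it. */
struct toy_extent {
    int users;
    char bytes[TOY_MAX];
};

struct toy_file {
    struct toy_extent *ext;
};

/* Hypothetical op table: one operation defined by its semantics only. */
struct toy_fs_ops {
    int (*copyfile)(const struct toy_file *src, struct toy_file *dst,
                    struct toy_extent *fresh);
};

/* Refcounting filesystem: share the extent, leave 'fresh' unused. */
static int share_copyfile(const struct toy_file *src, struct toy_file *dst,
                          struct toy_extent *fresh)
{
    (void)fresh;
    src->ext->users++;
    dst->ext = src->ext;
    return 0;
}

/* Network-style filesystem: really copy the bytes into 'fresh'. */
static int bytes_copyfile(const struct toy_file *src, struct toy_file *dst,
                          struct toy_extent *fresh)
{
    assert(fresh != NULL);
    memcpy(fresh->bytes, src->ext->bytes, TOY_MAX);
    fresh->users = 1;
    dst->ext = fresh;
    return 0;
}

static const struct toy_fs_ops share_fs = { .copyfile = share_copyfile };
static const struct toy_fs_ops bytes_fs = { .copyfile = bytes_copyfile };

/* Caller: same call on either filesystem; returns 1 if contents match. */
static int toy_copy_and_compare(const struct toy_fs_ops *ops)
{
    struct toy_extent a = { .users = 1, .bytes = "hello" };
    struct toy_extent spare = { 0 };
    struct toy_file src = { &a }, dst = { 0 };

    if (ops->copyfile(&src, &dst, &spare) != 0)
        return 0;
    return strcmp(src.ext->bytes, dst.ext->bytes) == 0;
}
```

The caller never needs to know which strategy was used, which is the sense
in which the operation is about semantics and not about implementation.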

Linus