Re: [PATCH 2/9] vfs: export do_splice_direct() to modules

From: J. R. Okajima
Date: Tue Mar 19 2013 - 05:00:49 EST

Al Viro:
> BTW, I wonder what's the right locking for that sucker; overlayfs is probably
> too heavy - we are talking about copying a file from one fs to another, which
> can obviously take quite a while, so holding ->i_mutex on _parent_ all along
> is asking for very serious contention. OTOH, there's a pile of unpleasant

Yes, holding parent->i_mutex can be longer.
Using splice function for copy-up is simple and (probably) fastest. But
it doesn't support a "hole" in the file (sparse file). All holes are
filled with NUL byte and consumes a disk block on the upper layer. It is
a problem, especially for users who have smaller tmpfs as his upper
The copy-up with considering a hole may cost more, but it can save the
storage consumtion.

> Another fun issue is copyup vs. copyup - we want to sit and wait for copyup
> attempt in progress to complete, rather than start another one in parallel.
> And whoever comes the second must check if copyup has succeeded, obviously -
> it's possible to have user run into 5% limit and fail copyup, followed by
> root doing it successfully.

"5% limit" means the reserved are for a superuser on a filesystem,
right? As far as I know, overlayfs (UnionMount too?) solves this problem
as changing the task credential and the capability. But I am not sure
whether it solves the similar problem around the resource limit like

> Another one: overwriting rename() vs. copyup. Similar to unlink() vs. copyup().

Hmm, do you mean that, just after the parent dir lock on the underlying
fs, the copyup routine should confirm whether the target is still alive?
If so, I agree.

> Another one: read-only open() vs. copyup(). Hell knows - we obviously don't
> want to open a partially copied file; might want to wait or pretend that this
> open() has come first and give it the underlying file. The same goes for
> stat() vs. copyup().

Moreover users don't want to refer to the lower file which is obsoleted
by copyup.

> FWIW, something like "lock parent, ->create(), ->unlink(), unlock parent,
> copy data and metadata, lock parent, allocate a new dentry in covering layer
> and do ->lookup() on it, verify that it is negative and not a whiteout, lock
> child, use ->link() to put it into directory, unlock everything" would
> probably DTRT for unlink/copyup and rename/copyup. The rest... hell knows;

Please let me make sure.
You are saying,
- create the file on the upper layer
- get the "struct file" object
- hide it from users
- before copying the file, unlock the parent in order to stop the long
period locking
- copy the file without the parent lock. it doesn't matter since the
file is invisible to users.
- confirm that the target name is still available for copyup
- make the completed file visible by ->link()

It is ineteresting to ->link() with the unlinked file. While
vfs_unlink() rejects such case, it may not matter for the underlying fs.
Need to verify FS including jounals.

J. R. Okajima
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at
Please read the FAQ at