Re: fsync on large files

Alexander Viro (viro@math.psu.edu)
Thu, 18 Feb 1999 03:43:41 -0500 (EST)


On Wed, 17 Feb 1999, Linus Torvalds wrote:

> > mkdir a
> > mkdir b
> > mkdir a/c
> > ln a/c b/c
> > mkdir a/c/d
> > mv b a/c/d
>
> Ayee. Good spotting. Nasty. I was wrong, it's not all that easy at all.
>
> I would have just walked it at ln time, and wouldn't ever have noticed
> that case.
>
> It's still a consistent filesystem, but "find" would certainly get
> conniptions on seeing the above ;)

Ahem... Checking that we do not detach loops will be equally nasty - we'll
have to trace *all* ways down from the target to make sure that at least
one doesn't go through the source. And *that* with the need to check for
loops when we are doing tracing.
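
To make both checks concrete, here is a minimal sketch in Python. It
assumes a toy in-memory model in which a directory may be linked from more
than one parent; the names Dir, link, dotdot_ancestors, would_create_loop
and reachable_ignoring_link are hypothetical and are not kernel code. It
replays the mkdir/ln/mv sequence quoted above, shows that a walk along the
single ".." chain misses the loop at mv time, and shows that the detach
check needs a visited set because the graph it traces may already contain
loops.

```python
class Dir:
    """Toy directory: named links to children plus one on-disk ".." parent."""
    def __init__(self, name):
        self.name = name
        self.children = {}   # link name -> Dir
        self.dotdot = None   # the single parent that ".." would record

def link(parent, name, child):
    """Add a link; only the first link to a directory sets its ".."."""
    parent.children[name] = child
    if child.dotdot is None:
        child.dotdot = parent

def dotdot_ancestors(d):
    """Ancestors found by following ".." only (one parent per directory)."""
    out = []
    while d.dotdot is not None:
        d = d.dotdot
        out.append(d)
    return out

def descendants(start):
    """Every directory reachable from start through any link."""
    seen, stack = set(), list(start.children.values())
    while stack:
        d = stack.pop()
        if d in seen:          # already visited: the graph may contain loops
            continue
        seen.add(d)
        stack.extend(d.children.values())
    return seen

def would_create_loop(source, new_parent):
    """Moving source under new_parent loops if new_parent is below source."""
    return new_parent is source or new_parent in descendants(source)

def reachable_ignoring_link(root, target, parent, name):
    """Is target still reachable from root if parent/name were removed?"""
    seen, stack = set(), [root]
    while stack:
        d = stack.pop()
        if d is target:
            return True
        if d in seen:
            continue
        seen.add(d)
        for child_name, child in d.children.items():
            if d is parent and child_name == name:
                continue       # pretend this one link is already gone
            stack.append(child)
    return False

root, a, b, c, d = (Dir(n) for n in ("/", "a", "b", "c", "d"))
link(root, "a", a)             # mkdir a
link(root, "b", b)             # mkdir b
link(a, "c", c)                # mkdir a/c
link(b, "c", c)                # ln a/c b/c
link(c, "d", d)                # mkdir a/c/d

naive_sees_b = b in dotdot_ancestors(d)      # ".." walk: d -> c -> a -> /
full_sees_loop = would_create_loop(b, d)     # all links: b -> c -> d
ok_before_mv = reachable_ignoring_link(root, c, a, "c")

del root.children["b"]         # mv b a/c/d
d.children["b"] = b
ok_after_mv = reachable_ignoring_link(root, c, a, "c")
```

In this model the ".." walk from a/c/d never meets b, so it would allow
the mv, while the walk over all links finds d below b. After the mv,
removing a/c would leave the c -> d -> b -> c loop unreachable from the
root, which is the case the detach check has to catch.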

> > > We can't do them right as is, but getting rid of ".." in the on-disk
> > > directory structure would be one step, and I think I can handle the dentry
> > > aliasing issue too.
> >
> > Could you elaborate? I am trying to figure out the way to do that
> > and for the case of multiple links *from the same directory* I have a
> > kinda-sorta solution. For generic thing... I would really like to hear
> > your variant.
>
> It's basically the same as for hardlinked files: index by <inode:filename>
> rather than by <dentry:filename>, and move the child list into the inode.
> We already have the inode, and we already disallow having children of
> negative dentries, so we have all the rules in place.

Hmm... I'm afraid that we'll get *very* nasty races out of that
and/or a really bad time trying to keep the graph connected.
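
For reference, the <inode:filename> indexing being discussed could be
sketched roughly as below. This is a toy Python model, not the dcache: the
Inode and NameCache classes and their methods are hypothetical, and it
ignores locking entirely, so it says nothing about the races mentioned
above. It only shows that when lookups are keyed by the parent inode, two
paths to the same directory share one set of children.

```python
class Inode:
    """Toy inode; the list of cached child names lives here, not in a dentry."""
    def __init__(self, ino):
        self.ino = ino
        self.children = set()

class NameCache:
    """Name lookup cache keyed by (parent inode number, filename)."""
    def __init__(self):
        self.table = {}

    def add(self, parent, name, inode):
        self.table[(parent.ino, name)] = inode
        parent.children.add(name)

    def remove(self, parent, name):
        self.table.pop((parent.ino, name), None)
        parent.children.discard(name)

    def lookup(self, parent, name):
        return self.table.get((parent.ino, name))

    def walk(self, start, path):
        """Resolve a relative path one component at a time."""
        node = start
        for name in path.split("/"):
            if node is None:
                return None
            node = self.lookup(node, name)
        return node

cache = NameCache()
root, a, b, c, f = (Inode(n) for n in range(1, 6))
cache.add(root, "a", a)
cache.add(root, "b", b)
cache.add(a, "c", c)           # a/c
cache.add(b, "c", c)           # b/c is a second link to the same inode
cache.add(c, "f", f)           # created through either path

via_a = cache.walk(root, "a/c/f")
via_b = cache.walk(root, "b/c/f")

cache.remove(c, "f")           # removed through b/c ...
after = cache.walk(root, "a/c/f")   # ... and it is gone through a/c too
```

Because the key is the inode of c rather than one of its two dentries,
there is no second cached chain left behind after the removal.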

> You also have to do the dentry locking in the inode - but guess what? We
> already do so anyway by re-using inode->i_sem for that (which is
> conceptually wrong the way it is laid out right now, but it's the right
> thing once you use the inode for child management).

> This was why inodes were done in the first place - you can do anything in
> computer science by adding a layer of indirection (Dijkstra?). If you
> don't want to allow aliases, you'd design your filesystem with the inode
> inside the directory structure itself, instead of having the extra level
> of indirection.

Yes. We have a namespace and we have nameless files. No problems
with that. But the symmetry between regular files and directories got
broken when we (OK, Ken and Dennis ;-) prohibited writing to directories.
It got further broken when, instead of link()/unlink()/mkdir() (which was
just a special case of mknod()), there appeared mkdir()/rmdir()/rename().
The main reason behind them was atomicity (according to the authors).
readdir() was the final straw. There are 4 types of fs objects: regular
files, directories, symlinks and, erm, specials. Whether we do an explicit
devfs or not, logically devices belong to a layer behind the fs and device
files are just inter-layer links to said layer. Ditto for FIFOs and
sockets.

We have 2 indirection levels, not 1. Beyond the namespace and
inodes there is such a thing as the contents of objects. We already have
some rules regarding the contents of directories - we want connectedness.
And that spoils the whole game.

> So we have all the support stuff for it already as far as I can tell.
>
> > > Imagine, for example, a directory tree with a shared component. Wouldn't
> > > it be nice to just link it into the tree at multiple points? Imagine a
> > > chroot() environment, for a moment - symlinks don't work to the outside,
> > > but hardlinking does.
> >
> > nullfs. It was invented for such things and we can do it. With
> > *very* small overhead - dcache helps big way here.
>
> We can't do it without handling the dcache alias issue. Otherwise:
>
> mkdir a
> mkdir b
> mount -t nullfs a b
> touch a/c
> rm b/c
>
> and now you have a really confused dcache as it is now, because you still
> have the a/c dentry chain live even though the file was removed through
> the b/c dentry.

Why? Keep the pointers both downwards and upwards and that's it.
*And* keep the inode stacks, not only the dentry ones. That's why I want
to implement light-weight inodes. And that would *really* solve the
problem with named pipes, sockets and friends - let them live in their own
filesystems behind the scenes and consider the inode in the real fs as an
inter-layer link, instead of cramming two inodes into one and getting a
helluva lot of special cases all over the VFS. Normal pipes would just
live in the same behind-the-scenes fs as named pipes. Ditto for real
sockets - a socket in the real fs acts like a dangling link until we do
bind() on it. Ditto for devices - look at the pieces of foo_read_inode()
dealing with them and you'll see that we already have the needed code.
It's *not* a devfs - a completely different beast.
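
A rough sketch of the inter-layer link idea, in Python and under heavy
assumptions: BackingFS and LinkInode are hypothetical names, and nothing
here reflects actual VFS structures. The inode in the real fs carries no
pipe or socket state of its own; it only points at an object living in a
behind-the-scenes fs, and an unbound socket is a link that points nowhere
yet.

```python
class BackingFS:
    """Hypothetical behind-the-scenes fs that owns the real objects."""
    def __init__(self, name):
        self.name = name
        self.objects = {}
        self._next_key = 1

    def create(self, obj):
        key = self._next_key
        self._next_key += 1
        self.objects[key] = obj
        return key

class LinkInode:
    """Light-weight inode in the real fs: nothing but an inter-layer link."""
    def __init__(self, layer, key=None):
        self.layer = layer
        self.key = key         # None means the link is dangling

    def resolve(self):
        if self.key is None:
            return None
        return self.layer.objects.get(self.key)

pipefs = BackingFS("pipefs")
sockfs = BackingFS("sockfs")

# A named pipe: visible in the real fs through a LinkInode.
fifo = LinkInode(pipefs, pipefs.create({"kind": "pipe", "buf": []}))
# A normal pipe: same backing fs, simply no link from the real fs.
anon_key = pipefs.create({"kind": "pipe", "buf": []})

# A socket node in the real fs dangles until bind() attaches an object.
sock = LinkInode(sockfs)
dangling = sock.resolve()
sock.key = sockfs.create({"kind": "socket", "bound": True})
bound = sock.resolve()
```

With this split the real fs needs one kind of special inode (the link),
and the per-type handling stays in the layer that owns the object.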

> I'd still like to allow hard links too, but my mind isn't quite as twisted
> as yours is, judging by your nasty example ;)

D'oh.
(a) I've spent a lot of time screwing with the VFS and filesystems during
the last year and
(b) Helsinki seems to be a kinder place than St. Petersburg ;-)

Cheers,
Al
