Re: VFS deadlock ?

From: Al Viro
Date: Thu Mar 21 2013 - 21:22:30 EST


On Thu, Mar 21, 2013 at 05:22:59PM -0700, Linus Torvalds wrote:
> On Thu, Mar 21, 2013 at 5:12 PM, Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote:
> >
> > What we should do, IMO, is to turn /proc/<pid>/net into an honest symlink -
> > to ../nets/<netns ID>/net. Hell, might even make it a magical symlink
> > instead...
>
> Ok, having seen the error of my ways, I'm starting to agree with you..
> How painful would that be? Especially since we'd need to backport
> it..

Not sure; right now I'm looking through the guts of what procfs had become.
Unfortunately, there are fairly subtle interactions with other shit -
tomoyo, etc. Sigh...

BTW, the variant with the d_ancestor() modification is not enough either:
/proc/1/net and /proc/2/net have different inodes, so for the pair
(/proc/1/net, /proc/2/net/stat) d_ancestor() won't trigger even with
this change. And we have /proc/1/net < /proc/1/net/stat, since the latter
is a subdirectory of the former. With /proc/{1,2}/net/stat having the same
inode, that ordering can be inverted...
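A toy userspace model of the problem. All names are hypothetical: struct toy_dentry, toy_d_ancestor() and the inode numbers are made up for illustration, and this is not the kernel's dcache code. toy_d_ancestor() follows the same contract as d_ancestor(), walking ->parent pointers from p2 towards the root. It finds no relation between /proc/1/net and /proc/2/net/stat in either order, even though /proc/2/net/stat shares its inode with /proc/1/net/stat, a child of /proc/1/net. So ordering locks by dentry ancestry says nothing reliable about the inodes that actually get locked.

```c
#include <assert.h>
#include <stddef.h>

/* Toy dentry: a parent pointer plus an inode number. Hypothetical model
 * for illustration only, not the kernel's struct dentry. */
struct toy_dentry {
    const char *name;
    struct toy_dentry *parent;   /* the root points to itself */
    int ino;
};

/* Same contract as d_ancestor(): if p1 is a proper ancestor of p2, return
 * the ancestor of p2 that is a direct child of p1; otherwise NULL. */
struct toy_dentry *toy_d_ancestor(struct toy_dentry *p1, struct toy_dentry *p2)
{
    struct toy_dentry *p;

    for (p = p2; p->parent != p; p = p->parent)
        if (p->parent == p1)
            return p;
    return NULL;
}

/* /proc/1/net and /proc/2/net get distinct inodes (100 and 200), but both
 * "stat" subdirectories share inode 300: one directory, two parents. */
struct toy_dentry proc  = { "proc", &proc, 1 };
struct toy_dentry pid1  = { "1",    &proc, 10 };
struct toy_dentry pid2  = { "2",    &proc, 20 };
struct toy_dentry net1  = { "net",  &pid1, 100 };
struct toy_dentry net2  = { "net",  &pid2, 200 };
struct toy_dentry stat1 = { "stat", &net1, 300 };
struct toy_dentry stat2 = { "stat", &net2, 300 };

void toy_alias_demo(void)
{
    /* The dentry walk sees no relation between the pair in either order... */
    assert(toy_d_ancestor(&net1, &stat2) == NULL);
    assert(toy_d_ancestor(&stat2, &net1) == NULL);
    /* ...yet stat2's inode is the inode of net1's child stat1, so a
     * "parent before child" order derived from dentries can invert the
     * order in which the underlying inodes are locked. */
    assert(stat2.ino == stat1.ino);
    assert(toy_d_ancestor(&net1, &stat1) == &stat1);
}
```

Calling toy_alias_demo() runs the checks; every assert holds in this model.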

In theory, we can make vfs_rmdir() and vfs_unlink() check for the presence
of the corresponding method before locking the victim; that would suffice
to kludge around that mess on procfs. Along with a ->d_inode comparison in
lock_rename() it *might* suffice. OTOH, there are places in fs/dcache.c
where we rely on the absence of such aliases; they might or might not
trigger in the procfs case.
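A sketch of the shape of that kludge, again as a toy model with hypothetical names (struct toy_inode, toy_lock_rename_by_dentry(), toy_lock_rename_by_inode(), toy_vfs_rmdir()). It is not the real lock_rename() or vfs_rmdir(): the per-superblock rename mutex and the ancestor-first ordering are left out, and the inode "mutex" is only a depth counter, where depth 2 stands for a non-recursive mutex deadlocking on itself. Comparing inodes instead of dentries means two aliasing dentries take the lock once, and the ->rmdir check returns -EPERM before the victim is ever locked.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical toy types. The per-inode "mutex" is a depth counter:
 * a real non-recursive mutex would self-deadlock at depth 2. */
struct toy_inode {
    int lock_depth;
    int (*rmdir)(struct toy_inode *dir, struct toy_inode *victim);
};

struct toy_dentry {
    struct toy_inode *inode;    /* several dentries may alias one inode */
};

static void toy_lock(struct toy_inode *i)   { i->lock_depth++; }
static void toy_unlock(struct toy_inode *i) { i->lock_depth--; }

/* Unpatched shape: only identical dentries count as "the same directory". */
void toy_lock_rename_by_dentry(struct toy_dentry *p1, struct toy_dentry *p2)
{
    toy_lock(p1->inode);
    if (p1 != p2)
        toy_lock(p2->inode);    /* aliases: same inode locked twice */
}

/* Kludged shape: compare the inodes, so aliases take the lock once. */
void toy_lock_rename_by_inode(struct toy_dentry *p1, struct toy_dentry *p2)
{
    toy_lock(p1->inode);
    if (p1->inode != p2->inode)
        toy_lock(p2->inode);
}

/* rmdir-side kludge: refuse with -EPERM before the victim is locked when
 * the parent directory has no ->rmdir method (as on procfs). */
int toy_vfs_rmdir(struct toy_inode *dir, struct toy_dentry *victim)
{
    int err;

    if (!dir->rmdir)
        return -EPERM;
    toy_lock(victim->inode);
    err = dir->rmdir(dir, victim->inode);
    toy_unlock(victim->inode);
    return err;
}

void toy_kludge_demo(void)
{
    struct toy_inode net_ino = { 0, NULL }, stat_ino = { 0, NULL };
    struct toy_dentry stat1 = { &stat_ino }, stat2 = { &stat_ino };

    toy_lock_rename_by_dentry(&stat1, &stat2);
    assert(stat_ino.lock_depth == 2);   /* would self-deadlock */
    stat_ino.lock_depth = 0;

    toy_lock_rename_by_inode(&stat1, &stat2);
    assert(stat_ino.lock_depth == 1);   /* taken only once */
    stat_ino.lock_depth = 0;

    assert(toy_vfs_rmdir(&net_ino, &stat1) == -EPERM);
    assert(stat_ino.lock_depth == 0);   /* victim never locked */
}
```

The model only shows why the two checks avoid the self-deadlock; whether they are enough for the real VFS is exactly the open question raised above.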

We are talking about a violation of a fundamental assertion used in
correctness analysis all over the place, unfortunately. The right fix
is to restore it; I'll try to come up with something that could be
reasonably easily backported. The kludge above is a fallback in case
no real fix turns out to be easy to backport. Assuming that this kludge
is sufficient, that is... For 3.9 and later we *definitely* want to
restore that assertion.

PS: Once more, with feeling, to everyone even thinking of pulling something
like that again:
Hardlinks to directories do not work. Don't do that, or we'll be
sorry, and then so will you.
A Very Peeved BOFH