Re: [RFC 1/1] shiftfs: uid/gid shifting bind mount

From: James Bottomley
Date: Mon Feb 06 2017 - 12:32:44 EST


On Mon, 2017-02-06 at 09:38 -0600, lkml@xxxxxxxxxxx wrote:
> On Mon, Feb 06, 2017 at 07:18:16AM -0800, James Bottomley wrote:
> > On Mon, 2017-02-06 at 09:50 -0500, Theodore Ts'o wrote:
> > > On Sun, Feb 05, 2017 at 10:46:23PM -0800, James Bottomley wrote:
> > > > Yes, I know the problem. However, I believe most current Linux
> > > > filesystems no longer guarantee stable inode numbers for the
> > > > lifetime of the file. The usual docker container root is overlayfs,
> > > > which similarly doesn't support stable inode numbers. I see the
> > > > odd complaint about docker with overlayfs having unstable inode
> > > > numbers, but none of them seems to have serious repercussions.
> > >
> > > Um, no. Most current linux file systems *do* guarantee stable inode
> > > numbers. For one thing, NFS would break horribly if you didn't have
> > > stable inode numbers. Never mind applications which depend on POSIX
> > > semantics. And you wouldn't be able to save games in rogue or
> > > nethack, either. :-)
> >
> > I believe that's why we have the superblock export operations to
> > manufacture unique filehandles in the absence of inode number
> > stability. The generic one uses inode numbers, but it doesn't have
> > to. I thought reiserfs (if we can go back that far) was the first
> > generally used filesystem that didn't guarantee stable inode numbers,
> > so we have a lot of historical precedent.
> >
> > Thanks to reiserfs, I thought we also iterated to weak stability
> > guarantees for inode numbers, which mean no inconsistencies in
> > applications that use inode numbers for caching? It's still not
> > POSIX, but I thought it was good enough for most use cases.
> >
>
> Even plain tar extraction is sensitive to directory inode stability:
> http://git.savannah.gnu.org/cgit/tar.git/tree/src/extract.c?h=release_1_29#n867
>
> This caused errors on overlayfs if the extraction churned through
> enough of the dentry cache to evict the relevant directory (can be
> forced to reproduce reliably via drop_caches).
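The check in question follows a common pattern: record a directory's (st_dev, st_ino) pair from stat(2), then later re-stat the path and compare. The following is a minimal C sketch of that pattern under that assumption. It is not tar's actual code, and `struct dir_id`, `remember_dir()` and `same_dir()` are hypothetical names.

```c
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical record of a directory's identity as stat(2) reports it. */
struct dir_id {
    dev_t dev;
    ino_t ino;
};

/* Capture (st_dev, st_ino) for path. Returns 0 on success, -1 on error. */
static int remember_dir(const char *path, struct dir_id *id)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return -1;
    id->dev = st.st_dev;
    id->ino = st.st_ino;
    return 0;
}

/*
 * Later check: is path still "the same" directory? This is only reliable
 * if the filesystem keeps the inode number stable for the lifetime of the
 * directory. If the inode is evicted and a different number is handed out
 * when it is reloaded, this reports a mismatch although nothing changed.
 */
static bool same_dir(const char *path, const struct dir_id *id)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return false;
    return st.st_dev == id->dev && st.st_ino == id->ino;
}
```

On a filesystem that regenerates inode numbers, evicting the directory's inode between the two calls (for instance with `echo 2 > /proc/sys/vm/drop_caches`) would make `same_dir()` return false for an unchanged directory. That is the failure mode described above.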

Yes, I know the bug. I think it's up to the tar maintainers, but if they
want to support weakly POSIX filesystems, they should really be using
the filehandle for this check, not the device and inode number.
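Here is a minimal sketch of a filehandle-based check from userspace, assuming Linux and glibc. name_to_handle_at(2) asks the filesystem, through its export operations, for an opaque handle, and the sketch compares two handles and their mount IDs bytewise. `struct dir_handle`, `get_dir_handle()`, `same_handle()` and `put_dir_handle()` are hypothetical helpers, not an existing API. On a filesystem without export support the call fails, typically with EOPNOTSUPP.

```c
#define _GNU_SOURCE /* must precede all includes: exposes name_to_handle_at, MAX_HANDLE_SZ */
#include <fcntl.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical wrapper: an opaque file handle plus the mount it came from. */
struct dir_handle {
    int mount_id;
    struct file_handle *fh;
};

/*
 * Ask the filesystem for a handle to path. Returns 0 on success, -1 on
 * failure (e.g. ENOENT, or EOPNOTSUPP if the filesystem cannot export).
 */
static int get_dir_handle(const char *path, struct dir_handle *dh)
{
    struct file_handle *fh = malloc(sizeof(*fh) + MAX_HANDLE_SZ);

    if (fh == NULL)
        return -1;
    fh->handle_bytes = MAX_HANDLE_SZ; /* in: buffer size; out: actual size */
    if (name_to_handle_at(AT_FDCWD, path, fh, &dh->mount_id, 0) != 0) {
        free(fh);
        return -1;
    }
    dh->fh = fh;
    return 0;
}

/* Two handles name the same object if mount, type and bytes all match. */
static bool same_handle(const struct dir_handle *a, const struct dir_handle *b)
{
    return a->mount_id == b->mount_id &&
           a->fh->handle_type == b->fh->handle_type &&
           a->fh->handle_bytes == b->fh->handle_bytes &&
           memcmp(a->fh->f_handle, b->fh->f_handle, a->fh->handle_bytes) == 0;
}

static void put_dir_handle(struct dir_handle *dh)
{
    free(dh->fh);
    dh->fh = NULL;
}
```

Assume a filesystem's export operations encode a directory identically before and after cache eviction. A check like this would then pass on that filesystem even if it hands out a different inode number after reload.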

That said, I believe reiserfs was our only other filesystem with weak
inode number stability guarantees, and it's hardly in common use
today, so if we can find a solution that gives strong stability
guarantees for our current problem filesystems, there's no reason not
to use it generally.

James