Re: [RFC v2 19/19] ima: Setup securityfs for IMA namespace

From: Christian Brauner
Date: Mon Dec 06 2021 - 11:02:46 EST


On Mon, Dec 06, 2021 at 08:38:29AM -0500, James Bottomley wrote:
> On Mon, 2021-12-06 at 13:08 +0100, Christian Brauner wrote:
> > On Fri, Dec 03, 2021 at 11:37:14AM -0800, Casey Schaufler wrote:
> > > On 12/3/2021 10:50 AM, James Bottomley wrote:
> > > > On Fri, 2021-12-03 at 13:06 -0500, Stefan Berger wrote:
> > > > > On 12/3/21 12:03, James Bottomley wrote:
> > > > > > On Thu, 2021-12-02 at 21:31 -0500, Stefan Berger wrote:
> > > > > > [...]
> > > > > > > static int securityfs_init_fs_context(struct fs_context
> > > > > > > *fc)
> > > > > > > {
> > > > > > > + int rc;
> > > > > > > +
> > > > > > > + if (fc->user_ns->ima_ns->late_fs_init) {
> > > > > > > + rc = fc->user_ns->ima_ns->late_fs_init(fc-
> > > > > > > >user_ns);
> > > > > > > + if (rc)
> > > > > > > + return rc;
> > > > > > > + }
> > > > > > > fc->ops = &securityfs_context_ops;
> > > > > > > return 0;
> > > > > > > }
> > > > > > I know I suggested this, but to get this to work in general,
> > > > > > it's going to have to not be specific to IMA, so it's going
> > > > > > to have to become something generic like a notifier
> > > > > > chain. The other problem is it's only working still by
> > > > > > accident:
> > > > >
> > > > > I had thought about this also but the rationale was:
> > > > >
> > > > > securityfs is compiled due to CONFIG_IMA_NS and the user
> > > > > namespace exists there and that has a pointer now to
> > > > > ima_namespace, which can have that callback. I assumed that
> > > > > other namespaced subsystems could also be reached then via
> > > > > such a callback, but I don't know.
> > > >
> > > > Well securityfs is supposed to exist for LSMs. At some point
> > > > each of those is going to need to be namespaced, which may
> > > > eventually be quite a pile of callbacks, which is why I thought
> > > > of a notifier.
> > >
> > > While AppArmor, lockdown and the integrity family use securityfs,
> > > SELinux and Smack do not. They have their own independent
> > > filesystems. Implementations of namespacing for each of SELinux and
> > > Smack have been proposed, but nothing has been adopted. It would be
> > > really handy to namespace the infrastructure rather than each
> > > individual LSM, but I fear that's a bigger project than anyone will
> > > be taking on any time soon. It's likely to encounter many of the
> > > same issues that I've been dealing with for module stacking.
> >
> > The main thing that bothers me is that it uses simple_pin_fs() and
> > simple_unpin_fs() which I would try hard to get rid of if possible.
> > The existence of this global pinning logic makes namespacing it
> > properly more difficult then it needs to be and it creates imho wonky
> > semantics where the last unmount doesn't really destroy the
> > superblock.
>
> So in the notifier sketch I posted, I got rid of the pinning but only
> for the non root user namespace use case ... which basically means only
> for converted consumers of securityfs. The last unmount of securityfs
> inside the namespace now does destroy the superblock ... I checked.
>
> The same isn't true for the last unmount of the root namespace, but
> that has to be so to keep the current semantics.
>
> > Instead subsequents mounts resurface the same superblock. There
> > might be an inherent design reason why this needs to be this way but
> > I would advise against these semantics for anything that wants to be
> > namespaced. Probably the first securityfs mount in init_user_ns can
> > follow these semantics but ones tied to a non-initial user namespace
> > should not as the userns can go away. In that case the pinning logic
> > seems strange as conceptually the userns pins the securityfs mount as
> > evidenced by the fact that we key by it in get_tree_keyed().
>
> Yes, that's basically what I did: pin if ns == &init_user_ns but don't
> pin if not. However, I'm still not sure I got the triggers right. We
> have to trigger the notifier call (which adds the namespaced file
> entries) from context free, because that's the first place the
> superblock mount is fully set up ... I can't do it in fill_super
> because the mount isn't fully initialized (and the locking prevents
> it). I did manage to get the notifier for teardown triggered from
> kill_super, though.

I don't think you need a vfsmount at all to be honest. I think this can
all be done without much ceremony. Here's a brutalist completely
untested patch outlining one approach: