Re: [PATCH v5 13/16] ima: Move some IMA policy and filesystem related variables into ima_namespace

From: Christian Brauner
Date: Sat Dec 11 2021 - 04:50:44 EST


On Fri, Dec 10, 2021 at 08:57:11AM -0500, Stefan Berger wrote:
>
> On 12/10/21 06:32, Christian Brauner wrote:
> > On Thu, Dec 09, 2021 at 07:57:02PM -0500, Stefan Berger wrote:
> > > On 12/9/21 14:11, Christian Brauner wrote:
> > > > From 1f03dc427c583d5e9ebc9ebe9de77c3c535bbebe Mon Sep 17 00:00:00 2001
> > > > From: Christian Brauner <christian.brauner@xxxxxxxxxx>
> > > > Date: Thu, 9 Dec 2021 20:07:02 +0100
> > > > Subject: [PATCH] !!!! HERE BE DRAGONS - UNTESTED !!!!
> > > >
> > > > ---
> > > > security/integrity/ima/ima_fs.c | 43 +++++++++++++++++++++++++++++----
> > > > 1 file changed, 38 insertions(+), 5 deletions(-)
> > > >
> > > > diff --git a/security/integrity/ima/ima_fs.c b/security/integrity/ima/ima_fs.c
> > > > index 583462b29cb5..d5b302b925b8 100644
> > > > --- a/security/integrity/ima/ima_fs.c
> > > > +++ b/security/integrity/ima/ima_fs.c
> > > > @@ -317,10 +317,14 @@ static ssize_t ima_read_policy(char *path)
> > > > static ssize_t ima_write_policy(struct file *file, const char __user *buf,
> > > > size_t datalen, loff_t *ppos)
> > > > {
> > > > - struct ima_namespace *ns = get_current_ns();
> > > > + struct ima_namespace *ns;
> > > > + struct user_namespace *user_ns;
> > > > char *data;
> > > > ssize_t result;
> > > > + user_ns = ima_filp_private(filp);
> > > > + ns = user_ns->ima_ns
> > > > +
> > > > if (datalen >= PAGE_SIZE)
> > > > datalen = PAGE_SIZE - 1;
> > > > @@ -373,26 +377,51 @@ static const struct seq_operations ima_policy_seqops = {
> > > > };
> > > > #endif
> > > > +static struct user_namespace *ima_filp_private(struct file *filp)
> > > > +{
> > > > + if (!(filp->f_flags & O_WRONLY)) {
> > > > +#ifdef CONFIG_IMA_READ_POLICY
> > > > + struct seq_file *seq;
> > > > +
> > > > + seq = filp->private_data;
> > > > + return seq->private;
> > > > +#endif
> > > > + }
> > > > + return filp->private_data;
> > > > +}
> > > > +
> > > > /*
> > > > * ima_open_policy: sequentialize access to the policy file
> > > > */
> > > > static int ima_open_policy(struct inode *inode, struct file *filp)
> > > > {
> > > > - struct ima_namespace *ns = get_current_ns();
> > > > + struct user_namespace *user_ns = current_user_ns();
> > >
> > > Do we have to take a reference on the user namespace assuming one can open
> > > the file, pass the fd down the hierarchy, and then the user namespace with
> > > the opened file goes away? Or is there anything else that keeps the user
> > > namespace alive?
> > No, we don't. When ima_policy_open() is called we do current_user_ns()
> > but that will be guaranteed to be identical to filp->f_cred->user_ns.
> > And f_cred is a reference that has been taken when the vfs allocated a
> > struct file for this .open call so won't go away until the last fput.
> >
> > My proposal is also too complicated, I think.
> > (The booster is giving me the same side-effects as my second shot so
> > this looks like two good days of fever and headache. So I'll use that as
> > an excuse. :))
> >
> > Your patch series as it stands has a bit of a security issue with those
> > get_current_ns() calls across differnet file/seq_file operations.
> >
> > You have to make an architectural decision, I think. I see two sensible
> > options:
> > 1. The relevant ima_ns that .open/.read/.write operate on is always taken
> > to be the ima_ns of the filesystem's userns, i.e.
> > sb->s_user_ns->ima_ns.
> > This - but I'm not an ima person - makes the most sense to me and the
> > semantics are straightforward. If I write to a file to alter some
> > policy then I expect the ima namespace of the user namespace to be
> > affected that the securityfs instance was mounted in.
> > 2. The relevant ima_ns that .open/.read/.write operate on is always
> > taken to be the one of the opener. I don't really like that as that
> > gets weird if for some complicated reason the caller is not located
> > in the userns the filesystem was mounted in (weird mount propagation
> > scenario or sm). It also feels strange to operate on an ima_ns that's
> > different from s_user_ns->ima_ns in a securityfs instance.
>
> We have this situation because one can setns() to another mount namespaces
> but the data shown by SecurityFS lives in a user namespace, right? And now
> we need to decide whether to affect the data in the user namespace  that did
> the open (option 2) or to which the SecurityFS belongs to (option 1). If we
> were to open a regular file it would be option 1, so we should probably not
> break that existing semantic and also choose option 1 unless there one
> wasn't allowed to choose the user namespace the SecurityFS files belonged to
> then it should be option 2 but then we have file descriptor passing where
> 'being allowed' can change depending on who is reading/writing a file... Is

A general remark that's always worth repeating: under no circumstances
should the object that you're operating on change from .open to
.read/.write. Consider the following very rough sketch:

pid = fork():
if (pid == 0) {
// create USERNS1
unshare(CLONE_NEWUSER);

// write an idmapping for getuid() to userns 0

mount("", "/sys/kernel/security", "securityfs", [...]);

int fd_ima = open("/sys/kernel/security/some_ima_file", O_WRONLY);

// Send fd_ima to parent via SCM_RIGHTS
}
// Receive fd_ima from child.

write(fd_ima, "SET_OPTION_ON_INIT_USER_NS", ...);

That example above is an instant security issue if the ima_ns changed
depending on the caller since it means you could open a file in an
unprivileged userns and then funnel it to a random process in
init_user_ns that you control via SCM_RIGHTS and have it do the write()
and instead of USERNS1->ima_ns you'd suddenly have changed
init_user_ns->ima_ns.
And this is what your previous patch with these multiple
get_current_ns() calls allowed.

So the write in the example above must operate on USERNS1->ima_ns.

Now, both option 1 and 2 will ensure that the ima_ns that is operated on
is always the same across .open/.read/.write:
1. securityfs->sb->s_user_ns->ima_ns this is trivially guaranteed
because the filesystem's user namespace doesn't change no matter
where you access that filesystem from.
2. securityfs_file->f_cred->user_ns->ima_ns this is guaranteed because
f_cred stashes the openers credentials.

What I fundamentally dislike about option 2 (f_cred) is - and I tried to
say that in my prior mail - that what ima_ns is operated on depends on
the userns of the process that called .open, i.e. the contents in
securityfs change depending on the opener's userns.

And I think that is fundamentally sloppy semantics. The contents of a
filesystem should generally not depend on the caller if it can be
avoided. We have a few instances where this is the case (e.g. some
procfs sysctls) but especially for securityfs this does not make sense.

> there anything that would prevent us from setns()'ing to that target user
> namespace so that we would now see that of a user namespace that we are not
> allowed to see?

If you're really worried about someone being able to access a securityfs
instance whose userns doesn't match the userns the securityfs instance
was mounted in there are multiple ways to fix it. The one that I tend to
prefer is: