Re: [PATCH v2] proc: "mount -o lookup=" support

From: Christian Brauner
Date: Thu Jan 20 2022 - 09:37:52 EST


On Thu, Jan 20, 2022 at 03:32:29PM +0300, Alexey Dobriyan wrote:
> On Wed, Jan 19, 2022 at 05:24:23PM +0100, Christian Brauner wrote:
> > On Wed, Jan 19, 2022 at 06:48:03PM +0300, Alexey Dobriyan wrote:
> > > From 61376c85daab50afb343ce50b5a97e562bc1c8d3 Mon Sep 17 00:00:00 2001
> > > From: Alexey Dobriyan <adobriyan@xxxxxxxxx>
> > > Date: Mon, 22 Nov 2021 20:41:06 +0300
> > > Subject: [PATCH 1/1] proc: "mount -o lookup=..." support
> > >
> > > Docker implements MaskedPaths configuration option
> > >
> > > https://github.com/estesp/docker/blob/9c15e82f19b0ad3c5fe8617a8ec2dddc6639f40a/oci/defaults.go#L97
> > >
> > > to disable certain /proc files. It overmounts them with /dev/null.
> > >
> > > Implement proper mount option which selectively disables lookup/readdir
> > > in the top level /proc directory so that MaskedPaths doesn't need
> > > to be updated as time goes on.
> >
> > I might've missed this when this was sent the last time so maybe it was
> > clearly explained in an earlier thread: What's the reason this needs to
> > live in the kernel?
>
> The reasons are:
> MaskedPaths or equivalents are blacklists, not future proof
>
> MaskedPaths is applied at container creation once,
> lookup= is applied at mount time surely but names aren't
> required to exist to be filtered (read: some silly ISV module
> gets loaded, creates /proc entries, containers get them with all
> security holes)
>
> > The MaskedPaths entry is optional so runtimes aren't required to block
> > anything by default and this mostly makes sense for workloads that run
> > privileged.
> >
> > In addition MaskedPaths is a generic option which allows to hide any
> > existing path, not just proc. Even in the very docker-specific defaults
> > /sys/firmware is covered.
>
> Sure, the patch is for /proc only. MaskedPaths can't overmount with
> /dev/null file which doesn't exist yet.
>
> > I do see clear value in the subset= and hidepid= options. They are
> > generally useful independent of opinionated container workloads. I don't
> > see the same for lookup=.
> >
> > An alternative I find more sensible is to add a new value for subset=
> > that hides anything(?) that only global root should have read/write
> > access too.

On Thu, Jan 20, 2022 at 03:23:04PM +0300, Alexey Dobriyan wrote:
> On Wed, Jan 19, 2022 at 05:24:23PM +0100, Christian Brauner wrote:
> > On Wed, Jan 19, 2022 at 06:48:03PM +0300, Alexey Dobriyan wrote:
> > > From 61376c85daab50afb343ce50b5a97e562bc1c8d3 Mon Sep 17 00:00:00 2001
> > > From: Alexey Dobriyan <adobriyan@xxxxxxxxx>
> > > Date: Mon, 22 Nov 2021 20:41:06 +0300
> > > Subject: [PATCH 1/1] proc: "mount -o lookup=..." support
> > >
> > > Docker implements MaskedPaths configuration option
> > >
> > > https://github.com/estesp/docker/blob/9c15e82f19b0ad3c5fe8617a8ec2dddc6639f40a/oci/defaults.go#L97
> > >
> > > to disable certain /proc files. It overmounts them with /dev/null.
> > >
> > > Implement proper mount option which selectively disables lookup/readdir
> > > in the top level /proc directory so that MaskedPaths doesn't need
> > > to be updated as time goes on.
> >
> > I might've missed this when this was sent the last time so maybe it was
> > clearly explained in an earlier thread: What's the reason this needs to
> > live in the kernel?
> >
> > The MaskedPaths entry is optional so runtimes aren't required to block
> > anything by default and this mostly makes sense for workloads that run
> > privileged.
> >
> > In addition MaskedPaths is a generic option which allows to hide any
> > existing path, not just proc. Even in the very docker-specific defaults
> > /sys/firmware is covered.
>
> MaskedPaths is not future proof, new entries might pop up and nobody
> will update the MaskedPaths list.
>
> > I do see clear value in the subset= and hidepid= options. They are
> > generally useful independent of opinionated container workloads. I don't
> > see the same for lookup=.
>
> The value is if you get /proc/cpuinfo you get everything else
> but you might not want everything else given that "everything else"
> changes over time.
>
> > An alternative I find more sensible is to add a new value for subset=
> > that hides anything(?) that only global root should have read/write
> > access too.

Thanks for providing some more details.

If we really introduce new proc files in the future that are unsafe for
unprivileged containers then that's a whole bigger problem.

We shouldn't taper over this with a procfs mount option however.
Especially, since it's very likely that such new procfs files that would
be exploitable in unprivileged containers would also be exploitable by
regular users. The argument can't be that in order to protect against
buggy or information leaking future proc files we need to give proc a
special mount option for containers to restrict access to files and
directories.

And for the legacy files that existed before containers were a big thing
MaskedPath in userspace has worked fine with the last changes to update
the list from 2018 for the addition of a rather old directory.

And the same problem exists for sysfs. That's why /sys/firmware is in
there. (In fact, it can be argued that they should restrict sysfs way
more via MaskedPaths than procfs for privileged containers since it
leaks way more system-wide information and provides a way bigger attack
surface which is presumable why the mount is ro but then strangely only
hide /sys/firmware. Anyway, besides the point.)

MaskedPath is mostly a protection mechanism useful for privileged
containers as an unprivileged container can't modify anything that would
allow it to attack the system.

Ultimately, I think the current proposal here is too much modeled after
how a specific tool runs specific workloads and for containers and I
don't think that's a good idea.

We should do a generally useful thing that doesn't require any dynamic
filtering and userspace giving us files that are ok to show.

Alternative proposals to appear later in the thread. I'd be ok to
endorse one of those if you were to implement one of them. But for now
I'm not firmly convinced of lookup=.