Re: [PATCH v2] proc: "mount -o lookup=" support

From: Christian Brauner
Date: Wed Jan 19 2022 - 12:31:15 EST


On Wed, Jan 19, 2022 at 06:15:22PM +0100, Alexey Gladkov wrote:
> On Wed, Jan 19, 2022 at 05:24:23PM +0100, Christian Brauner wrote:
> > On Wed, Jan 19, 2022 at 06:48:03PM +0300, Alexey Dobriyan wrote:
> > > From 61376c85daab50afb343ce50b5a97e562bc1c8d3 Mon Sep 17 00:00:00 2001
> > > From: Alexey Dobriyan <adobriyan@xxxxxxxxx>
> > > Date: Mon, 22 Nov 2021 20:41:06 +0300
> > > Subject: [PATCH 1/1] proc: "mount -o lookup=..." support
> > >
> > > Docker implements MaskedPaths configuration option
> > >
> > > https://github.com/estesp/docker/blob/9c15e82f19b0ad3c5fe8617a8ec2dddc6639f40a/oci/defaults.go#L97
> > >
> > > to disable certain /proc files. It overmounts them with /dev/null.
> > >
> > > Implement proper mount option which selectively disables lookup/readdir
> > > in the top level /proc directory so that MaskedPaths doesn't need
> > > to be updated as time goes on.
> >
> > I might've missed this when this was sent the last time so maybe it was
> > clearly explained in an earlier thread: What's the reason this needs to
> > live in the kernel?
> >
> > The MaskedPaths entry is optional so runtimes aren't required to block
> > anything by default and this mostly makes sense for workloads that run
> > privileged.
> >
> > In addition, MaskedPaths is a generic option which allows hiding any
> > existing path, not just proc. Even in the very docker-specific defaults
> > /sys/firmware is covered.
> >
> > I do see clear value in the subset= and hidepid= options. They are
> > generally useful independent of opinionated container workloads. I don't
> > see the same for lookup=.
> >
> > An alternative I find more sensible is to add a new value for subset=
> > that hides anything(?) that only global root should have read/write
> > access to.
>
> Or we can allow permissions in procfs to be changed only in the
> decreasing direction (if some file has 644 then allow setting 640 or
> 600). In this case, we will not need to constantly check the whitelist.

I don't fancy any filtering or allowlist approach. I find that rather
inelegant. But if I understand you correctly, the idea is that with
decrease-only permissions we could allow a (namespace) procfs-admin to
set permissions so that the relevant files are essentially read-only or
not even readable at all for container workloads. Once perms have been
lowered they can't be raised, which ensures that even the namespace
procfs-admin can't raise them again.
Might work as well. But that implies that we wouldn't need any allowlist
at all afaict.
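
To make the decrease-only rule concrete, here is a minimal userspace
sketch of the check it would imply. The helper name
mode_only_decreases() is hypothetical and is not taken from any patch in
this thread; it only illustrates "new mode may clear permission bits but
never set one that was not already set".

```c
#include <stdbool.h>
#include <sys/types.h>

/*
 * Hypothetical helper (an assumption for illustration, not kernel code):
 * return true if new_mode is a subset of old_mode's permission bits,
 * i.e. the change only removes permissions (644 -> 640 or 600 is fine,
 * 640 -> 644 is not).
 */
static bool mode_only_decreases(mode_t old_mode, mode_t new_mode)
{
	const mode_t perm_mask = 07777;	/* rwx for u/g/o plus suid/sgid/sticky */
	mode_t old_perms = old_mode & perm_mask;
	mode_t new_perms = new_mode & perm_mask;

	/* Any bit present in new but absent in old would be an increase. */
	return (new_perms & ~old_perms) == 0;
}
```

With such a check, 0644 -> 0600 would be accepted and 0600 -> 0644
rejected, so no list of file names has to be consulted.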