Re: [PATCH v3 bpf-next 02/21] bpf: Sysctl hook

From: Andrey Ignatov
Date: Tue Apr 09 2019 - 16:17:24 EST


Kees Cook <keescook@xxxxxxxxxxxx> [Tue, 2019-04-09 09:54 -0700]:
> On Fri, Apr 5, 2019 at 12:36 PM Andrey Ignatov <rdna@xxxxxx> wrote:
> > Containerized applications may run as root and it may create problems
> > for whole host. Specifically such applications may change a sysctl and
> > affect applications in other containers.
> >
> > Furthermore in existing infrastructure it may not be possible to just
> > completely disable writing to sysctl, instead such a process should be
> > gradual with ability to log what sysctl are being changed by a
> > container, investigate, limit the set of writable sysctl to currently
> > used ones (so that new ones can not be changed) and eventually reduce
> > this set to zero.
>
> Actual-root-in-a-container is pretty powerful. What about module
> loading, or /dev files? Instead of sysctl-specific hooks, what about
> VFS hooks, which would be able to cover all file-based APIs. This is
> what, for example, Landlock was working on doing (also with eBPF).

This sysctl hook doesn't try to solve all possible problems that
root-in-a-container may pose, but rather focuses on one specific
problem.

Generic BPF hooks in VFS can be a good idea, and in fact there was an
attempt to add a BPF hook for file_open [1], but apparently it needs
more work.

The flexibility of generic hooks has its disadvantages, though, when it
comes to doing what this sysctl-focused hook does. E.g. with the sysctl
hook an administrator doesn't have to care about sys_open, sys_read and
sys_write separately if they want to implement policies (or just
tracing) based on sysctl name / value for a cgroup, but can have just
one BPF program instead that has all the context necessary to make
decisions.
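
To illustrate the "one program with all the context" point, here is a
rough userspace model in plain C. It is not the BPF API from this patch
set: struct sysctl_access, sysctl_policy() and the allow-list contents
are hypothetical names made up for the sketch, and the "1 = allow,
0 = deny" convention is an assumption of the model.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for the context a sysctl-focused hook could
 * hand to a policy: the sysctl name and whether the access is a write.
 */
struct sysctl_access {
	const char *name;	/* e.g. "net/core/somaxconn" */
	bool write;		/* true for a write, false for a read */
};

/* Example allow-list: the only sysctls the container may still write. */
static const char *const writable_sysctls[] = {
	"net/core/somaxconn",
	"net/ipv4/tcp_mem",
};

/* Returns 1 to allow the access and 0 to deny it (model convention). */
int sysctl_policy(const struct sysctl_access *acc)
{
	size_t i;
	size_t n = sizeof(writable_sysctls) / sizeof(writable_sysctls[0]);

	/* Reads are always allowed in this sketch. */
	if (!acc->write)
		return 1;

	/* Writes are allowed only for names on the allow-list. */
	for (i = 0; i < n; i++)
		if (strcmp(acc->name, writable_sysctls[i]) == 0)
			return 1;

	return 0;
}
```

The whole decision lives in one function that sees both the name and
the access type. Shrinking the allow-list over time models the gradual
lockdown described in the commit message.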

Also, accesses to sysctl are usually just a very small part of all calls
to sys_open/sys_read/sys_write on a system and are outside of the fast
path (e.g. nobody should care if a write to a sysctl is a bit slower),
so calling a BPF hook for every sys_open/sys_read/sys_write when the
administrator wants to limit just sysctl changes can be too expensive.

Furthermore, a sysctl-focused hook allows tailoring its API to sysctl
needs and avoids exposing APIs that make sense only for sysctl to all
file-based accesses and vice versa.

[1] https://lwn.net/Articles/767615/

--
Andrey Ignatov