Re: [RFC] [PATCH bpf-next 1/1] bpf: Add a BPF helper for getting the cgroup path of current task

From: KP Singh
Date: Fri May 14 2021 - 07:20:48 EST


On Fri, May 14, 2021 at 6:06 AM xufeng zhang
<yunbo.xufeng@xxxxxxxxxxxxxxxxx> wrote:
>
>
> On 2021/5/13 6:55 AM, Alexei Starovoitov wrote:
> > On Wed, May 12, 2021 at 05:58:23PM +0800, Xufeng Zhang wrote:
> >> To implement security rules for application containers using bpf LSM,
> >> the container to which the currently running task belongs needs to be
> >> known in bpf context. Consider this scenario: Kubernetes schedules a
> >> pod onto a host, and before the application container can run, the
> >> security rules for this application must first be loaded into bpf
> >> maps, so that LSM bpf programs can make decisions based on these rule
> >> maps.
> >>
> >> However, there is no effective bpf helper to achieve this goal,
> >> especially for cgroup v1. In the above case, the only information
> >> available on the user side is the container-id, and the cgroup path
> >> of the container is determined by the container-id. So, to bridge the
> >> user side and bpf programs, bpf programs also need to know the cgroup
> >> path of the currently running task.
> > ...
> >> +#ifdef CONFIG_CGROUPS
> >> +BPF_CALL_2(bpf_get_current_cpuset_cgroup_path, char *, buf, u32, buf_len)
> >> +{
> >> + struct cgroup_subsys_state *css;
> >> + int retval;
> >> +
> >> + css = task_get_css(current, cpuset_cgrp_id);
> >> + retval = cgroup_path_ns(css->cgroup, buf, buf_len, &init_cgroup_ns);
> >> + css_put(css);
> >> + if (retval >= buf_len)
> >> + retval = -ENAMETOOLONG;
> > Manipulating a string path to check the hierarchy will be difficult to do
> > inside a bpf prog. It seems to me this helper will be useful only for
> > the simplest cgroup setups where there is no additional cgroup nesting
> > within containers.
> > Have you looked at *ancestor_cgroup_id and *cgroup_id helpers?
> > They're a bit more flexible when dealing with hierarchy and
> > can be used to achieve the same correlation between kernel and user cgroup ids.
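
As a rough illustration of the user-side half of that correlation: on a
cgroup v2 mount, the 64-bit id reported by bpf_get_current_cgroup_id()
can typically be read from user space as the 8-byte file handle that
name_to_handle_at() returns for the cgroup directory. The sketch below
assumes exactly that, and does not cover cgroup v1 hierarchies. The names
cgroup_id_from_path and struct cgid_handle are made up for illustration.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <fcntl.h>        /* AT_FDCWD */
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>  /* SYS_name_to_handle_at */
#include <unistd.h>       /* syscall() */

/* Local mirror of struct file_handle with room for an 8-byte handle. */
struct cgid_handle {
	unsigned int handle_bytes;
	int handle_type;
	unsigned char f_handle[8];
};

/*
 * Look up the 64-bit cgroup id of a cgroup directory (assumed cgroup v2).
 * Returns 0 and fills *id on success, -1 on failure.
 */
int cgroup_id_from_path(const char *path, uint64_t *id)
{
	struct cgid_handle fh;
	int mount_id;

	assert(path && id);

	memset(&fh, 0, sizeof(fh));
	fh.handle_bytes = sizeof(fh.f_handle);

	if (syscall(SYS_name_to_handle_at, AT_FDCWD, path, &fh,
		    &mount_id, 0) < 0)
		return -1;

	/* This sketch only understands 8-byte handles. */
	if (fh.handle_bytes != sizeof(*id))
		return -1;

	memcpy(id, fh.f_handle, sizeof(*id));
	return 0;
}
```

A container manager could call this on the container's cgroup directory
and use the resulting id as the key of the rule map, so that the LSM
program can look up the rules with the id it gets from the helper.
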
>
>
> KP,
>
> do you have any suggestion?

I haven't really tried this yet, but have you considered using task local
storage to identify the container?

- Add a task local storage entry with the container ID somewhere in the
  container manager.
- Propagate this ID to all the tasks within the container using the task
  security blob management hooks (such as task_alloc and task_free).
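
An untested sketch of what the first step might look like from the
container manager, assuming a BPF_MAP_TYPE_TASK_STORAGE map has already
been created and its fd is at hand. From user space such a map is keyed
by a pidfd. The value layout (struct container_tag), CONTAINER_ID_LEN and
the function name are hypothetical. The propagation step would be a BPF
LSM program on task_alloc copying the parent's entry to the child, which
is not shown here.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <linux/bpf.h>    /* union bpf_attr, BPF_MAP_UPDATE_ELEM, BPF_ANY */
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>  /* SYS_bpf, SYS_pidfd_open */
#include <sys/types.h>
#include <unistd.h>

#define CONTAINER_ID_LEN 64	/* hypothetical fixed-size container ID */

/* Hypothetical map value: the container ID, zero padded. */
struct container_tag {
	char id[CONTAINER_ID_LEN];
};

/*
 * Store container_id in the task local storage of task `pid`.
 * map_fd must refer to a task storage map whose value is
 * struct container_tag. Returns 0 on success, -1 on failure.
 */
int tag_task_with_container_id(int map_fd, pid_t pid,
			       const char *container_id)
{
	struct container_tag tag;
	union bpf_attr attr;
	size_t len;
	long ret;
	int pidfd;

	assert(container_id);

	memset(&tag, 0, sizeof(tag));
	len = strlen(container_id);
	if (len > sizeof(tag.id))
		len = sizeof(tag.id);
	memcpy(tag.id, container_id, len);

	/* Task storage maps take a pidfd as the key from user space. */
	pidfd = (int)syscall(SYS_pidfd_open, pid, 0);
	if (pidfd < 0)
		return -1;

	memset(&attr, 0, sizeof(attr));
	attr.map_fd = (uint32_t)map_fd;
	attr.key = (uint64_t)(uintptr_t)&pidfd;
	attr.value = (uint64_t)(uintptr_t)&tag;
	attr.flags = BPF_ANY;

	ret = syscall(SYS_bpf, BPF_MAP_UPDATE_ELEM, &attr, sizeof(attr));
	close(pidfd);

	return ret < 0 ? -1 : 0;
}
```

The manager would call this once on the container's init task before it
starts running the application. Every task forked afterwards would then
inherit the ID through the task_alloc hook.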

>
> My concern is that internal kernel objects (cgroup id or ns.inum) are
> not very user friendly. We can derive the container context from them
> in tracing scenarios, but not in LSM blocking cases. I'm not sure how
> Google resolves similar issues internally.
>
>
> Thanks!
>
> Xufeng
>