Re: [PATCH bpf-next v1 3/5] bpf: Introduce cgroup iter

From: Yosry Ahmed
Date: Fri May 20 2022 - 03:59:55 EST


On Fri, May 20, 2022 at 12:41 AM Tejun Heo <tj@xxxxxxxxxx> wrote:
>
> On Fri, May 20, 2022 at 01:21:31AM +0000, Yosry Ahmed wrote:
> > From: Hao Luo <haoluo@xxxxxxxxxx>
> >
> > Introduce a new type of iter prog: cgroup. Unlike other bpf_iter, this
> > iter doesn't iterate a set of kernel objects. Instead, it is supposed to
> > be parameterized by a cgroup id and prints only that cgroup. So one
> > needs to specify a target cgroup id when attaching this iter. The target
> > cgroup's state can be read out via a link of this iter.
> >
> > Signed-off-by: Hao Luo <haoluo@xxxxxxxxxx>
> > Signed-off-by: Yosry Ahmed <yosryahmed@xxxxxxxxxx>
>
> This could be me not understanding why it's structured this way but it keeps
> bothering me that this is adding a cgroup iterator which doesn't iterate
> cgroups. If all that's needed is extracting information from a specific
> cgroup, why does this need to be an iterator? e.g. why can't I use
> BPF_PROG_TEST_RUN which looks up the cgroup with the provided ID, flushes
> rstat, retrieves whatever information necessary and returns that as the
> result?

I will let Hao and Yonghong reply here, as they have a lot more context
and have discussed cgroup_iter before. I just want to say that
exposing the stats in a file is extremely convenient
for userspace apps. It becomes very similar to reading stats from
cgroupfs. It also makes migrating cgroup stats that we have
implemented in the kernel to BPF a lot easier.
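
To illustrate the userspace side, here is a minimal sketch of what a
reader could look like, assuming the iter link is pinned somewhere
under bpffs. The helper name read_stats_file() is made up for this
example, and nothing in it is specific to this series: it is the same
open()/read() loop one would use on a cgroupfs stat file.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper: read a whole stats file into buf and
 * NUL-terminate it. Returns the number of bytes read, or -1 on error.
 * The path may be a cgroupfs file or (by assumption) a pinned iter link.
 */
static ssize_t read_stats_file(const char *path, char *buf, size_t size)
{
	ssize_t n = 0, total = 0;
	int fd;

	if (size == 0)
		return -1;

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;

	/* Keep reading until EOF, an error, or the buffer is full. */
	while ((size_t)total < size - 1 &&
	       (n = read(fd, buf + total, size - 1 - total)) > 0)
		total += n;

	close(fd);
	if (n < 0)
		return -1;

	buf[total] = '\0';
	return total;
}
```

A caller would pass the path where the link was pinned and parse the
returned text the same way it parses cgroupfs stat files today.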

AFAIK there are also discussions about using overlayfs to have links
to the bpffs files in cgroupfs, which would make this even more
convenient. So I would really prefer keeping the approach we have here
of reading stats through a file from userspace. As for how we go about
this (and why a cgroup iterator doesn't iterate cgroups), I will leave
it to Hao and Yonghong to explain the rationale. Ideally we can keep
the same functionality under a more descriptive name/type.

>
> Thanks.
>
> --
> tejun