Re: [PATCH bpf-next v1 3/5] bpf: Introduce cgroup iter

From: Andrii Nakryiko
Date: Mon May 23 2022 - 21:31:31 EST


On Mon, May 23, 2022 at 5:53 PM Hao Luo <haoluo@xxxxxxxxxx> wrote:
>
> On Mon, May 23, 2022 at 4:58 PM Andrii Nakryiko
> <andrii.nakryiko@xxxxxxxxx> wrote:
> >
> > On Fri, May 20, 2022 at 7:35 PM Hao Luo <haoluo@xxxxxxxxxx> wrote:
> > >
> > > On Fri, May 20, 2022 at 5:59 PM Yonghong Song <yhs@xxxxxx> wrote:
> > > > On 5/20/22 3:57 PM, Tejun Heo wrote:
> > > > > Hello,
> > > > >
> > > > > On Fri, May 20, 2022 at 03:19:19PM -0700, Alexei Starovoitov wrote:
> > > > >> We have a bpf_map iterator that walks all bpf maps.
> > > > >> When the map iterator is parameterized with map_fd, it walks
> > > > >> all elements of that map.
> > > > >> The cgroup iterator should have similar semantics.
> > > > >> When non-parameterized, it will walk all cgroups and their descendants
> > > > >> in depth-first order. I believe that's what Yonghong is proposing.
> > > > >> When parameterized, it will start from that particular cgroup and
> > > > >> walk only the descendants of that cgroup.
> > > > >> The bpf prog can stop the iteration right away by returning 1.
> > > > >> Maybe we can add two parameters: one for the cgroup_fd to use and another
> > > > >> for the iteration order (css_for_each_descendant_pre vs _post).
> > > > >> wdyt?
> > > > >
> > > > > Sounds perfectly reasonable to me.
> > > >
> > > > This works for me too. Thanks!
> > > >
> > >
> > > This sounds good to me. Thanks. Let's try to do it in the next iteration.
> >
> > Can we, in addition to the descendant_pre and descendant_post walk
> > algorithms, also add one that walks ancestors (i.e., starts
> > from the specified cgroup and walks up to the root cgroup)? I don't have
> > a specific example, but it seems natural to include it for a "cgroup
> > iterator" in general. Hopefully it won't add much code to the
> > implementation.
>
> Yep. Sounds reasonable and doable. It's just a matter of adding a flag to
> specify the traversal order, like:
>
> enum {
> 	WALK_DESCENDANT_PRE,
> 	WALK_DESCENDANT_POST,
> 	WALK_PARENT_UP,

Probably something more like BPF_CG_WALK_DESCENDANT_PRE and so on?

> };
>
> In bpf_iter's seq_next(), for WALK_PARENT_UP, change the algorithm to
> yield the parent of the current cgroup.
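As a rough illustration of the three walk orders discussed in this thread, here is a minimal userspace C sketch. Everything in it is hypothetical: the BPF_CG_WALK_* names just follow the naming suggested above, struct node is a toy stand-in for struct cgroup (no RCU, locking, or refcounting), walk_next() plays the role of the iterator's seq_next(), and the callback plays the bpf prog, stopping the walk by returning 1. The descendant walks follow the same pre-/post-order semantics as css_for_each_descendant_pre()/_post() (the starting cgroup is visited first or last, respectively), but this is not the kernel implementation.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical traversal-order flag, using the BPF_CG_ prefix. */
enum bpf_cg_walk_order {
	BPF_CG_WALK_DESCENDANT_PRE,
	BPF_CG_WALK_DESCENDANT_POST,
	BPF_CG_WALK_PARENT_UP,
};

/* Toy stand-in for a cgroup; not the kernel's struct cgroup. */
struct node {
	const char *name;
	struct node *parent, *first_child, *next_sibling;
};

/* Append child to the end of parent's child list. */
static void add_child(struct node *parent, struct node *child)
{
	struct node **slot = &parent->first_child;
	while (*slot)
		slot = &(*slot)->next_sibling;
	*slot = child;
	child->parent = parent;
}

/* Pre-order successor of pos, never leaving the subtree at root. */
static struct node *next_pre(struct node *pos, struct node *root)
{
	if (pos->first_child)
		return pos->first_child;
	while (pos != root) {
		if (pos->next_sibling)
			return pos->next_sibling;
		pos = pos->parent;
	}
	return NULL;
}

static struct node *leftmost_leaf(struct node *pos)
{
	while (pos->first_child)
		pos = pos->first_child;
	return pos;
}

/* Post-order successor of pos; root is visited last. */
static struct node *next_post(struct node *pos, struct node *root)
{
	if (!pos)
		return leftmost_leaf(root);
	if (pos == root)
		return NULL;
	if (pos->next_sibling)
		return leftmost_leaf(pos->next_sibling);
	return pos->parent;
}

/* Plays the role of seq_next(); pos == NULL asks for the first element. */
static struct node *walk_next(struct node *pos, struct node *start,
			      enum bpf_cg_walk_order order)
{
	switch (order) {
	case BPF_CG_WALK_DESCENDANT_PRE:
		return pos ? next_pre(pos, start) : start;
	case BPF_CG_WALK_DESCENDANT_POST:
		return next_post(pos, start);
	case BPF_CG_WALK_PARENT_UP:
		return pos ? pos->parent : start;
	}
	return NULL;
}

/* cb plays the bpf prog: returning 1 stops the walk right away. */
static int walk(struct node *start, enum bpf_cg_walk_order order,
		int (*cb)(struct node *, void *), void *ctx)
{
	struct node *pos = NULL;
	while ((pos = walk_next(pos, start, order)))
		if (cb(pos, ctx) == 1)
			return 1;
	return 0;
}

/* Example callback state: visited names, plus an optional early stop. */
struct collect_ctx {
	char buf[64];
	int stop_after; /* stop after this many visits; 0 means never */
};
static int collect(struct node *n, void *data)
{
	struct collect_ctx *c = data;
	assert(strlen(c->buf) + strlen(n->name) + 2 <= sizeof(c->buf));
	if (c->buf[0])
		strcat(c->buf, " ");
	strcat(c->buf, n->name);
	return c->stop_after > 0 && --c->stop_after == 0;
}
```

For a tree root -> {a -> {a1, a2}, b}, a pre-order walk from root visits "root a a1 a2 b", a post-order walk visits "a1 a2 a b root", and an up-walk from a2 visits "a2 a root". Note that the up-walk needs no subtree bound: it simply ends when it reaches a cgroup with no parent.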