Re: [PATCH] prctl: propagate has_child_subreaper flag to every descendant

From: Oleg Nesterov
Date: Mon Jan 23 2017 - 06:55:43 EST


On 01/22, Pavel Tikhomirov wrote:
>
> >
> >Hmm. could you explain how this change helps CRIU? I mean, why
> >restorer can't do prctl(CHILD_SUBREAPER) before the first fork?
>
> Imagine we have this tree in pidns:
>
> 1: has_child_subreaper == 0 && is_child_subreaper == 0
> |-2: has_child_subreaper == 0 && is_child_subreaper == 1
> | |-3: has_child_subreaper == 0 && is_child_subreaper == 0
> | | |-5: has_child_subreaper == 0 && is_child_subreaper == 0
> | |-4: has_child_subreaper == 1 && is_child_subreaper == 0
> | | |-6: has_child_subreaper == 1 && is_child_subreaper == 0
>
> before c/r: If 4 dies 6 will reparent to 2, if 3 dies 5 will reparent to 1.
> after c/r: (where restorer had is_child_subreaper == 1, everybody in the
> tree will have has_child_subreaper == 1) Everybody will reparent to 2.

This is clear, but this can only happen if 2 forks 3 and after that
sets is_child_subreaper, right?

And if someone actually does this then your patch can break this
application, no?
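
To make the ordering concrete, here is a small userspace model of the two
flags. It is only a sketch: struct task, model_fork(), model_set_subreaper()
and model_new_reaper() are made-up names for illustration, not kernel code,
and it assumes the current behaviour where the child picks up
has_child_subreaper at fork time and the dying parent's flag decides
whether an ancestor sub-reaper is searched for.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model of a task; not the kernel's task_struct. */
struct task {
    int pid;
    struct task *real_parent;
    bool is_child_subreaper;   /* set by the task itself via prctl() */
    bool has_child_subreaper;  /* inherited from the parent at fork time */
};

/* Model of fork(): the child gets "has" if the parent is, or has, a sub-reaper. */
static void model_fork(struct task *parent, struct task *child, int pid)
{
    child->pid = pid;
    child->real_parent = parent;
    child->is_child_subreaper = false;
    child->has_child_subreaper =
        parent->has_child_subreaper || parent->is_child_subreaper;
}

/*
 * Model of prctl(PR_SET_CHILD_SUBREAPER, 1) without the proposed patch:
 * only the caller is marked, children that already exist are untouched.
 */
static void model_set_subreaper(struct task *t)
{
    t->is_child_subreaper = true;
}

/* Which task adopts the children of the dying task 'father'? */
static struct task *model_new_reaper(struct task *father, struct task *init)
{
    if (father->has_child_subreaper) {
        struct task *r;

        /* Climb the ancestors, stopping below init. */
        for (r = father->real_parent; r != NULL && r != init; r = r->real_parent)
            if (r->is_child_subreaper)
                return r;
    }
    return init;
}
```

In this model the tree from the example appears only with the sequence:
2 forks 3, 3 forks 5, 2 calls model_set_subreaper(), 2 forks 4, 4 forks 6.
Then model_new_reaper() for 4 gives 2, and for 3 it gives 1.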

IOW. Currently CRIU can't restore the process tree with the same
has_child_subreaper bits if some process forks before
prctl(PR_SET_CHILD_SUBREAPER). It restores the tree as if prctl()
was called before the 1st fork.
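
For reference, the userspace side of "prctl() before the 1st fork" could
look roughly like this. The helper names become_subreaper() and
is_subreaper() are invented for the example; only the prctl() options
themselves are the real interface.

```c
#include <sys/prctl.h>

/* Mark the calling process as a child sub-reaper. Returns 0 on success, -1 on error. */
static int become_subreaper(void)
{
    return prctl(PR_SET_CHILD_SUBREAPER, 1UL, 0UL, 0UL, 0UL);
}

/* Returns 1 if the caller is a sub-reaper, 0 if it is not, -1 on error. */
static int is_subreaper(void)
{
    int flag = 0;

    /* The kernel stores the current setting in the int that arg2 points to. */
    if (prctl(PR_GET_CHILD_SUBREAPER, (unsigned long)&flag, 0UL, 0UL, 0UL) != 0)
        return -1;
    return flag;
}
```

A restorer that calls become_subreaper() first and fork() afterwards gets
children which all inherit has_child_subreaper; the reverse order is the
case that cannot be reproduced today.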

So you change the semantics of PR_SET_CHILD_SUBREAPER and now CRIU
is fine simply because you remove this feature: the sub-reaper can
no longer pre-fork the children which should reparent to the previous
reaper.

I won't really argue, but I am not sure this is a good idea... At least
I think this should be clearly documented.

> >You don't need this new member and descendants_lock. task_struct has
> >the ->real_parent pointer so you can walk the tree without recursion.
>
> Sorry I don't get how I can walk down the tree of all descendants with help
> of ->real_parent pointer, can you please point to some example or explain a
> bit more? (I see task_is_descendant() in security/yama/yama_lsm.c but we
> will need to check it for every process, not only descendants, the latter
> can be a lot faster.)

I'll send a patch, probably a generic helper makes sense.
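
Until then, the general idea can be shown in userspace. This is not the
patch, just an illustration of the technique under simplifying
assumptions: struct node, add_child() and mark_descendants() are
hypothetical names, each node has plain first_child/next_sibling links
instead of the kernel's lists, and there is no locking and no threads.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for a task and its links to relatives. */
struct node {
    int pid;
    struct node *real_parent;
    struct node *first_child;
    struct node *next_sibling;
    bool has_child_subreaper;
};

/* Link 'child' under 'parent' (prepended to the parent's child list). */
static void add_child(struct node *parent, struct node *child)
{
    child->real_parent = parent;
    child->next_sibling = parent->first_child;
    parent->first_child = child;
}

/*
 * Set has_child_subreaper on every descendant of 'root' (not on root
 * itself). Pre-order walk with no recursion and no explicit stack:
 * go down through first_child, and when a subtree is finished climb
 * back through real_parent until a sibling is found.
 * Returns the number of descendants visited.
 */
static int mark_descendants(struct node *root)
{
    struct node *cur = root->first_child;
    int count = 0;

    while (cur != NULL) {
        cur->has_child_subreaper = true;
        count++;

        if (cur->first_child != NULL) {
            cur = cur->first_child;
            continue;
        }
        /* Leaf: climb until some node on the path has a sibling left. */
        while (cur != root && cur->next_sibling == NULL)
            cur = cur->real_parent;
        if (cur == root)
            break;
        cur = cur->next_sibling;
    }
    return count;
}
```

The walk never looks at tasks outside the subtree, so its cost depends
only on the number of descendants.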

Btw task_is_descendant() looks wrong at first glance.

Oleg.