Re: [PATCH] sched, autogroup: fix kernel crashes caused by runtimedisable autogroup

From: Mike Galbraith
Date: Sat Oct 20 2012 - 08:37:44 EST


On Sat, 2012-10-20 at 14:42 +0800, Xiaotian Feng wrote:
> On Fri, Oct 19, 2012 at 9:42 PM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> > Always try and CC people who wrote the code..
> >
> > On Fri, 2012-10-19 at 16:36 +0800, Xiaotian Feng wrote:
> >> There's a regression from commit 800d4d30, in autogroup_move_group()
> >>
> >> p->signal->autogroup = autogroup_kref_get(ag);
> >>
> >> if (!ACCESS_ONCE(sysctl_sched_autogroup_enabled))
> >> goto out;
> >> ...
> >> out:
> >> autogroup_kref_put(prev);
> >>
> >> So kernel changed p's autogroup to ag, but never sched_move_task(p).
> >> Then previous autogroup of p is released, which may release task_group
> >> related with p. After commit 8323f26ce, p->sched_task_group might point
> >> to this stale value, and thus caused kernel crashes.
> >>
> >> This is very easy to reproduce, add "kernel.sched_autogroup_enabled = 0"
> >> to your /etc/sysctl.conf, your system will never boot up. It is not reasonable
> >> to put the sysctl enabled check in autogroup_move_group(), kernel should check
> >> it before autogroup_create in sched_autogroup_create_attach().
> >>
> >> Reported-by: cwillu <cwillu@xxxxxxxxxx>
> >> Reported-by: Luis Henriques <luis.henriques@xxxxxxxxxxxxx>
> >> Signed-off-by: Xiaotian Feng <dannyfeng@xxxxxxxxxxx>
> >> Cc: Ingo Molnar <mingo@xxxxxxxxxx>
> >> Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
> >> ---
> >> kernel/sched/auto_group.c | 10 +++++-----
> >> 1 file changed, 5 insertions(+), 5 deletions(-)
> >>
> >> diff --git a/kernel/sched/auto_group.c b/kernel/sched/auto_group.c
> >> index 0984a21..ac62415 100644
> >> --- a/kernel/sched/auto_group.c
> >> +++ b/kernel/sched/auto_group.c
> >> @@ -143,15 +143,11 @@ autogroup_move_group(struct task_struct *p, struct autogroup *ag)
> >>
> >> p->signal->autogroup = autogroup_kref_get(ag);
> >>
> >> - if (!ACCESS_ONCE(sysctl_sched_autogroup_enabled))
> >> - goto out;
> >> -
> >> t = p;
> >> do {
> >> sched_move_task(t);
> >> } while_each_thread(p, t);
> >>
> >> -out:
> >> unlock_task_sighand(p, &flags);
> >> autogroup_kref_put(prev);
> >> }
> >
> > So I've looked at this for all of 1 minute, but why isn't moving that
> > check up one line to be above the p->signal->autogroup assignment
> > enough?
>
> I think if autogroup is disabled, we don't need to use
> autogroup_create() to create a new ag and tg, kernel will not use it.
>
> >
> >> @@ -159,8 +155,12 @@ out:
> >> /* Allocates GFP_KERNEL, cannot be called under any spinlock */
> >> void sched_autogroup_create_attach(struct task_struct *p)
> >> {
> >> - struct autogroup *ag = autogroup_create();
> >> + struct autogroup *ag;
> >> +
> >> + if (!ACCESS_ONCE(sysctl_sched_autogroup_enabled))
> >> + return;
> >>
> >> + ag = autogroup_create();
> >> autogroup_move_group(p, ag);
> >> /* drop extra reference added by autogroup_create() */
> >> autogroup_kref_put(ag);
> >
> > Man,.. so on memory allocation fail we'll put the group in
> > autogroup_default, which I think ends up being the root cgroup.
> >
> > But what happens when sysctl_sched_autogroup_enabled is false?
> >
>
> autogroup runtime disable is very nasty, as it might happen at any
> place of sched_move_group() for any setsid task.
> After sysctl_sched_autogroup_enabled is changed to false,
> autogroup_task_group(p, tg) will return tg, which is from its cpu
> cgroup.
>
> > It looks like sched_autogroup_fork() is effective in that case, which
> > would mean we'll stay in whatever group our parent is in, which is not
> > the same as being disabled.
>
> It's true, but after autogroup is disabled, p->signal->autogroup will
> never be used then, as autogroup_task_group() will not use it. But
> after autogroup is enabled again, it might be a disaster....

autogroups are intended to always exist, enable/disable only a choice of
whether you use it or not.

> I think we'd better delete the runtime enable/disable support for
> autogroup, because it might mess up the group scheduler....

Disabling runtime on/off sounds good to me too. Not because it will
mess up the scheduler, it doesn't, but the on/off switch does not take
effect instantly, and in some cases doesn't ever take effect (fully
functional on/off was shot down, so doing that now won't fly either).

So what I would do is either let the user decide once at boot, in which
case if off, creating groups would be stupid), or, just rip autogroup
completely out, since systemd is taking over the known universe anyway.

-Mike

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/