Re: [tip:sched/core] sched: Fix race in task_group()

From: cwillu
Date: Thu Oct 18 2012 - 04:27:07 EST


On Tue, Jul 24, 2012 at 8:21 AM, tip-bot for Peter Zijlstra
<peterz@xxxxxxxxxxxxx> wrote:
> Commit-ID: 8323f26ce3425460769605a6aece7a174edaa7d1
> Gitweb: http://git.kernel.org/tip/8323f26ce3425460769605a6aece7a174edaa7d1
> Author: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
> AuthorDate: Fri, 22 Jun 2012 13:36:05 +0200
> Committer: Ingo Molnar <mingo@xxxxxxxxxx>
> CommitDate: Tue, 24 Jul 2012 13:58:20 +0200
>
> sched: Fix race in task_group()
>
> Stefan reported a crash on a kernel before a3e5d1091c1 ("sched:
> Don't call task_group() too many times in set_task_rq()"), he
> found the reason to be that the multiple task_group()
> invocations in set_task_rq() returned different values.
>
> Looking at all that I found a lack of serialization and plain
> wrong comments.
>
> The below tries to fix it using an extra pointer which is
> updated under the appropriate scheduler locks. It's not pretty,
> but I can't really see another way given how all the cgroup
> stuff works.
>
> Reported-and-tested-by: Stefan Bader <stefan.bader@xxxxxxxxxxxxx>
> Signed-off-by: Peter Zijlstra <a.p.zijlstra@xxxxxxxxx>
> Link: http://lkml.kernel.org/r/1340364965.18025.71.camel@twins
> Signed-off-by: Ingo Molnar <mingo@xxxxxxxxxx>
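
For anyone following along, the pattern the commit message describes (an
extra pointer that is only updated under a lock, so repeated lookups agree)
can be sketched in userspace C. This is an illustration only, not the
kernel patch: the struct layout, the pthread mutex standing in for the
scheduler locks, and the names task_sync_group(), task_sched_group() and
task_group_id() are all hypothetical.

```c
#include <assert.h>
#include <pthread.h>

struct group { int id; };

struct task {
    pthread_mutex_t lock;        /* stand-in for the scheduler locks */
    struct group *cgroup_group;  /* may be changed without t->lock held */
    struct group *sched_group;   /* cached copy, written only under t->lock */
};

/* Refresh the cached pointer; the only place sched_group is written. */
static void task_sync_group(struct task *t)
{
    pthread_mutex_lock(&t->lock);
    t->sched_group = t->cgroup_group;
    pthread_mutex_unlock(&t->lock);
}

/* Caller must hold t->lock, so repeated calls return the same value. */
static struct group *task_sched_group(struct task *t)
{
    return t->sched_group;
}

/* Two lookups inside one critical section, as in the set_task_rq() case. */
static int task_group_id(struct task *t)
{
    pthread_mutex_lock(&t->lock);
    struct group *first = task_sched_group(t);
    struct group *second = task_sched_group(t);
    assert(first == second);     /* cannot differ while the lock is held */
    int id = first->id;
    pthread_mutex_unlock(&t->lock);
    return id;
}
```

In this sketch a change to cgroup_group is not visible to readers until
task_sync_group() runs, which is the point: readers never see the pointer
change in the middle of a critical section.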

I just finished bisecting a crash on boot to this commit; booting with
"noautogroup" makes the kernel boot again.

3.5.4 is the latest -stable that still boots, and none of the 3.6 rc's
boot at all.

Photo of the bug (3.6.0next is 3.6 + btrfs's for-linus):
https://lh5.googleusercontent.com/-0DY-YYhgvzs/UHdB-BQdzMI/AAAAAAAAAEg/QhY9rgxnv98/s811/2012-10-11