Re: [tip:sched/core] sched: Fix race in task_group()

From: Stefan Bader
Date: Fri Oct 19 2012 - 03:41:00 EST


On 18.10.2012 15:33, Luis Henriques wrote:
> On Thu, Oct 18, 2012 at 12:23:38PM +0200, Stefan Bader wrote:
>> On 18.10.2012 10:27, cwillu wrote:
>>> On Tue, Jul 24, 2012 at 8:21 AM, tip-bot for Peter Zijlstra
>>> <peterz@xxxxxxxxxxxxx> wrote:
>>>> Commit-ID: 8323f26ce3425460769605a6aece7a174edaa7d1
>>>> Gitweb: http://git.kernel.org/tip/8323f26ce3425460769605a6aece7a174edaa7d1
>>>> Author: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
>>>> AuthorDate: Fri, 22 Jun 2012 13:36:05 +0200
>>>> Committer: Ingo Molnar <mingo@xxxxxxxxxx>
>>>> CommitDate: Tue, 24 Jul 2012 13:58:20 +0200
>>>>
>>>> sched: Fix race in task_group()
>>>>
>>>> Stefan reported a crash on a kernel before a3e5d1091c1 ("sched:
>>>> Don't call task_group() too many times in set_task_rq()"), he
>>>> found the reason to be that the multiple task_group()
>>>> invocations in set_task_rq() returned different values.
>>>>
>>>> Looking at all that I found a lack of serialization and plain
>>>> wrong comments.
>>>>
>>>> The below tries to fix it using an extra pointer which is
>>>> updated under the appropriate scheduler locks. Its not pretty,
>>>> but I can't really see another way given how all the cgroup
>>>> stuff works.
>>>>
>>>> Reported-and-tested-by: Stefan Bader <stefan.bader@xxxxxxxxxxxxx>
>>>> Signed-off-by: Peter Zijlstra <a.p.zijlstra@xxxxxxxxx>
>>>> Link: http://lkml.kernel.org/r/1340364965.18025.71.camel@twins
>>>> Signed-off-by: Ingo Molnar <mingo@xxxxxxxxxx>
>>>
>>> I just finished bisecting a crash on boot to this commit; booting with
>>> "noautogroup" brings it back.
>>>
>>> 3.5.4 is the latest -stable that still boots, and none of the 3.6 rc's
>>> boot at all.
>>>
>>> Photo of the bug (3.6.0next is 3.6 + btrfs's for-linus):
>>> https://lh5.googleusercontent.com/-0DY-YYhgvzs/UHdB-BQdzMI/AAAAAAAAAEg/QhY9rgxnv98/s811/2012-10-11
>>>
>>
>> On a very quick glance I wonder whether there might be a case where sched_fork
>> goes into set_task_cpu with a different cpu than the current but has not yet
>> task_group.sched_task_group set to something valid...
>>
>>
>
> I was looking at another bug report [1] which may be related with this
> issue. Basically, it looks like there is a race window where
> resetting sched_autogroup_enabled will cause a crash on
> shutdown/reboot. In the bug report, the user has added:
>
> echo 0 > /proc/sys/kernel/sched_autogroup_enabled
>
> to /etc/rc.local. This will cause a NULL pointer dereference during
> shutdown (and it is reproducible with mainline kernel 3.7.0-rc1).
>
> By using the kernel parameter noautogroup I *wasn't* able to reproduce
> this issue.
>
> After a little bit of digging, commit
> 800d4d30c8f20bd728e5741a3b77c4859a613f7c ("sched, autogroup: Stop
> going ahead if autogroup is disabled") caught my attention as it
> changes the following code path when sched_autogroup_enabled is
> disabled:
>
> sched_autogroup_create_attach()
> autogroup_move_group()
> sched_move_task() <<-- conditionally invoked
> task_move_group_fair()
> set_task_rq()
> task_group()
> autogroup_task_group()
>
> And commit 8323f26ce3425460769605a6aece7a174edaa7d1 ("sched: Fix
> race in task_group()") actually adds code to this conditional path (in
> sched_move_task()).
>
> A quick test shows that reverting
> 800d4d30c8f20bd728e5741a3b77c4859a613f7c (i.e., always going through
> the whole call tree) seems to fix it or, at least, doesn't trigger the
> NULL pointer. But again, I may just be doing something foolish,
> hiding something else. It is also possible that this is a completely
> different issue.

I think you are right Luis. Looking at it with a bit more time it looks that,
while the patch you mention optimizes for the case where sched_autogroup was
disabled from the beginning (noautogroup), it is rather bad for the case where
it gets disabled after booting with it since by then some tasks likely have gone
into some task groups and when the sched_move_task is skipped, then the pointer
to the old autogroup (which I suspect of beeing freed) still remains set. This
all was unlikely a problem when the autogroup was looked up from the other
place. But then that would race while setting it.

kernel/sched/sched_auto_group.c
if (!ACCESS_ONCE(sysctl_sched_autogroup_enabled))
+ if (p->sched_task_group == &root_task_group)
+ goto out;
- goto out;

I wonder whether this would be an acceptable (and working since I actually have
not tried to compile it) way out of it...

-Stefan
>
> [1] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1055222
>
> Cheers,
> --
> Luis
>


Attachment: signature.asc
Description: OpenPGP digital signature