Re: [PATCH] sched/fair: initialize throttle_count for new task-groups lazily

From: Peter Zijlstra
Date: Wed Jun 22 2016 - 04:24:09 EST


On Wed, Jun 22, 2016 at 11:10:46AM +0300, Konstantin Khlebnikov wrote:
> On 22.06.2016 00:10, Peter Zijlstra wrote:
> >On Thu, Jun 16, 2016 at 03:57:01PM +0300, Konstantin Khlebnikov wrote:
> >>A cgroup created inside a throttled group must inherit the current
> >>throttle_count. A broken throttle_count allows throttled entries to be
> >>nominated as the next buddy, which later leads to a NULL pointer
> >>dereference in pick_next_task_fair().
> >>
> >>This patch initializes cfs_rq->throttle_count at the first enqueue:
> >>laziness lets us skip locking all runqueues at group creation. The lazy
> >>approach also allows skipping a full sub-tree scan when throttling the
> >>hierarchy (not in this patch).
> >
> >You're talking about taking rq->lock in alloc_fair_sched_group(), right?
> >
> >We're about to go do that anyway... But I suppose for backports this
> >makes sense. Doing it at creation time also avoids the issues Ben
> >raised, right?
>
> Yes, all will be fine. But for 8192 cores this will be a disaster =)
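For reference, a simplified, self-contained sketch of the lazy scheme the
quoted changelog describes: on the first enqueue into a new group's cfs_rq,
inherit throttle_count from the nearest ancestor whose count is already up to
date. The struct layout and names below are illustrative only, not the
kernel's actual definitions.

	/* Illustrative model only; not the kernel's actual data structures. */
	#define SKETCH_NR_CPUS 4

	struct cfs_rq_sketch {
		int throttle_count;	/* number of throttled ancestors */
		int throttle_uptodate;	/* 0 until synced lazily at first enqueue */
	};

	struct tg_sketch {
		struct tg_sketch *parent;
		struct cfs_rq_sketch cfs_rq[SKETCH_NR_CPUS];	/* one per CPU */
	};

	/*
	 * Lazy scheme from the quoted changelog: on the first enqueue into a
	 * new group's cfs_rq, inherit throttle_count from the nearest ancestor
	 * whose count is already up to date.  Assumed to run with this CPU's
	 * rq lock held.
	 */
	static void sync_throttle_lazily(struct tg_sketch *tg, int cpu)
	{
		struct cfs_rq_sketch *cfs_rq = &tg->cfs_rq[cpu];
		struct tg_sketch *anc;

		if (cfs_rq->throttle_uptodate)
			return;
		cfs_rq->throttle_uptodate = 1;

		/* Walk up to the closest ancestor that is already synced. */
		for (anc = tg->parent; anc; anc = anc->parent) {
			if (anc->cfs_rq[cpu].throttle_uptodate) {
				cfs_rq->throttle_count = anc->cfs_rq[cpu].throttle_count;
				break;
			}
		}
	}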

Well, creating cgroups isn't something you do much of, and creating them
will be proportionally expensive already, as we allocate all kinds of
per-cpu data.

In any case, we 'need' to do this because of the per-entity load tracking
stuff: entities, even blocked ones, should be added to the cfs_rq.
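
For contrast, the at-creation alternative under discussion initializes every
CPU's copy when the group is made, which is where the 8192-core concern above
comes from. Continuing the illustrative types from the previous sketch, with
the per-CPU rq locking shown only as comments:

	/*
	 * Eager alternative (sketch): set every per-CPU throttle_count at group
	 * creation, before any task is enqueued.  Each copy has to happen under
	 * the corresponding CPU's rq lock (elided here), so creation cost scales
	 * with the number of CPUs.
	 */
	static void init_throttle_at_creation(struct tg_sketch *tg)
	{
		int cpu;

		for (cpu = 0; cpu < SKETCH_NR_CPUS; cpu++) {
			/* take cpu's rq lock in real code */
			tg->cfs_rq[cpu].throttle_count =
				tg->parent ? tg->parent->cfs_rq[cpu].throttle_count : 0;
			tg->cfs_rq[cpu].throttle_uptodate = 1;
			/* release cpu's rq lock */
		}
	}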

> throttle_count must be initialized after linking the tg into the lists, obviously.

Crud, that's later than we currently take the rq lock. Let me see how
much pain it is to re-order all that.
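
The ordering constraint in the last quoted line presumably means the copy is
only stable once the new group is visible to the throttle/unthrottle walk; a
rough per-CPU sketch of that order, again with the illustrative types (the
linking step itself appears only as a comment):

	/*
	 * Rough per-CPU ordering sketch: link first, so that once the count is
	 * copied, any subsequent throttle/unthrottle of an ancestor also reaches
	 * this cfs_rq; a count copied before linking could go stale before the
	 * group becomes visible to that walk.
	 */
	static void online_group_cpu_sketch(struct tg_sketch *tg, int cpu)
	{
		/* 1. Link tg's cfs_rq into the hierarchy/leaf lists here (elided),
		 *    making it visible to the throttle/unthrottle walk.           */

		/* 2. Only then copy the parent's current count, under the rq lock. */
		tg->cfs_rq[cpu].throttle_count = tg->parent->cfs_rq[cpu].throttle_count;
		tg->cfs_rq[cpu].throttle_uptodate = 1;
	}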