Re: [PATCH 1/2] sched: Enforce order of leaf CFS runqueues

From: Paul Turner
Date: Tue Jul 19 2011 - 13:49:23 EST

On Tue, Jul 19, 2011 at 6:08 AM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> On Mon, 2011-07-18 at 16:24 -0700, Paul Turner wrote:
>> Subject: [PATCH] sched: handle on_list ancestor in leaf_add_cfs_rq()
>> From: Paul Turner <pjt@xxxxxxxxxx>
>> Date: Mon, 18 Jul 2011 16:08:10 -0700
>> Jan H. Schönherr found that in the case of an on_list ancestor we may
>> incorrectly place the child to the right of a great-ancestor on the list.
>> Consider:
>>       A
>>      / \     Here, t1A results in A->cfs_rq being on_list, however when
>>     B  t1A   we start enqueuing from C this will not be visible.  This is
>>    /         compounded by the fact that on_list expiration may also be out
>>   C          of order, punching holes in the tree.
>>  /
>> t1C
>> Prevent this by making additions to the leaf_cfs_rq_list position independent.
>> This is done by maintaining additions to this list within the
>> enqueue_task_fair() path, which allows us to always enqueue against the
>> current entity's first on_list ancestor.
> The problem I have with this is that it makes the enqueue more
> expensive. We're now optimizing a relatively slow path (load-balance) at
> the cost of the hottest path in the kernel (enqueue/dequeue).

So this is a concern that I kept in mind. I don't think this should
actually add any measurable overhead:

- on_rq always implies on_list, so we are guaranteed never to walk past
the point the enqueue has to reach anyway; moreover, we touch the same
cache lines that enqueue will use to continue its walk
- it takes ~40ms+ for on_list to expire, so the extra walk above can
only occur on that order of frequency.

I can't think of any benchmark that's going to measure anything for
this. pipe-bench et al. will always hit the single on_list branch,
which should cost the same as the previous == 1 check.