Re: weakness of runnable load tracking?

From: Alex Shi
Date: Sat Dec 08 2012 - 07:20:32 EST


On 12/06/2012 11:10 PM, Alex Shi wrote:
>
>>> Hi Paul & Ingo:
>>>
>>> In short: burst forking/waking tasks have had no time to accumulate
>>> a load contribution, so their runnable load is taken as zero. That
>>> makes select_task_rq() choose the wrong group as idlest.
>>
>> So these aren't strictly comparable; bursting and forking tasks have
>> fairly different characteristics here.
>
> Many thanks for looking into this. :)
>>
>> When we fork a task we intentionally reset the previous history. This
>> means that a forked task that immediately runs is going to show up as
>> 100% runnable and then converge to its true value. This was fairly
>> intentionally chosen so that tasks would "start" fast rather than
>> having to worry about ramp up.
>
> I am sorry, but I don't see the 100% runnable value for a newly forked
> task. I believe the code needs the following patch to initialize
> decay_count and load_avg_contrib; otherwise they hold random values.
> In enqueue_entity_load_avg(), p->se.avg.runnable_avg_sum for a newly
> forked task is always zero: either se.avg.last_runnable_update is set
> to clock_task because decay_count <= 0, or we only do
> __synchronize_entity_decay(), not update_entity_load_avg().

Paul:
Would you like to comment on the following patches?

>
> ===========
> From a161000dbece6e95bf3b81e9246d51784589d393 Mon Sep 17 00:00:00 2001
> From: Alex Shi <alex.shi@xxxxxxxxx>
> Date: Mon, 3 Dec 2012 17:30:39 +0800
> Subject: [PATCH 05/12] sched: load tracking bug fix
>
> We need to initialize se.avg.{decay_count, load_avg_contrib} to zero
> after a new task is forked.
> Otherwise the random values of these fields produce incorrect
> statistics when the new task is enqueued:
> enqueue_task_fair
>     enqueue_entity
>         enqueue_entity_load_avg
>
> Signed-off-by: Alex Shi <alex.shi@xxxxxxxxx>
> ---
> kernel/sched/core.c | 2 ++
> 1 files changed, 2 insertions(+), 0 deletions(-)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 5dae0d2..e6533e1 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -1534,6 +1534,8 @@ static void __sched_fork(struct task_struct *p)
>  #if defined(CONFIG_SMP) && defined(CONFIG_FAIR_GROUP_SCHED)
>  	p->se.avg.runnable_avg_period = 0;
>  	p->se.avg.runnable_avg_sum = 0;
> +	p->se.avg.decay_count = 0;
> +	p->se.avg.load_avg_contrib = 0;
>  #endif
>  #ifdef CONFIG_SCHEDSTATS
>  	memset(&p->se.statistics, 0, sizeof(p->se.statistics));
>


--
Thanks
Alex
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/