Re: weakness of runnable load tracking?

From: Alex Shi
Date: Wed Dec 05 2012 - 22:15:17 EST


On 12/05/2012 11:19 PM, Alex Shi wrote:
> Hi Paul&Ingo:
>
> Runnable load tracking patch set introduce a good way to tracking each
> entity/rq's running time.
> But when I try to enable it in load balance, I found burst forking
> many new tasks will make just few cpu heavy while other cpu has no
> much task assigned. That is due to the new forked task's
> load_avg_contrib is zero after just created. then no matter how many
> tasks assigned to a CPU can not increase the cfs_rq->runnable_load_avg
> or rq->avg.load_avg_contrib if this cpu idle.
> Actually, if just for new task issue, we can set new task's initial
> load_avg same as load_weight. but if we want to burst wake up many
> long time sleeping tasks, it has the same issue here since their were
> decayed to zero. So what solution I can thought is recording the se's
> load_avg_contrib just before dequeue, and don't decay the value, when
> it was waken up, add this value to new cfs_rq. but if so, the runnable
> load tracking is total meaningless.
> So do you have some idea of burst wakeup balancing with runnable load tracking?

Hi Paul & Ingo:

In a short word of this issue: burst forking/waking tasks have no time
accumulate the load contribute, their runnable load are taken as zero.
that make select_task_rq do a wrong decision on which group is idlest.

There is still 3 kinds of solution is helpful for this issue.

a, set a unzero minimum value for the long time sleeping task. but it
seems unfair for other tasks these just sleep a short while.

b, just use runnable load contrib in load balance. Still using
nr_running to judge idlest group in select_task_rq_fair. but that may
cause a bit more migrations in future load balance.

c, consider both runnable load and nr_running in the group: like in the
searching domain, the nr_running number increased a certain number, like
double of the domain span, in a certain time. we will think it's a burst
forking/waking happened, then just count the nr_running as the idlest
group criteria.

IMHO, I like the 3rd one a bit more. as to the certain time to judge if
a burst happened, since we will calculate the runnable avg at very tick,
so if increased nr_running is beyond sd->span_weight in 2 ticks, means
burst happening. What's your opinion of this?

Any comments are appreciated!

Regards!
Alex
>
>

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/