Re: [PATCH 3/4] sched: drop group_capacity to 1 only if local group has extra capacity

From: Nikhil Rao
Date: Fri Oct 15 2010 - 13:28:17 EST


On Fri, Oct 15, 2010 at 10:05 AM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> On Fri, 2010-10-15 at 09:13 -0700, Nikhil Rao wrote:
>
>> >> diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
>> >> index 0dd1021..da0c688 100644
>> >> --- a/kernel/sched_fair.c
>> >> +++ b/kernel/sched_fair.c
>> >> @@ -2030,6 +2030,7 @@ struct sd_lb_stats {
>> >>  	unsigned long this_load;
>> >>  	unsigned long this_load_per_task;
>> >>  	unsigned long this_nr_running;
>> >> +	unsigned long this_group_capacity;
>> >>
>> >>  	/* Statistics of the busiest group */
>> >>  	unsigned long max_load;
>> >> @@ -2546,15 +2547,18 @@ static inline void update_sd_lb_stats(struct sched_domain *sd, int this_cpu,
>> >>  		/*
>> >>  		 * In case the child domain prefers tasks go to siblings
>> >>  		 * first, lower the sg capacity to one so that we'll try
>> >> -		 * and move all the excess tasks away.
>> >> +		 * and move all the excess tasks away. We lower capacity only
>> >> +		 * if the local group can handle the extra capacity.
>> >>  		 */
>> >> -		if (prefer_sibling)
>> >> +		if (prefer_sibling && !local_group &&
>> >> +		    sds->this_nr_running < sds->this_group_capacity)
>> >>  			sgs.group_capacity = min(sgs.group_capacity, 1UL);
>> >>
>> >>  		if (local_group) {
>> >>  			sds->this_load = sgs.avg_load;
>> >>  			sds->this = sg;
>> >>  			sds->this_nr_running = sgs.sum_nr_running;
>> >> +			sds->this_group_capacity = sgs.group_capacity;
>> >>  			sds->this_load_per_task = sgs.sum_weighted_load;
>> >>  		} else if (update_sd_pick_busiest(sd, sds, sg, &sgs, this_cpu)) {
>> >>  			sds->max_load = sgs.avg_load;
>
> OK, but then you assume that local_group will always be the first group
> served. Also, there's no need to add sds->this_group_capacity; you could
> keep that local to this function.
>

Yes, this patch assumes that local_group is the first group visited.

About this_group_capacity: yes, we don't need the additional field in
sd_lb_stats; we can make it local to this function. I just realized that
if we reorder the patches, we can reuse sgs.has_capacity from the next
patch.
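Roughly, the reworked loop would look something like this (an untested
sketch based on the quoted diff, with unrelated code elided; not the
reordered series itself):

	static inline void update_sd_lb_stats(struct sched_domain *sd,
					      int this_cpu, ...)
	{
		unsigned long this_group_capacity = 0; /* local now */
		...
		do {
			...
			/*
			 * The local group is visited first, so
			 * this_group_capacity is already set by the
			 * time we evaluate any non-local group.
			 */
			if (prefer_sibling && !local_group &&
			    sds->this_nr_running < this_group_capacity)
				sgs.group_capacity =
					min(sgs.group_capacity, 1UL);

			if (local_group) {
				this_group_capacity = sgs.group_capacity;
				...
			}
			...
			sg = sg->next;
		} while (sg != sd->groups);
	}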

> For regular balancing, local_group will be the first, since we only
> ascend the domain tree on the local groups. But it's not true for nohz
> balancing AFAICT.
>

As Suresh points out, even with nohz balancing, local_group is still the
first group, since we ascend the sched domain tree of the balancing cpu.
I can add this to the comments to make it clear; something like the
draft below.
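Along these lines (wording is just a first draft):

	/*
	 * The local group is always the first group visited, for both
	 * regular and nohz balancing, since we only ascend the sched
	 * domain tree of the balancing cpu. So the local group's stats
	 * are already populated when non-local groups are evaluated.
	 */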