RE: [PATCH v3 06/10] sched/fair: Use the prefer_sibling flag of the current sched domain

From: Chen, Tim C
Date: Thu Feb 09 2023 - 15:00:34 EST


>> static inline void update_sd_lb_stats(struct lb_env *env, struct sd_lb_stats *sds)
>> {
>> - struct sched_domain *child = env->sd->child;
>> struct sched_group *sg = env->sd->groups;
>> struct sg_lb_stats *local = &sds->local_stat;
>> struct sg_lb_stats tmp_sgs;
>> @@ -10045,9 +10044,11 @@ static inline void update_sd_lb_stats(struct lb_env *env, struct sd_lb_stats *sd
>> sg = sg->next;
>> } while (sg != env->sd->groups);
>>
>> - /* Tag domain that child domain prefers tasks go to siblings first */
>> - sds->prefer_sibling = child && child->flags & SD_PREFER_SIBLING;
>> -
>> + /*
>> + * Tag domain that @env::sd prefers to spread excess tasks among
>> + * sibling sched groups.
>> + */
>> + sds->prefer_sibling = env->sd->flags & SD_PREFER_SIBLING;
>>
>This does help fix the issue where a non-SMT core fails to pull tasks from
>busy SMT cores.
>It also changes the semantics of prefer sibling. Do we also need to
>change this:
>
>	if ((sd->flags & SD_ASYM_CPUCAPACITY) && sd->child)
>		sd->child->flags &= ~SD_PREFER_SIBLING;
>
>to something like:
>
>	if (sd->flags & SD_ASYM_CPUCAPACITY)
>		sd->flags &= ~SD_PREFER_SIBLING;
>

Yu,

I think you are referring to the code in sd_init() where SD_PREFER_SIBLING
is first set to ON and then updated depending on SD_ASYM_CPUCAPACITY. The
intention of that code is that if the CPUs in a sched domain have differing
capacities, we do not want to spread tasks among the child groups of that
sched domain. So the flag is turned off at the child level, not at the
parent level. With your change above, the parent's flag is turned off
instead, leaving the child's flag on. This moves the level where spreading
happens (SD_PREFER_SIBLING on) up by one, which is undesirable (see the
table below).

                                       SD_PREFER_SIBLING after init
SD Level       SD_ASYM_CPUCAPACITY     original code    proposed
root           ON                      ON               OFF  (note: SD_PREFER_SIBLING unused at this level)
first level    ON                      OFF              OFF
second level   OFF                     OFF              ON
third level    OFF                     ON               ON

Tim