Re: [PATCH v3 06/10] sched/fair: Use the prefer_sibling flag of the current sched domain

From: Ricardo Neri
Date: Tue Feb 14 2023 - 01:33:58 EST


On Mon, Feb 13, 2023 at 01:17:09PM +0100, Dietmar Eggemann wrote:
> On 10/02/2023 19:31, Ricardo Neri wrote:
> > On Fri, Feb 10, 2023 at 05:12:30PM +0000, Valentin Schneider wrote:
> >> On 10/02/23 17:53, Peter Zijlstra wrote:
> >>> On Fri, Feb 10, 2023 at 02:54:56PM +0000, Valentin Schneider wrote:
> >>>
> >>>> So something like have SD_PREFER_SIBLING affect the SD it's on (and not
> >>>> its parent), but remove it from the lowest non-degenerated topology level?
> >>>
> >>> So I was rather confused about the whole moving it between levels things
> >>> this morning -- conceptually, prefer siblings says you want to try
> >>> sibling domains before filling up your current domain. Now, balancing
> >>> between siblings happens one level up, hence looking at child->flags
> >>> makes perfect sense.
> >>>
> >>> But looking at the current domain and still calling it prefer sibling
> >>> makes absolutely no sense what so ever.
> >>>
> >>
> >> True :-)
> >>
> >>> In that confusion I think I also got the polarity wrong, I thought you
> >>> wanted to kill prefer_sibling for the assymetric SMT cases, instead you
> >>> want to force enable it as long as there is one SMT child around.
> >
> > Exactly.
> >
> >>>
> >>> Whichever way around it we do it, I'm thinking perhaps some renaming
> >>> might be in order to clarify things.
> >>>
> >>> How about adding a flag SD_SPREAD_TASKS, which is the effective toggle
> >>> of the behaviour, but have it be set by children with SD_PREFER_SIBLING
> >>> or something.
> >>>
> >>
> >> Or entirely bin SD_PREFER_SIBLING and stick with SD_SPREAD_TASKS, but yeah
> >> something along those lines.
> >
> > I sense a consesus towards SD_SPREAD_TASKS.
>
> Can you not detect the E-core dst_cpu case on MC with:
>
> + if (child)
> + sds->prefer_sibling = child->flags & SD_PREFER_SIBLING;
> + else if (sds->busiest)
> + sds->prefer_sibling = sds->busiest->group_weight > 1;

Whose child wants the prefer_sibling setting? In update_sd_lb_stats(), it
is set based on the flags of the destination CPU's sched domain. But when
used in find_busiest_group() tasks are spread from the busiest group's
child domain.

Your proposed code, also needs a check for SD_PREFER_SIBLING, no?

Thanks and BR,
Ricardo