Re: [PATCH v3 06/10] sched/fair: Use the prefer_sibling flag of the current sched domain

From: Ricardo Neri
Date: Fri Feb 24 2023 - 07:19:10 EST


On Thu, Feb 23, 2023 at 11:09:55AM +0100, Dietmar Eggemann wrote:
> On 16/02/2023 06:21, Ricardo Neri wrote:
> > On Mon, Feb 13, 2023 at 10:43:28PM -0800, Ricardo Neri wrote:
> >> On Mon, Feb 13, 2023 at 01:17:09PM +0100, Dietmar Eggemann wrote:
> >>> On 10/02/2023 19:31, Ricardo Neri wrote:
> >>>> On Fri, Feb 10, 2023 at 05:12:30PM +0000, Valentin Schneider wrote:
> >>>>> On 10/02/23 17:53, Peter Zijlstra wrote:
> >>>>>> On Fri, Feb 10, 2023 at 02:54:56PM +0000, Valentin Schneider wrote:
>
> [...]
>
> >>> Can you not detect the E-core dst_cpu case on MC with:
> >>>
> >>> +       if (child)
> >>> +               sds->prefer_sibling = child->flags & SD_PREFER_SIBLING;
> >>> +       else if (sds->busiest)
> >>> +               sds->prefer_sibling = sds->busiest->group_weight > 1;
> >>
> >> Whose child wants the prefer_sibling setting? In update_sd_lb_stats(), it
> >> is set based on the flags of the destination CPU's sched domain. But when
> >> it is used in find_busiest_group(), tasks are spread from the busiest
> >> group's child domain.
> >>
> >> Your proposed code also needs a check for SD_PREFER_SIBLING, no?
> >
> > I tweaked the solution that Dietmar proposed:
> >
> > -       sds->prefer_sibling = child && child->flags & SD_PREFER_SIBLING;
> > +       if (sds->busiest)
> > +               sds->prefer_sibling = sds->busiest->flags & SD_PREFER_SIBLING;
>
> Maybe:
>
> sds->prefer_sibling = !!(sds->busiest->flags & SD_PREFER_SIBLING);
>
> 1 vs 2048 ?

Sure, I can do this.
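
To illustrate the "1 vs 2048" point, here is a minimal userspace sketch, not kernel code. It assumes the flag has the value 2048, as the numbers above imply. The struct and helper names are hypothetical and exist only for this example.

```c
/* Illustrative value only: the thread implies the flag equals 2048. */
#define SD_PREFER_SIBLING 0x800

/* Hypothetical stand-in for a group that carries sched-domain flags. */
struct group_sketch {
	int flags;
};

/* Raw mask: yields 0 or the flag's bit value (2048 here). */
static int prefer_sibling_raw(const struct group_sketch *g)
{
	return g->flags & SD_PREFER_SIBLING;
}

/* Normalized with !!: yields strictly 0 or 1. */
static int prefer_sibling_norm(const struct group_sketch *g)
{
	return !!(g->flags & SD_PREFER_SIBLING);
}
```

For a group whose flags include SD_PREFER_SIBLING, the raw helper returns 2048 and the normalized one returns 1. Both are truthy in a plain `if`, but the normalized form is safer if the value is ever compared against 1 or stored in a narrower field.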
>
> > This comes from the observation that the prefer_sibling setting acts on
> > the busiest group. It then depends on whether the busiest group, not the
> > local group, has child sched domains. Today it works because in most
> > cases both the local and the busiest groups have child domains with the
> > SD_PREFER_SIBLING flag.
> >
> > This would also satisfy sched domains with the SD_ASYM_CPUCAPACITY flag as
> > prefer_sibling would not be set in that case.
> >
> > It would also conserve the current behavior at the NUMA level. We would
> > not need to implement SD_SPREAD_TASKS.
> >
> > This would both fix the SMT vs non-SMT bug and be less invasive.
>
> Yeah, much better! I always forget that we have those flags on SGs now
> as well. Luckily, we just need to check busiest sg to cover all cases.

Right. I can add a comment to clarify where the flags come from.
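
Roughly, the result could look like the sketch below. This is a standalone approximation under stated assumptions, not the actual patch: the `_sketch` types are hypothetical stand-ins for the scheduler structures, the flag value is illustrative, and the explicit reset to 0 is added here only so the helper is self-contained.

```c
#include <stddef.h>

/* Illustrative value only; the real definition lives in the kernel headers. */
#define SD_PREFER_SIBLING 0x800

/* Hypothetical stand-in for a sched group carrying flags. */
struct sched_group_sketch {
	int flags;
};

/* Hypothetical stand-in for the load-balance stats. */
struct sd_lb_stats_sketch {
	struct sched_group_sketch *busiest;
	int prefer_sibling;
};

static void set_prefer_sibling(struct sd_lb_stats_sketch *sds)
{
	/* Added for the sketch: start from a known value. */
	sds->prefer_sibling = 0;

	/*
	 * Assumption: a group's flags mirror those of the child sched
	 * domain it was built from, so a group without a child domain
	 * carries no SD_PREFER_SIBLING. Only the busiest group matters,
	 * since that is where tasks are pulled from.
	 */
	if (sds->busiest)
		sds->prefer_sibling = !!(sds->busiest->flags & SD_PREFER_SIBLING);
}
```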