Re: [PATCH] sched/fair: Rate limit calls to update_blocked_averages() for NOHZ

From: Joel Fernandes
Date: Fri Jan 22 2021 - 14:38:19 EST

In reply to: Qais Yousef: "Re: [PATCH] sched/fair: Rate limit calls to update_blocked_averages() for NOHZ"

On Fri, Jan 22, 2021 at 06:39:27PM +0000, Qais Yousef wrote:
> On 01/22/21 17:56, Vincent Guittot wrote:
> > > ---
> > > kernel/sched/fair.c | 2 +-
> > > 1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > > index 04a3ce20da67..fe2dc0024db5 100644
> > > --- a/kernel/sched/fair.c
> > > +++ b/kernel/sched/fair.c
> > > @@ -8381,7 +8381,7 @@ static bool update_nohz_stats(struct rq *rq, bool force)
> > >         if (!cpumask_test_cpu(cpu, nohz.idle_cpus_mask))
> > >                 return false;
> > >
> > > -       if (!force && !time_after(jiffies, rq->last_blocked_load_update_tick))
> > > +       if (!force && !time_after(jiffies, rq->last_blocked_load_update_tick + (HZ/20)))
> >
> > This condition is there to make sure the blocked load is updated at most
> > once a tick, in order to filter the newly idle case; otherwise the rate
> > limit is already done by the load balance interval.
> > This hard-coded (HZ/20) looks really like an ugly hack.
>
> This was meant as an RFC patch to discuss the problem really.

Agreed, sorry.
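
For readers following along, here is a minimal userspace sketch of the shape
of the check being discussed. It is not kernel code: the sketch_* names are
hypothetical, and SKETCH_HZ = 250 is only an assumed tick rate (the real HZ
is a build-time config). With interval = 0 it behaves like the existing
"at most once a tick" filter; with interval = HZ/20 it behaves like the RFC
change, which works out to roughly 50ms.

```c
#include <stdbool.h>

/* Assumed tick rate for this sketch only; the kernel's HZ is a build-time config. */
#define SKETCH_HZ 250UL

/* Wraparound-safe "a is after b", the same idea as the kernel's time_after(). */
static bool sketch_time_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/*
 * Returns true when the update should be skipped: not forced, and `now`
 * is not yet past last_update + interval (all values in jiffies).
 */
static bool sketch_skip_update(unsigned long now, unsigned long last_update,
			       unsigned long interval, bool force)
{
	return !force && !sketch_time_after(now, last_update + interval);
}

/* Convert a jiffies interval to milliseconds for the assumed tick rate. */
static unsigned long sketch_jiffies_to_ms(unsigned long j)
{
	return j * 1000UL / SKETCH_HZ;
}
```

Note that HZ/20 is an integer division, so with the assumed HZ of 250 the
interval is 12 jiffies, slightly under the nominal 50ms.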

> Joel is seeing update_blocked_averages() taking ~100us. Half of it seems in
> processing __update_blocked_fair() and the other half in sugov_update_shared().
> So roughly 50us each. Note that each function is calling an iterator in
> return. Correct me if my numbers are wrong Joel.

Correct, and I see update_nohz_stats() itself being called around 8 times
during a load balance, which multiplies the overhead.

Dietmar also found that the reason update_nohz_stats() is called 8 times is
that in our setup there is only 1 MC sched domain with all 8 CPUs, versus,
say, 2 MC domains with 4 CPUs each.
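
As a back-of-the-envelope illustration of how this multiplies, the sketch
below takes the ~100us per-call figure quoted above and assumes, purely for
illustration, one update per CPU in the domain being balanced. The sketch_*
names are hypothetical and the numbers are the rough ones from this thread,
not new measurements.

```c
/* Rough per-call cost quoted in this thread for update_blocked_averages(). */
#define SKETCH_COST_US 100U

/*
 * Approximate cost of one load-balance pass over a single sched domain,
 * assuming (for illustration only) one update per CPU in that domain.
 */
static unsigned int sketch_pass_cost_us(unsigned int cpus_in_domain)
{
	return cpus_in_domain * SKETCH_COST_US;
}
```

Under that assumption, one 8-CPU MC domain costs about 800us per pass,
whereas a 4-CPU MC domain would cost about 400us.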

> Running on a little core on low frequency these numbers don't look too odd.
> So I'm not seeing how we can speed these functions up.

Agreed.

> But since update_sg_lb_stats() will end up with multiple calls to
> update_blocked_averages() in one go, this latency adds up quickly.

True!

> One noticeable factor in Joel's system is the presence of a lot of cgroups,
> which is essentially what makes __update_blocked_fair() expensive, and it seems
> to always return that something has decayed, so we end up with a call to
> sugov_update_shared() in every call.

Correct.
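
To make the pattern concrete, here is a toy userspace model of it, not the
kernel implementation. All sketch_* names are hypothetical, and halving the
load is a stand-in for the real decay math. It only shows two things: the
walk is linear in the number of per-cgroup queues, and a single queue with
blocked load left is enough to report "decayed" and trigger the governor
hook.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a per-cgroup runqueue; not the kernel's struct cfs_rq. */
struct sketch_cfs_rq {
	unsigned long blocked_load;
};

/* Counts how often the simulated frequency-governor hook runs. */
static unsigned int sketch_governor_calls;

static void sketch_governor_update(void)
{
	sketch_governor_calls++;
}

/* Halve each queue's blocked load; report whether any queue changed. */
static bool sketch_update_blocked_fair(struct sketch_cfs_rq *rqs, size_t n)
{
	bool decayed = false;

	for (size_t i = 0; i < n; i++) {
		if (rqs[i].blocked_load) {
			rqs[i].blocked_load /= 2;
			decayed = true;
		}
	}
	return decayed;
}

/* One update pass: walk all queues, then notify the governor if any decayed. */
static void sketch_update_blocked_averages(struct sketch_cfs_rq *rqs, size_t n)
{
	if (sketch_update_blocked_fair(rqs, n))
		sketch_governor_update();
}
```

In this model the governor hook stops firing only once every queue has fully
decayed, so with many cgroups nearly every pass pays for both the walk and
the hook.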

thanks,

- Joel

[..]
