Re: [PATCH v3 1/2] sched/uclamp: Add a new sysctl to control RT default boost value

From: Qais Yousef
Date: Fri May 01 2020 - 07:03:07 EST


On 04/30/20 20:21, Dietmar Eggemann wrote:
> On 29/04/2020 14:30, Qais Yousef wrote:
> > Hi Pavan
> >
> > On 04/29/20 17:02, Pavan Kondeti wrote:
> >> Hi Qais,
> >>
> >> On Tue, Apr 28, 2020 at 05:41:33PM +0100, Qais Yousef wrote:
>
> [...]
>
> >>> @@ -907,8 +935,15 @@ uclamp_tg_restrict(struct task_struct *p, enum uclamp_id clamp_id)
> >>> static inline struct uclamp_se
> >>> uclamp_eff_get(struct task_struct *p, enum uclamp_id clamp_id)
> >>> {
> >>> - struct uclamp_se uc_req = uclamp_tg_restrict(p, clamp_id);
> >>> - struct uclamp_se uc_max = uclamp_default[clamp_id];
> >>> + struct uclamp_se uc_req, uc_max;
> >>> +
> >>> + /*
> >>> + * Sync up any change to sysctl_sched_uclamp_util_min_rt_default value.
> >>> + */
> >>> + uclamp_sync_util_min_rt_default(p);
> >>> +
> >>> + uc_req = uclamp_tg_restrict(p, clamp_id);
> >>> + uc_max = uclamp_default[clamp_id];
> >>
> >> We are calling uclamp_sync_util_min_rt_default() unnecessarily for
> >> clamp_id == UCLAMP_MAX case. Would it be better to have a separate
> >
> > It was actually intentional to make sure we update the value ASAP. I didn't
> > think it's a lot of overhead. I can further protect with a check to verify
> > whether the value has changed if it seems heavy handed.
>
> Users of uclamp_eff_value()->uclamp_eff_get() ((like
> rt_task_fits_capacity())) always call both ids.
>
> So calling uclamp_sync_util_min_rt_default() only for UCLAMP_MIN would
> make sense. It's overhead in the fast path for rt tasks.
>
> Since changes to sched_util_clamp_min_rt_default will be fairly rare,
> you might even want to consider only doing the uclamp_se_set(...,
> min_rt_default, ...) in case
>
> uc_se->value != sysctl_sched_uclamp_util_min_rt_default

Okay will do though I'm not convinced this micro optimization makes any
difference. p->uclamp_req[] is already read, so it should be cache hot.
uclamp_set_se() performs 3 writes on it, so I expect no stalls. Even if it
was a write-through cache, there's usually a write buffer that I think should
be able to hide the 3 writes. Write-through + linux is a bad idea in general.

In contrary, the complex if condition might make branch prediction less
effective, hence this micro optimization might create a negative effect.

So I don't see this a clear win and it would be hard to know without proper
measurement really. It is cheaper sometimes to always execute something than
sprinkle more branches to avoid executing this simple code.

I just realized though that I didn't define the
uclamp_sync_util_min_rt_default() as inline, which I should to avoid a function
call. Compiler *should* be smart though :p

Thanks

--
Qais Yousef