Re: [PATCH] sched/fair: use signed long when compute energy delta in eas

From: Xuewen Yan
Date: Thu Apr 08 2021 - 22:22:08 EST


Hi
>
> Hi,
> > Hi
> >
> > On Wed, Apr 7, 2021 at 10:11 PM Pierre <pierre.gondois@xxxxxxx> wrote:
> > >
> > > Hi,
> > > > I test the patch, but the overflow still exists.
> > > > In the "sched/fair: Use pd_cache to speed up
> > find_energy_efficient_cpu()"
> > > > I wonder why recompute the cpu util when cpu==dst_cpu in
> > compute_energy(),
> > > > when the dst_cpu's util change, it also would cause the overflow.
> > >
> > > The patches aim to cache the energy values for the CPUs whose
> > > utilization is not modified (so we don't have to compute it multiple
> > > times). The values cached are the 'base values' of the CPUs, i.e. when
> > > the task is not placed on the CPU. When (cpu==dst_cpu) in
> > > compute_energy(), it means the energy values need to be updated instead
> > > of using the cached ones.
> > >
> > well, is it better to use the task_util(p) + cache values ? but in
> > this case, the cache
> > values may need more parameters.
>
> This patch-set is not significantly improving the execution time of
> feec(). The results we have so far are an improvement of 5-10% in
> execution time, with feec() being executed in < 10us. So the gain is not
> spectacular.

well, I meaned to cache all util value and compute energy with caches, when
(cpu==dst_cpu), use caches instead of updating util, and do not get
util with function:
"effective_cpu_util()", to compute util with cache.
I add more parameters into pd_cache:
struct pd_cache {
unsigned long util;
unsigned long util_est;
unsigned long util_cfs;
unsigned long util_irq;
unsigned long util_rt;
unsigned long util_dl;
unsigned long bw_dl;
unsigned long freq_util;
unsigned long nrg_util;
};
In this way, it can avoid util update while feec. I tested with it,
and the negative delta disappeared.
Maybe this is not a good method, but it does work.
>
> >
> > > You are right, there is still a possibility to have a negative delta
> > > with the patches at:
> > >
> > https://gitlab.arm.com/linux-arm/linux-power/-/commits/eas/next/integration-20210129
> > <https://gitlab.arm.com/linux-arm/linux-power/-/commits/eas/next/integration-20210129>
> > > Adding a check before subtracting the values, and bailing out in such
> > > case would avoid this, such as at:
> > > https://gitlab.arm.com/linux-arm/linux-pg/-/commits/feec_bail_out/
> > <https://gitlab.arm.com/linux-arm/linux-pg/-/commits/feec_bail_out/>
> > >
> > In your patch, you bail out the case by "go to fail", that means you
> > don't use eas in such
> > case. However, in the actual scene, the case often occurr when select
> > cpu for small task.
> > As a result, the small task would not select cpu according to the eas,
> > it may affect
> > power consumption?
> With this patch (bailing out), the percentage of feec() returning due to
> a negative delta I get are:
> on a Juno-r2, with 2 big CPUs and 4 CPUs (capacity of 383), with a
> workload running during 5s with task having a period of 16 ms and:
> - 50 tasks at 1%: 0.14%
> - 30 tasks at 1%: 0.54%
> - 10 tasks at 1%: < 0.1%
> - 30 tasks at 5%: < 0.1%
> - 10 tasks at 5%: < 0.1%
> It doesn't happen so often to me.If we bail out of feec(), the task will
> still have another opportunity in the next call. However I agree this
> can lead to a bad placement when this happens.
> >
> > > I think a similar modification should be done in your patch. Even though
> > > this is a good idea to group the calls to compute_energy() to reduce the
> > > chances of having updates of utilization values in between the
> > > compute_energy() calls,
> > > there is still a chance to have updates. I think it happened when I
> > > applied your patch.
> > >
> > > About changing the delta(s) from 'unsigned long' to 'long', I am not
> > > sure of the meaning of having a negative delta. I thing it would be
> > > better to check and fail before it happens instead.
> > >
> > > Regards
> > >
>
>
>