Re: [PATCH v8 2/7] sched/fair: Decay task PELT values during wakeup migration

From: Dietmar Eggemann
Date: Thu May 05 2022 - 14:45:25 EST


+ vdonnefort@xxxxxxxxx
- vincent.donnefort@xxxxxxx

On 29/04/2022 16:11, Vincent Donnefort wrote:

[...]

> To that end, we need sched_clock_cpu() but it is a costly function. Limit
> its usage to the case where the source CPU is idle, as we know this is when
> the clock has the biggest risk of being outdated. In that case, let's call
> cfs_idle_lag the delta between the rq_clock_pelt value at rq idle and at
> cfs_rq idle, and rq_idle_lag the delta between "now" and the rq_clock_pelt
> at rq idle.
>
> The estimated PELT clock is then:

Where is this nice diagram (timeline) from v7?

[...]

> + /*
> + * estimated "now" is:
> + * last_update_time + (the cfs_rq's last_update_time)
> + * cfs_idle_lag + (delta between cfs_rq's update and rq's update)
> + * rq_idle_lag (delta between rq's update and now)

The parentheses here make this hard to get. It's the same text as in the
patch header. What would have sped up my understanding is this line:

now = last_update_time + cfs_idle_lag + rq_idle_lag

> + *
> + * last_update_time = cfs_rq_clock_pelt()
> + * = rq_clock_pelt() - cfs->throttled_clock_pelt_time
> + *
> + * cfs_idle_lag = rq_clock_pelt()@rq_idle -
> + * rq_clock_pelt()@cfs_rq_idle
> + *
> + * rq_idle_lag = sched_clock_cpu() - rq_clock()@rq_idle
> + *
> + * The rq_clock_pelt() from last_update_time being the same as
> + * rq_clock_pelt()@cfs_rq_idle, we can write:
> + *
> + * now = rq_clock_pelt()@rq_idle - cfs->throttled_clock_pelt_time +
> + * sched_clock_cpu() - rq_clock()@rq_idle
> + *
> + * Where:
> + * rq_clock_pelt()@rq_idle is rq->clock_pelt_idle
> + * rq_clock()@rq_idle is rq->enter_idle
> + * cfs->throttled_clock_pelt_time is cfs_rq->throttled_pelt_idle
> + */
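
FWIW, writing the above down as code is what finally made it click for me.
A rough sketch only, not code from this series: estimate_pelt_now() is a
made-up name, u64_u32_load() is assumed from the first patch of the series,
and the U64_MAX "cfs_rq still throttled" case stored below is ignored:

static u64 estimate_pelt_now(struct rq *rq, struct cfs_rq *cfs_rq)
{
	u64 throttled, now;

	/* cfs->throttled_clock_pelt_time snapshotted when the cfs_rq went idle */
	throttled = u64_u32_load(cfs_rq->throttled_pelt_idle);

	/* last_update_time + cfs_idle_lag = rq_clock_pelt()@rq_idle - throttled */
	now = u64_u32_load(rq->clock_pelt_idle) - throttled;

	/* + rq_idle_lag = sched_clock_cpu() - rq_clock()@rq_idle */
	now += sched_clock_cpu(cpu_of(rq)) - u64_u32_load(rq->enter_idle);

	return now;
}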

[...]

> +#ifdef CONFIG_CFS_BANDWIDTH
> +static inline void update_idle_cfs_rq_clock_pelt(struct cfs_rq *cfs_rq)
> {
> - lockdep_assert_rq_held(rq);
> - assert_clock_updated(rq);
> -
> - return rq->clock_pelt - rq->lost_idle_time;
> + /*
> + * Make sure that pending update of rq->clock_pelt_idle and
> + * rq->enter_idle are visible during update_blocked_average() before

s/update_blocked_average()/update_blocked_averages()

> + * updating cfs_rq->throttled_pelt_idle.
> + */
> + smp_wmb();

I don't understand this one. Where is the counterpart barrier?
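
I would have expected the reader side to pair it with an smp_rmb() between
loading cfs_rq->throttled_pelt_idle and loading rq->clock_pelt_idle /
rq->enter_idle, i.e. something like the sketch below (again not code from
this series, migrate_read_idle() is a made-up name, the rest as in my
sketch further up):

static u64 migrate_read_idle(struct rq *rq, struct cfs_rq *cfs_rq)
{
	u64 throttled;

	throttled = u64_u32_load(cfs_rq->throttled_pelt_idle);

	/*
	 * Pairs with the smp_wmb() above: seeing the new
	 * throttled_pelt_idle guarantees we also see the
	 * rq->clock_pelt_idle and rq->enter_idle stores that
	 * preceded it.
	 */
	smp_rmb();

	return u64_u32_load(rq->clock_pelt_idle) - throttled +
	       sched_clock_cpu(cpu_of(rq)) - u64_u32_load(rq->enter_idle);
}

Is that what is intended here, or is the ordering guaranteed by something
else?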

> + if (unlikely(cfs_rq->throttle_count))
> + u64_u32_store(cfs_rq->throttled_pelt_idle, U64_MAX);
> + else
> + u64_u32_store(cfs_rq->throttled_pelt_idle,
> + cfs_rq->throttled_clock_pelt_time);

Nit-pick:

Using a local u64 throttled might be easier on the eye:

if (unlikely(cfs_rq->throttle_count))
- u64_u32_store(cfs_rq->throttled_pelt_idle, U64_MAX);
+ throttled = U64_MAX;
else
- u64_u32_store(cfs_rq->throttled_pelt_idle,
- cfs_rq->throttled_clock_pelt_time);
+ throttled = cfs_rq->throttled_clock_pelt_time;
+
+ u64_u32_store(cfs_rq->throttled_pelt_idle, throttled);

[...]