Re: [PATCH V3] sched/fair: Interleave cfs bandwidth timers for improved single thread performance at low utilization

From: shrikanth hegde
Date: Fri Feb 24 2023 - 01:28:20 EST

On 2/24/23 7:19 AM, Hillf Danton wrote:
> On Fri, 24 Feb 2023 00:29:18 +0530 Shrikanth Hegde <sshegde@xxxxxxxxxxxxxxxxxx>
>> @@ -5923,6 +5923,10 @@ void init_cfs_bandwidth(struct cfs_bandwidth *cfs_b)
>> INIT_LIST_HEAD(&cfs_b->throttled_cfs_rq);
>> hrtimer_init(&cfs_b->period_timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS_PINNED);
>> cfs_b->period_timer.function = sched_cfs_period_timer;
>> +
>> + /* Add a random offset so that timers interleave */
>> + hrtimer_set_expires(&cfs_b->period_timer,
>> + get_random_u32_below(cfs_b->period));
>> hrtimer_init(&cfs_b->slack_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
>> cfs_b->slack_timer.function = sched_cfs_slack_timer;
>> cfs_b->slack_started = false;
>> --
>> 2.31.1
>
> Could you specify what sense this makes, given hrtimer_forward_now() in
> start_cfs_bandwidth() and sched_cfs_period_timer(), which makes the
> timer expire after now? Why does the randomness at init time play a role
> at start time and run time?

Currently, no initial expiry value is set for period_timer, so it starts at 0
(hrtimer_init() zeroes the timer). Every later expiry is therefore
expiry = $INITIAL_EXPIRYVALUE + $N * $PERIOD
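To make that formula concrete, here is a simplified userspace model of how a
periodic timer is forwarded. It is not the kernel's hrtimer code, and
forward_expiry() is a hypothetical helper name; it only mimics what
hrtimer_forward_now() does to the expiry: add whole periods until the expiry
lies after "now". Because only whole periods are ever added, expiry % period
never changes, so whatever initial value the timer gets is kept for its lifetime.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Illustrative model (hypothetical helper, not kernel code): push *expiry
 * forward by whole periods until it is strictly after now, like
 * hrtimer_forward_now(). Returns the number of periods added.
 */
static uint64_t forward_expiry(uint64_t *expiry, uint64_t now, uint64_t period)
{
	uint64_t n;

	assert(period > 0);
	if (*expiry > now)
		return 0;	/* already in the future, nothing to do */

	/* Smallest n such that *expiry + n * period > now. */
	n = (now - *expiry) / period + 1;
	*expiry += n * period;
	return n;
}
```

For example, with a period of 100 and now = 1234, a timer starting at 0 is
forwarded to 1300, while one starting at 37 is forwarded to 1237: the offset
survives forwarding.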

Hence, when two or more CPU cgroups use the bandwidth controller with the same
period, their period_timers align: they all expire at the same instants.

The random offset only changes the initial expiry and adds no run-time cost.
Since hrtimer_forward_now() advances the expiry only by whole periods, the
offset is carried forward, so the period_timers of different cgroups stay
interleaved. This gives the benefit of SMT folding, fewer context switches and
fewer hypervisor preemptions.
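A second illustrative model, again with a hypothetical helper name: since each
timer fires at offset + n * period, the gap between two cgroups' expiries is
fixed for the lifetime of the timers. The offsets are passed in explicitly,
rather than drawn with get_random_u32_below(), to keep the sketch deterministic.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Illustrative model (hypothetical helper, not kernel code): distance
 * between the expiries of two timers that share a period, measured on the
 * circle [0, period). A result of 0 means they always fire together.
 */
static uint64_t expiry_distance(uint64_t off_a, uint64_t off_b, uint64_t period)
{
	uint64_t d;

	assert(period > 0);
	d = (off_a > off_b ? off_a - off_b : off_b - off_a) % period;

	/* Take the shorter way round the period. */
	return d < period - d ? d : period - d;
}
```

Without the patch both offsets are 0 and the distance is 0, i.e. the timers
always fire together; with offsets 10 and 90 and a period of 100 they stay 20
apart on every period.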

More details are in the RFC patch:
https://lore.kernel.org/lkml/9c57c92c-3e0c-b8c5-4be9-8f4df344a347@xxxxxxxxxxxxxxxxxx/