Re: [PATCH 0/4] sched/fair: Manage lag and run to parity with different slices

From: Peter Zijlstra
Date: Mon Jun 23 2025 - 07:17:00 EST


On Fri, Jun 20, 2025 at 12:29:27PM +0200, Vincent Guittot wrote:

> yes, but at this point any waking task is either the next running
> task or enqueued in the rb tree

The scenario I was thinking of was something like:

A (long slice)
B (short slice)
C (short slice)

A wakes up and goes running

Since A is the only task around, it gets normal protection

B wakes up and doesn't win

So now we have A running with long protection and a short-slice task on-rq

C wakes up ...

Whereas what we would've wanted to end up with for C is A running with
short protection.

> > Which is why I approached it by moving the protection to after pick;
> > because then we can directly compare the task we're running to the
> > best pick -- which includes the tasks that got woken. This gives
> > check_preempt_wakeup_fair() better chances.
>
> we don't always want to break run to parity, only when a waking task
> should preempt current or shorten the run-to-parity period. Otherwise,
> the protection applies for a duration that is short enough to stay
> fair to the others
>
> I will see if check_preempt_wakeup_fair can be smarter when deciding
> to cancel the protection

Thanks. In the above scenario, B getting selected when C wakes up would
be a clue, I suppose :-)
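
Continuing the toy above, one way that clue could be used -- purely a
sketch, not a proposed patch -- is to let the wakeup path shrink (never
extend) the window to the shortest slice among current and the best
already-queued entity:

static inline uint64_t toy_min(uint64_t a, uint64_t b)
{
        return a < b ? a : b;
}

/*
 * Called from the wakeup preemption check: if something with a shorter
 * slice is already queued (B, in the example), clamp A's remaining
 * protection so that C's wakeup sees A protected for a short slice only.
 */
static void toy_clamp_protection(const struct toy_task *curr,
                                 const struct toy_task *best_queued,
                                 uint64_t now)
{
        uint64_t shortest = curr->slice;

        if (best_queued)
                shortest = toy_min(shortest, best_queued->slice);

        protect_until = toy_min(protect_until, now + shortest);
}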

> > To be fair, I did not get around to testing the patches much beyond
> > booting them, so quite possibly they're buggered :-/
> >
> > > Also, my patchset takes the NO_RUN_TO_PARITY case into account by
> > > adding a notion of quantum execution time, which was missing until now
> >
> > Right; not ideal, but I suppose for the people that disable
> > RUN_TO_PARITY it might make sense. But perhaps there should be a little
> > more justification for why we bother tweaking a non-default option.
>
> Otherwise, disabling RUN_TO_PARITY to check whether it's the root cause
> of a regression or a problem becomes pointless, because the behavior
> without the feature is wrong.

Fair enough.
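
For reference, a rough sketch of what a quantum clamp in the range you
mention below could look like (made-up names again, just one possible
reading of the [0.7ms:2*tick) bound, not your actual patch):

#include <stdint.h>

#define TOY_TICK_NSEC   1000000ULL              /* assuming HZ=1000 here */

/*
 * One possible reading: under !RUN_TO_PARITY, force a re-pick after a
 * quantum derived from the task's slice but clamped to [0.7ms, 2*tick).
 */
static uint64_t toy_quantum(uint64_t slice)
{
        const uint64_t lo = 700000ULL;          /* 0.7ms */
        const uint64_t hi = 2 * TOY_TICK_NSEC;  /* 2 ticks, exclusive */

        if (slice < lo)
                return lo;
        if (slice >= hi)
                return hi - 1;
        return slice;
}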

> And some might not want run to parity but rather behave closer to the
> white paper, with a pick after each quantum, the quantum being
> something in the range [0.7ms:2*tick)
>
> >
> > The problem with usage of normalized_sysctl_ values is that you then get
> > behavioural differences between 1 and 8 CPUs or so. Also, perhaps its
>
> normalized_sysctl_ values don't scale with the number of CPUs. In this
> case, it's always 0.7ms, which is short enough compared to the 1ms tick
> period to prevent the default irq accounting from keeping current
> running for another tick

Right; but it not scaling means it is the full slice on UP, half the
slice on SMP-4, and a third for SMP-8 and up, or somesuch.

It probably doesn't matter much, but it's weird.
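
(For the record, the scaling I'm referring to is the default logarithmic
tunable scaling; a toy version of the factor, not the kernel's exact code:)

static unsigned int toy_ilog2(unsigned int x)
{
        unsigned int r = 0;

        while (x >>= 1)
                r++;
        return r;
}

/*
 * With the default log tunable scaling, sysctl_* ends up as
 * normalized_sysctl_* multiplied by roughly this factor (IIRC the real
 * thing also caps the CPU count at 8), so the normalized value is the
 * whole slice on 1 CPU but only a fraction of the scaled slice on
 * bigger machines.
 */
static unsigned int toy_scale_factor(unsigned int nr_cpus)
{
        return 1 + toy_ilog2(nr_cpus);
}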