Re: [PATCH 0/4] sched/fair: Manage lag and run to parity with different slices
From: Peter Zijlstra
Date: Fri Jun 20 2025 - 04:44:30 EST
On Thu, Jun 19, 2025 at 02:27:43PM +0200, Vincent Guittot wrote:
> On Wed, 18 Jun 2025 at 09:03, Vincent Guittot
> <vincent.guittot@xxxxxxxxxx> wrote:
> >
> > On Tue, 17 Jun 2025 at 11:22, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> > >
> > > On Fri, Jun 13, 2025 at 04:05:10PM +0200, Vincent Guittot wrote:
> > > > Vincent Guittot (3):
> > > > sched/fair: Use protect_slice() instead of direct comparison
> > > > sched/fair: Limit run to parity to the min slice of enqueued entities
> > > > sched/fair: Improve NO_RUN_TO_PARITY
> > >
> > > Ah. I wrote these here patches and then totally forgot about them :/.
> > > They take a different approach.
> > >
> > > The approach I took was to move decision to stick with curr after pick,
> > > instead of before it. That way we can evaluate the tree at the time of
> > > preemption.
> >
> > Let me have a look at your patches
>
> I have looked at and tested your patches but they don't solve the lag
> and run to parity issues; not sure what's going wrong.
Humm.. So what you do in patch 3, setting the protection to min_slice
instead of the deadline, only takes into account the tasks present at
the point we schedule.
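
Something like the below is how I read it (illustration only, the names
are made up; cfs_rq_min_slice() and se->vprot are stand-ins for whatever
your patch really uses): the protected window is computed from whatever
happens to be queued when curr gets selected, so a wakeup with a tiny
slice that arrives later doesn't shrink it.

static void set_protect_slice(struct cfs_rq *cfs_rq, struct sched_entity *se)
{
        /* smallest slice among the entities queued *right now* */
        u64 slice = min(se->slice, cfs_rq_min_slice(cfs_rq));

        /* protection ends after 'slice' worth of virtual time */
        se->vprot = se->vruntime + calc_delta_fair(slice, se);
}

static bool protect_slice(struct sched_entity *se)
{
        /* still inside the protected window? */
        return (s64)(se->vprot - se->vruntime) > 0;
}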
Which is why I approached it by moving the protection to after pick,
because then we can directly compare the task we're running to the
best pick -- which includes the tasks that got woken. This gives
check_preempt_wakeup_fair() better chances.
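
Roughly the shape I had in mind, sketched from memory rather than the
actual diff (__pick_eevdf_tree() is a stand-in for a pick that only
looks at the tree and ignores curr's protection):

static struct sched_entity *pick_eevdf(struct cfs_rq *cfs_rq)
{
        struct sched_entity *curr = cfs_rq->curr;
        struct sched_entity *best;

        if (curr && (!curr->on_rq || !entity_eligible(cfs_rq, curr)))
                curr = NULL;

        /* pick first, so freshly woken entities are part of the comparison */
        best = __pick_eevdf_tree(cfs_rq);
        if (!best)
                return curr;

        /*
         * Only now decide whether curr keeps the CPU: it does so while it
         * is still protected *and* the best pick doesn't have an earlier
         * deadline than curr.
         */
        if (sched_feat(RUN_TO_PARITY) && curr && protect_slice(curr) &&
            (s64)(best->deadline - curr->deadline) >= 0)
                return curr;

        return best;
}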
To be fair, I did not get around to testing the patches much beyond
booting them, so quite possibly they're buggered :-/
> Also, my patchset takes into account the NO_RUN_TO_PARITY case by
> adding a notion of quantum execution time, which was missing until now
Right; not ideal, but I suppose for the people that disable
RUN_TO_PARITY it might make sense. But perhaps there should be a little
more justification for why we bother tweaking a non-default option.
The problem with using the normalized_sysctl_ values is that you then
get behavioural differences between 1 and 8 CPUs or so. Also, perhaps
it's time to just nuke that whole scaling thing (I'm sure someone
mentioned that a short while ago).
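
For reference, that scaling looks roughly like the below (paraphrased
from kernel/sched/fair.c, details may differ); with the default log
scaling a 1 CPU machine gets factor 1 while an 8+ CPU machine gets
factor 4, so the same normalized_sysctl_ value ends up as quite
different effective slices:

static unsigned int get_update_sysctl_factor(void)
{
        unsigned int cpus = min_t(unsigned int, num_online_cpus(), 8);

        switch (sysctl_sched_tunable_scaling) {
        case SCHED_TUNABLESCALING_NONE:
                return 1;
        case SCHED_TUNABLESCALING_LINEAR:
                return cpus;
        case SCHED_TUNABLESCALING_LOG:
        default:
                return 1 + ilog2(cpus);
        }
}

/* e.g. sysctl_sched_base_slice = factor * normalized_sysctl_sched_base_slice */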
> Regarding the "fix delayed requeue", I already get an update of
> current before requeueing a delayed task. Do you have a use case in
> mind?
Ah, it was just from reading code, clearly I missed something. Happy to
forget about that patch :-)