Re: [PATCH v2] sched/eevdf: Prevent vlag from going out of bounds when reweight_eevdf

From: Peter Zijlstra
Date: Tue Apr 23 2024 - 08:02:34 EST


On Tue, Apr 23, 2024 at 11:05:20AM +0800, Xuewen Yan wrote:
> On Mon, Apr 22, 2024 at 11:59 PM Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> >
> > On Mon, Apr 22, 2024 at 09:12:12PM +0800, Xuewen Yan wrote:
> >
> > > By adding a log to observe weight changes in reweight_entity, I found
> > > that calc_group_shares() often causes new_weight to become very small:
> >
> > Yes, cgroups do that. But over-all that should not matter no?
> >
> > Specifically, the whole re-weight thing turns into a series like:
> >
> >         w_0   w_1         w_n-1   w_0
> >     S = --- * --- * ... * ----- = ---
> >         w_1   w_2          w_n    w_n
> >
> > Where S is our ultimate scale factor.
> >
> > So even if w_m (0 < m < n) is 2, it completely disappears. But yes, it
> > will create a big term, which is why the initial vlag should be limited.
>
> Okay, I understand what you mean. Even if the weight during dequeue is
> very small, the weight will be eliminated during enqueue.
> In this case, the necessity of the !on_rq case does not seem to be
> very important.
>
> On the other hand, the following case:
> place_entity()
> {
> ...
>         load = cfs_rq->avg_load;
>         if (curr && curr->on_rq)
>                 load += scale_load_down(curr->load.weight);
>
>         lag *= load + scale_load_down(se->load.weight);
>         if (WARN_ON_ONCE(!load))
>                 load = 1;
>         lag = div_s64(lag, load);       <<<<
> ...
> }

So this plays games with scale_load_down() because this is W, the sum of
all w, which can indeed grow quite large and cause overflow.

> reweight_eevdf()
> {
> ...
>         if (avruntime != se->vruntime) {
>                 vlag = entity_lag(avruntime, se);
>                 vlag = div_s64(vlag * old_weight, weight);      <<<<
>                 se->vruntime = avruntime - vlag;
>         }
> ...
> }

While here we're talking about a single w, which is much more limited in
scope. And per the above, what we're trying to do is:

vlag = lag/w
lag/w * w/w' = lag/w'

That is, move vlag from one w to another.
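
As a minimal userspace sketch of that step (not the kernel code): the helper
name rescale_vlag() is made up for illustration, and plain '/' stands in for
div_s64(). It also shows a tiny intermediate weight cancelling out across two
reweights, at the cost of a large intermediate value.

```c
#include <assert.h>
#include <stdint.h>

/*
 * vlag is lag/w. Multiplying by the old weight w recovers lag, and
 * dividing by the new weight w' yields lag/w'.
 *
 * rescale_vlag() is a hypothetical helper, not a kernel function.
 */
static int64_t rescale_vlag(int64_t vlag, uint32_t old_weight,
			    uint32_t new_weight)
{
	assert(new_weight != 0);	/* a zero weight is meaningless here */
	return vlag * (int64_t)old_weight / (int64_t)new_weight;
}

/* Two reweights in a row: w0 -> w1 -> w2. */
static int64_t rescale_twice(int64_t vlag, uint32_t w0, uint32_t w1,
			     uint32_t w2)
{
	return rescale_vlag(rescale_vlag(vlag, w0, w1), w1, w2);
}
```

With vlag = 1000, going 1024 -> 2 -> 4096 gives the same result as going
1024 -> 4096 directly, but the value in between is 512000, which is the big
term that makes limiting the initial vlag worthwhile.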

> There is no need to clamp the above two positions because these two
> calculations will not theoretically cause s64 overflow?

Well, supposedly, if I didn't get it wrong etc.. (I do tend to get
things wrong from time to time :-).

I would think limited vlag would stay below 1 second, or about 30 bits;
this leaves another 30 bits for w, which *should* be enough.
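
A rough way to sanity-check that bit budget in userspace. This is only a
sketch: bits_needed() and product_fits_s64() are hypothetical helpers, and
the bound used is the simple sufficient condition that an m-bit value times
an n-bit value is below 2^(m+n).

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL	/* 1 second in nanoseconds */

/* Number of bits needed to represent v (0 for v == 0). */
static unsigned int bits_needed(uint64_t v)
{
	unsigned int bits = 0;

	while (v) {
		bits++;
		v >>= 1;
	}
	return bits;
}

/*
 * Hypothetical helper: returns 1 if |vlag| * w is guaranteed to fit in
 * the 63 value bits of an s64, given upper bounds on both magnitudes.
 */
static int product_fits_s64(uint64_t max_abs_vlag, uint64_t max_weight)
{
	return bits_needed(max_abs_vlag) + bits_needed(max_weight) <= 63;
}
```

One second of nanoseconds needs 30 bits, so a weight of up to 30 bits gives
60 bits in total and leaves a little headroom below 63.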

Anyway, if you're unsure, sprinkle some check_mul_overflow() and see if
you can tickle it.
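
Something along these lines, as a userspace sketch only. It uses the
GCC/Clang builtin __builtin_mul_overflow() where kernel code would use
check_mul_overflow(); rescale_vlag_checked() is a made-up name and the
signature is an assumption for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Compute vlag * old_weight / new_weight, reporting overflow instead of
 * silently wrapping. Returns true and stores the result in *out on
 * success; returns false if the multiplication would overflow s64 or if
 * new_weight is not positive.
 *
 * Hypothetical helper, not a kernel function.
 */
static bool rescale_vlag_checked(int64_t vlag, int64_t old_weight,
				 int64_t new_weight, int64_t *out)
{
	int64_t prod;

	if (new_weight <= 0)
		return false;

	/* In the kernel: check_mul_overflow(vlag, old_weight, &prod) */
	if (__builtin_mul_overflow(vlag, old_weight, &prod))
		return false;

	*out = prod / new_weight;
	return true;
}
```

In the kernel the failing branch would more likely be a WARN_ON_ONCE() so
that a triggering workload leaves a trace in the log.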