Re: [PATCH 1/3] sched/fair: Add tg_load_contrib cfs_rq decay checking

From: Odin Ugedal
Date: Tue May 25 2021 - 06:37:03 EST


Hi,

On Tue, 25 May 2021 at 11:58, Vincent Guittot <vincent.guittot@xxxxxxxxxx> wrote:
> Could you give more details about how cfs_rq->avg.load_avg = 4 but
> cfs_rq->avg.load_sum = 0 ?
>
> cfs_rq->avg.load_sum is decayed and can become null when crossing
> period which implies an update of cfs_rq->avg.load_avg. This means
> that your case is generated by something outside the pelt formula ...
> like maybe the propagation of load in the tree. If this is the case,
> we should find the error and fix it

Ahh, yeah, that could probably be described better.

As far as I have found out, it happens because the pelt divider changes:
the output from "get_pelt_divider(&cfs_rq->avg)" is different at the time
of removal than at the time of addition, so a different value is removed
than was added.
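
To make that a bit more concrete, here is a toy model of the effect. It is
only a sketch and not the code in fair.c: the toy_* names and the struct are
made up for illustration, and the add/remove helpers are a simplification
that assumes both sides scale load_avg by the current divider. Only
LOAD_AVG_MAX (47742) and the "LOAD_AVG_MAX - 1024 + period_contrib" shape of
the divider are taken from the kernel.

```c
#include <assert.h>
#include <stdint.h>

#define LOAD_AVG_MAX 47742u /* maximum of the PELT geometric series */

/* Hypothetical, stripped-down stand-in for struct sched_avg. */
struct toy_avg {
	uint64_t load_sum;
	uint64_t load_avg;
	uint32_t period_contrib; /* 0..1023, progress inside the current period */
};

/* Same shape as get_pelt_divider(): the divider moves with period_contrib. */
static uint32_t toy_pelt_divider(const struct toy_avg *avg)
{
	assert(avg->period_contrib < 1024);
	return LOAD_AVG_MAX - 1024 + avg->period_contrib;
}

/* Subtract, clamping at zero instead of wrapping (like sub_positive()). */
static uint64_t toy_sub_positive(uint64_t value, uint64_t delta)
{
	return value > delta ? value - delta : 0;
}

/* Assumption: adding load scales load_avg by the divider at add time. */
static void toy_add_load(struct toy_avg *avg, uint64_t load_avg)
{
	avg->load_avg += load_avg;
	avg->load_sum += load_avg * toy_pelt_divider(avg);
}

/* Assumption: removing load scales load_avg by the divider at remove time. */
static void toy_remove_load(struct toy_avg *avg, uint64_t load_avg)
{
	avg->load_avg = toy_sub_positive(avg->load_avg, load_avg);
	avg->load_sum = toy_sub_positive(avg->load_sum,
					 load_avg * toy_pelt_divider(avg));
}
```

With period_contrib = 0 the divider is 46718, so adding load_avg = 100 adds
4671800 to load_sum. If period_contrib is 1023 at removal time, the divider
is 47741, and removing load_avg = 98 tries to subtract 4678618. load_sum is
clamped to 0 while load_avg is still 2, which is the same kind of state as
the one reported (load_avg != 0, load_sum == 0).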

Inside pelt itself, this cannot happen. When pelt changes the load_sum, it
recalculates the load_avg based on load_sum, and not the delta, afaik.
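
As a sketch of what I mean by "recalculates" (toy_recompute_load_avg is a
made-up helper for illustration, not a kernel function; it only mirrors the
general "load * load_sum / divider" form as I understand it):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Derive load_avg from the full load_sum, not from a delta.
 * When load_sum has decayed to 0, the result is 0 as well, so the two
 * values cannot drift apart through this path.
 */
static uint64_t toy_recompute_load_avg(uint64_t load, uint64_t load_sum,
				       uint32_t divider)
{
	assert(divider != 0);
	return (load * load_sum) / divider;
}
```

So with load_sum == 0 this always gives load_avg == 0, whatever the divider
happens to be at that point.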

And as you say, the "issue" is therefore (as I see it) outside of PELT. Due to
how the pelt divider changes, I assume it is hard to pinpoint where the issue
is. I can try to find a clear path where we can see what is added to
and what is removed from both cfs_rq->avg.load_sum and cfs_rq->avg.load_avg,
to better be able to pinpoint what is happening.

Previously I thought this was a result of precision loss due to division and
multiplication during load add/remove inside fair.c, but I am not sure that
is the issue, or is it?

If my above line of thought makes sense, do you still view this as an error
outside PELT, or do you see another possible/better solution?

Will investigate further.

Thanks
Odin