Re: [PATCH RESEND] sched/fair: Fix overflow in vruntime_eligible()
From: Peter Zijlstra
Date: Tue Apr 28 2026 - 13:37:16 EST
On Tue, Apr 28, 2026 at 09:47:11PM +0530, K Prateek Nayak wrote:
> (+ scheduler folks)
>
> Hello Zhan,
>
> On 4/28/2026 8:19 PM, Zhan Xusheng wrote:
> > After commit 556146ce5e94 ("sched/fair: Avoid overflow in
> > enqueue_entity()"), place_entity() can shift cfs_rq->zero_vruntime
> > towards a newly enqueued heavy entity. This can make (vruntime -
> > zero_vruntime) very large for other entities and cause key * load in
> > vruntime_eligible() to overflow s64, flipping the eligibility result.
>
> So the commit in question moves the zero_vruntime only when the
> load > sum_weight.
>
> You seem to have found a case where the entity_key() is already large
> enough that moving the zero_vruntime farther will make the eligibility
> check overflow which we were hoping will not be the case.
>
> Do you have a reproducer that fails pick_eevdf() after introduction of
> commit 556146ce5e94? Also, do you see any splats in the dmesg since we
> have a defensive WARN_ON() to catch an overflow.
Right, a reproducer or a trace showing the values leading up to this is
the absolute minimum you need to provide. That is, you need a definite
description of how you got there; otherwise you cannot judge whether the
solution is sound.

Anyway... let me construct a worst case.

So if you have this cgroup crap on, then you can have an entity of
weight 2, and vlag should then be bounded by: (slice+TICK_NSEC) *
NICE_0_LOAD, which is around 44 bits as per the comment on entity_key().

The other end is 100*NICE_0_LOAD, so let's wake that, then you get:
  {key, weight}[] := {
	puny: { (slice + TICK_NSEC) * NICE_0_LOAD, 2 },
	max:  { 0, 100*NICE_0_LOAD },
  }
The avg_vruntime() would end up being very close to 0 (which is
zero_vruntime), so no real help making that more accurate.

vruntime_eligible(puny) ends up with:

  avg = 2 * puny.key (+ 0)
  weight = 2 + 100 * NICE_0_LOAD

  avg >= puny.key * weight

And the right-hand side is roughly:

  (slice + TICK_NSEC) * NICE_0_LOAD * NICE_0_LOAD * 100

and yes, I suppose that can exceed 64 bits :-(
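
The size of that product can be checked outside the kernel with a small
userspace sketch. The concrete numbers here are assumptions, not values
taken from this thread: NICE_0_LOAD as 1 << 20 (64-bit scaling), a 3 ms
slice and a 4 ms tick (HZ=250). The helper names are hypothetical, and
the code relies on the GCC/Clang extensions __int128 and
__builtin_mul_overflow().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed values for illustration only (not taken from the thread). */
#define NICE_0_LOAD_ASSUMED	(INT64_C(1) << 20)	/* 64-bit load scaling */
#define SLICE_NS_ASSUMED	INT64_C(3000000)	/* 3 ms slice */
#define TICK_NS_ASSUMED		INT64_C(4000000)	/* 4 ms tick, HZ=250 */

/* Key of the "puny" entity: (slice + TICK_NSEC) * NICE_0_LOAD. */
static int64_t puny_key(void)
{
	return (SLICE_NS_ASSUMED + TICK_NS_ASSUMED) * NICE_0_LOAD_ASSUMED;
}

/* Summed weight of both entities: 2 + 100 * NICE_0_LOAD. */
static int64_t total_weight(void)
{
	return 2 + 100 * NICE_0_LOAD_ASSUMED;
}

/* Number of bits needed to represent a non-negative 128-bit value. */
static int bits_needed(__int128 v)
{
	int bits = 0;

	assert(v >= 0);
	while (v > 0) {
		bits++;
		v >>= 1;
	}
	return bits;
}

/* True if key * weight does not fit in a signed 64-bit integer. */
static bool product_overflows_s64(int64_t key, int64_t weight)
{
	int64_t wrapped;

	return __builtin_mul_overflow(key, weight, &wrapped);
}
```

With these assumed inputs, product_overflows_s64(puny_key(),
total_weight()) reports an overflow, and bits_needed() on the 128-bit
product returns more than 63, which is consistent with the estimate
above.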
Can someone double check that? I have a silly headache and could easily
have made a mistake. Not sure what the best solution is either. I think
we can show avg cannot overflow, and if this multiplication overflows,
then its magnitude must be larger than avg, so perhaps the proposed
patch is the best option.

Dunno, need clear head :/
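
The "if the multiplication overflows, it must be larger" argument can be
written down as a minimal userspace sketch. This is not the proposed
patch, whose contents are not shown in this message; the function name
is hypothetical, and it assumes weight > 0 and that avg itself fits in
s64, as argued above. It uses the GCC/Clang builtin
__builtin_mul_overflow().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: decide avg >= key * weight without trusting a
 * wrapped product. Assumes weight > 0 and that avg fits in s64.
 */
static bool eligible_no_overflow(int64_t avg, int64_t key, int64_t weight)
{
	int64_t prod;

	assert(weight > 0);

	/* Common case: the product fits, compare as usual. */
	if (!__builtin_mul_overflow(key, weight, &prod))
		return avg >= prod;

	/*
	 * The true product lies outside the s64 range. With weight > 0 its
	 * sign follows key: a positive key puts the product above any s64
	 * avg (not eligible), a negative key puts it below any s64 avg
	 * (eligible).
	 */
	return key < 0;
}
```

For example, eligible_no_overflow(INT64_MAX, INT64_C(1) << 43,
INT64_C(1) << 27) is false, whereas a plain s64 multiplication of the
same operands would wrap and could flip the result.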