Re: [Linux 5.18-rc1] WARNING: CPU: 1 PID: 0 at kernel/sched/fair.c:3355 update_blocked_averages

From: Dietmar Eggemann
Date: Tue Apr 05 2022 - 17:10:49 EST


On 04/04/2022 08:19, Ammar Faizi wrote:
>
> Hello scheduler maintainers,
>
> I got the following warning in Linux 5.18-rc1, I don't have the
> reproducer yet,
> it happens randomly. Please shed some light.

I tried to recreate the issue, but no success so far. I used your config
file, clang-14 and a Xeon CPU E5-2690 v2 (2 sockets, 40 CPUs) with 20
two-level cgroupv1 taskgroups '/X/Y' and 'hackbench (10 groups, 40 fds)
+ idling' running in all '/X/Y/'.

What userspace are you running?

There seemed to be some pressure on your machine when it happened?

> <6>[13420.623334][ C7] perf: interrupt took too long (2530 > 2500),
> lowering kernel.perf_event_max_sample_rate to 78900

Maybe you could split the SCHED_WARN_ON so we know which signal causes this?

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index d4bd299d67ab..0d45e09e5bfc 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3350,9 +3350,9 @@ static inline bool cfs_rq_is_decayed(struct cfs_rq *cfs_rq)
 	 * Make sure that rounding and/or propagation of PELT values never
 	 * break this.
 	 */
-	SCHED_WARN_ON(cfs_rq->avg.load_avg ||
-		      cfs_rq->avg.util_avg ||
-		      cfs_rq->avg.runnable_avg);
+	SCHED_WARN_ON(cfs_rq->avg.load_avg);
+	SCHED_WARN_ON(cfs_rq->avg.util_avg);
+	SCHED_WARN_ON(cfs_rq->avg.runnable_avg);

 	return true;
 }
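
With one check per signal, each warning carries its own source line, so the
splat alone should tell which of the three averages was still non-zero. A
minimal userspace sketch of that idea follows. It is not kernel code: the
struct, the SIG_* flags and check_decayed() are hypothetical stand-ins for
cfs_rq->avg and the three SCHED_WARN_ON() calls.

```c
#include <stdio.h>

/* Hypothetical stand-in for the three PELT averages in cfs_rq->avg. */
struct avg_signals {
	unsigned long load_avg;
	unsigned long util_avg;
	unsigned long runnable_avg;
};

/* One flag per signal, so the caller can see which check fired. */
enum {
	SIG_LOAD     = 1 << 0,
	SIG_UTIL     = 1 << 1,
	SIG_RUNNABLE = 1 << 2,
};

/*
 * Check each average on its own, mirroring the split warnings.
 * Prints one line per non-zero signal and returns a bitmask of offenders.
 */
static unsigned int check_decayed(const struct avg_signals *avg)
{
	unsigned int bad = 0;

	if (avg->load_avg) {
		fprintf(stderr, "WARN: load_avg = %lu\n", avg->load_avg);
		bad |= SIG_LOAD;
	}
	if (avg->util_avg) {
		fprintf(stderr, "WARN: util_avg = %lu\n", avg->util_avg);
		bad |= SIG_UTIL;
	}
	if (avg->runnable_avg) {
		fprintf(stderr, "WARN: runnable_avg = %lu\n", avg->runnable_avg);
		bad |= SIG_RUNNABLE;
	}

	return bad;
}
```

The combined form would only report that at least one of the three was
non-zero; the per-signal form narrows it down without changing what
cfs_rq_is_decayed() returns.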

[...]