Re: [PATCH v1] sched/uclamp: Skip uclamp_rq_dec() for non-final dequeue of delayed tasks

From: Zihuan Zhang
Date: Tue Jul 01 2025 - 20:49:04 EST


Hi Xuewen,

On 2025/7/1 18:49, Xuewen Yan wrote:
Hi Zihuan,

On Tue, Jul 1, 2025 at 5:34 PM Zihuan Zhang <zhangzihuan@xxxxxxxxxx> wrote:
Currently, uclamp_rq_inc() skips updating the clamp aggregation for
delayed tasks unless ENQUEUE_DELAYED is set, to ensure we only track the
real enqueue of a task that was previously marked as sched_delayed.

However, the corresponding uclamp_rq_dec() path only checks
sched_delayed, and misses the DEQUEUE_DELAYED flag. As a result, we may
skip dec for a delayed task that is now being truly dequeued, leading to
uclamp aggregation mismatch.

This patch makes uclamp_rq_dec() consistent with uclamp_rq_inc() by
checking both sched_delayed and DEQUEUE_DELAYED, ensuring correct
accounting symmetry.

Fixes: 90ca9410dab2 ("sched/uclamp: Align uclamp and util_est and call before freq update")
Signed-off-by: Zihuan Zhang <zhangzihuan@xxxxxxxxxx>
---
kernel/sched/core.c | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 8988d38d46a3..99f1542cff7d 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1781,7 +1781,7 @@ static inline void uclamp_rq_inc(struct rq *rq, struct task_struct *p, int flags
rq->uclamp_flags &= ~UCLAMP_FLAG_IDLE;
}

-static inline void uclamp_rq_dec(struct rq *rq, struct task_struct *p)
+static inline void uclamp_rq_dec(struct rq *rq, struct task_struct *p, int flags)
{
enum uclamp_id clamp_id;

@@ -1797,7 +1797,8 @@ static inline void uclamp_rq_dec(struct rq *rq, struct task_struct *p)
if (unlikely(!p->sched_class->uclamp_enabled))
return;

- if (p->se.sched_delayed)
+ /* Skip dec if this is a delayed task not being truly dequeued */
+ if (p->se.sched_delayed && !(flags & DEQUEUE_DELAYED))
return;
Consider __sched_setscheduler(): when changing a delayed task's
sched class, your patch would decrement the uclamp counters twice.

Thanks for your clarification. I understand now that the kernel
unconditionally decrements the uclamp counter when dequeuing a task,
regardless of whether it is currently enqueued, so the existing
sched_delayed check already keeps the accounting balanced. That makes
sense.

Appreciate your time and explanation!
BR
---
xuewen

for_each_clamp_id(clamp_id)
@@ -2039,7 +2040,7 @@ static void __init init_uclamp(void)

#else /* !CONFIG_UCLAMP_TASK */
static inline void uclamp_rq_inc(struct rq *rq, struct task_struct *p, int flags) { }
-static inline void uclamp_rq_dec(struct rq *rq, struct task_struct *p) { }
+static inline void uclamp_rq_dec(struct rq *rq, struct task_struct *p, int flags) { }
static inline void uclamp_fork(struct task_struct *p) { }
static inline void uclamp_post_fork(struct task_struct *p) { }
static inline void init_uclamp(void) { }
@@ -2112,7 +2113,7 @@ inline bool dequeue_task(struct rq *rq, struct task_struct *p, int flags)
* Must be before ->dequeue_task() because ->dequeue_task() can 'fail'
* and mark the task ->sched_delayed.
*/
- uclamp_rq_dec(rq, p);
+ uclamp_rq_dec(rq, p, flags);
return p->sched_class->dequeue_task(rq, p, flags);
}

--
2.25.1

Best regards,
Zihuan