Re: [PATCH v1] sched/uclamp: Skip uclamp_rq_dec() for non-final dequeue of delayed tasks

From: Zihuan Zhang
Date: Tue Jul 01 2025 - 20:53:50 EST


Hi Prateek,

在 2025/7/1 19:00, K Prateek Nayak 写道:
Hello Zihuan Zhang,

On 7/1/2025 3:04 PM, Zihuan Zhang wrote:
Currently, uclamp_rq_inc() skips updating the clamp aggregation for
delayed tasks unless ENQUEUE_DELAYED is set, to ensure we only track the
real enqueue of a task that was previously marked as sched_delayed.

However, the corresponding uclamp_rq_dec() path only checks
sched_delayed, and misses the DEQUEUE_DELAYED flag. As a result, we may
skip dec for a delayed task that is now being truly dequeued, leading to
uclamp aggregation mismatch.

This patch makes uclamp_rq_dec() consistent with uclamp_rq_inc() by
checking both sched_delayed and DEQUEUE_DELAYED, ensuring correct
accounting symmetry.

Fixes: 90ca9410dab2 ("sched/uclamp: Align uclamp and util_est and call before freq update")
Signed-off-by: Zihuan Zhang <zhangzihuan@xxxxxxxxxx>
---
  kernel/sched/core.c | 9 +++++----
  1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 8988d38d46a3..99f1542cff7d 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1781,7 +1781,7 @@ static inline void uclamp_rq_inc(struct rq *rq, struct task_struct *p, int flags
          rq->uclamp_flags &= ~UCLAMP_FLAG_IDLE;
  }
  -static inline void uclamp_rq_dec(struct rq *rq, struct task_struct *p)
+static inline void uclamp_rq_dec(struct rq *rq, struct task_struct *p, int flags)
  {
      enum uclamp_id clamp_id;
  @@ -1797,7 +1797,8 @@ static inline void uclamp_rq_dec(struct rq *rq, struct task_struct *p)
      if (unlikely(!p->sched_class->uclamp_enabled))
          return;
  -    if (p->se.sched_delayed)
+    /* Skip dec if this is a delayed task not being truly dequeued */
+    if (p->se.sched_delayed && !(flags & DEQUEUE_DELAYED))
          return;

Consider the following case:

- p is a fair task with uclamp constraints set.


- P blocks and dequeue_task() calls uclamp_rq_dec() and later
  p->sched_class->dequeue_task() sets "p->se.sched_delayed" to 1.

  uclamp_rq_dec() is called for the first time here and has already
  decremented the clamp_id from the hierarchy.


- Before P can be completely dequeued, P is moved to an RT class
  with p->se.sched_delayed still set to 1 which invokes the following
  call-chain:
    __sched_setscheduler() {
    dequeue_task(DEQUEUE_DELAYED) {
      uclamp_rq_dec() {
        if (p->se.sched_delayed && !(flags & DEQUEUE_DELAYED)) /* false */
          return;

        /* !! Decrements clamp_id again !! */
      }
      /* Dequeues from fair class */
    }
    /* Enqueues onto the RT class */
  }


From my reading, the current code is correct and the special handling in
uclamp_rq_inc() is required because enqueue_task() does a
uclamp_rq_inc() first before calling p->sched_class->enqueue_task()
which will clear "p->se.sched_delayed" if ENQUEUE_DELAYED is set.

dequeue_task() already does a uclamp_rq_dec() before task is delayed and
any further dequeue of a delayed task should not decrement the
uclamp_id.

Please let me know if I've missed something.


Thanks a lot for the detailed explanation!
You're absolutely right — uclamp_rq_dec() is unconditionally called at the top of dequeue_task(), and any further delayed dequeues should not trigger an additional dec.
I now understand the subtle but important difference between the enqueue and dequeue paths.
As a follow-up question: would it make sense to defensively guard uclamp_rq_dec() with something like:
if (!task_on_rq_queued(p))
return;
I understand this is not required with the current call structure, but I wonder whether such a check would be reasonable to prevent accidental double-dec in case of future changes or obscure paths.
Or would this be considered unnecessary runtime overhead and better caught by path analysis?
Looking forward to your thoughts.
Thanks again for the insightful review!

        for_each_clamp_id(clamp_id)
@@ -2039,7 +2040,7 @@ static void __init init_uclamp(void)
    #else /* !CONFIG_UCLAMP_TASK */
  static inline void uclamp_rq_inc(struct rq *rq, struct task_struct *p, int flags) { }
-static inline void uclamp_rq_dec(struct rq *rq, struct task_struct *p) { }
+static inline void uclamp_rq_dec(struct rq *rq, struct task_struct *p, int flags) { }
  static inline void uclamp_fork(struct task_struct *p) { }
  static inline void uclamp_post_fork(struct task_struct *p) { }
  static inline void init_uclamp(void) { }
@@ -2112,7 +2113,7 @@ inline bool dequeue_task(struct rq *rq, struct task_struct *p, int flags)
       * Must be before ->dequeue_task() because ->dequeue_task() can 'fail'
       * and mark the task ->sched_delayed.
       */
-    uclamp_rq_dec(rq, p);
+    uclamp_rq_dec(rq, p, flags);
      return p->sched_class->dequeue_task(rq, p, flags);
  }


Best regards,
Zihuan