Re: [PATCH] sched/cgroup: Lock optimize for cgroup cpu throttle

From: Sebastian Andrzej Siewior
Date: Mon Aug 11 2025 - 04:36:37 EST

Next message: Harry Yoo: "Re: [PATCH V4 mm-hotfixes 1/3] mm: move page table sync declarations to linux/pgtable.h"
Previous message: Peter Zijlstra: "Re: [RFC PATCH 0/3] sched: add ability to throttle sched_yield() calls to reduce contention"
In reply to: Xin Zhao: "[PATCH] sched/cgroup: Lock optimize for cgroup cpu throttle"
Next in thread: Valentin Schneider: "Re: [PATCH] sched/cgroup: Lock optimize for cgroup cpu throttle"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On 2025-08-11 15:08:38 [+0800], Xin Zhao wrote:
> With PREEMPT_RT enabled, ordinary spinlocks become sleeping locks, so a task
> can be throttled by cgroup CPU limits while holding one. Seemingly unrelated
> threads can then become dependent on each other's timing through shared
> underlying code paths, such as memory allocation, which produces delayed
> wake-ups that are hard to understand when analyzing traces captured by tools
> like Perfetto.
> Due to the prevalence of this performance issue when using cgroup CPU
> throttling with PREEMPT_RT, the CGROUP_LOCK_OPTIMIZE configuration will be
> enabled by default when both PREEMPT_RT and CFS_BANDWIDTH are activated.
> This configuration option temporarily increases the priority of tasks to
> SCHED_RR 1 if they hold a lock (excluding raw spinlocks, RCU, and seqlock)
> and are limited by cgroup, provided they are SCHED_NORMAL. Once the lock is
> released, the priority will be restored.
> This patch is a derivative of the priority inheritance patch. While priority
> inheritance can cover scenarios involving spinlocks and mutexes, it cannot
> address the timing dependency issues between two SCHED_NORMAL tasks caused
> by underlying locks. Additionally, the lazy_preempt feature does not cover
> scenarios where a real-time task, such as a ktimer, interrupts a lock-holding
> SCHED_NORMAL task, which is then throttled by cgroup cpu.
> This patch not only addresses the issue of cgroup limits affecting spinlocks
> under PREEMPT_RT but also resolves issues related to holding mutex or
> semaphore locks, as well as other core rt_mutex locks under PREEMPT_RT.
> The following stack trace illustrates a delayed wake-up between two
> seemingly unrelated threads that is caused by such shared underlying logic:

urgh.

What about using task_work_add() and throttling the task on its way to
userland? The callback will be invoked without any locks held.

Sebastian
