Re: [RFC PATCH v2 1/7] sched/fair: Add related data structure for task based throttle

From: K Prateek Nayak
Date: Mon Apr 14 2025 - 10:19:55 EST


Hello Aaron,

On 4/14/2025 5:25 PM, Aaron Lu wrote:
On Mon, Apr 14, 2025 at 09:28:36AM +0530, K Prateek Nayak wrote:
Hello Aaron,

On 4/9/2025 5:37 PM, Aaron Lu wrote:
From: Valentin Schneider <vschneid@xxxxxxxxxx>

Add related data structures for this new throttle functionality.

Signed-off-by: Valentin Schneider <vschneid@xxxxxxxxxx>
Signed-off-by: Aaron Lu <ziqianlu@xxxxxxxxxxxxx>
---
include/linux/sched.h | 4 ++++
kernel/sched/core.c | 3 +++
kernel/sched/fair.c | 12 ++++++++++++
kernel/sched/sched.h | 2 ++
4 files changed, 21 insertions(+)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index f96ac19828934..0b55c79fee209 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -880,6 +880,10 @@ struct task_struct {
#ifdef CONFIG_CGROUP_SCHED
struct task_group *sched_task_group;
+#ifdef CONFIG_CFS_BANDWIDTH
+ struct callback_head sched_throttle_work;
+ struct list_head throttle_node;

Since throttled tasks are fully dequeued before being placed on the
"throttled_limbo_list", is it possible to reuse "p->se.group_node"?

I think it might be possible.

Currently, it is used to track the task on "rq->cfs_tasks" and during
load-balancing when moving a bunch of tasks between CPUs but since a
fully throttled task is not tracked by either, it should be safe to
reuse this bit (CONFIG_DEBUG_LIST will scream if I'm wrong) and save
some space in the task_struct.

Thoughts?

Is it that adding throttle_node would cause task_struct to just cross a
cacheline boundary? :-)

Or is it mainly a concern that the system could have many tasks, so any
saving in task_struct is worth trying?

Mostly this :)


I can see that reusing another field would make task_is_throttled() more
obscure to digest and implement, but I think it is doable.

I completely overlooked the task_is_throttled() use-case. I think the
current implementation is much cleaner in that aspect; no need to
overload "p->se.group_node" and over-complicate this.
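
To illustrate the point above, here is a minimal userspace sketch, not
kernel code. It assumes task_is_throttled() is a plain emptiness check on
the dedicated node; "struct task", task_init() and group_node_in_use() are
hypothetical stand-ins, and the list helpers only mimic the kernel's
<linux/list.h> API.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace stand-in for the kernel's struct list_head. */
struct list_head { struct list_head *next, *prev; };

static void INIT_LIST_HEAD(struct list_head *h) { h->next = h; h->prev = h; }
static bool list_empty(const struct list_head *h) { return h->next == h; }

static void list_add(struct list_head *n, struct list_head *head)
{
	n->next = head->next;
	n->prev = head;
	head->next->prev = n;
	head->next = n;
}

static void list_del_init(struct list_head *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	INIT_LIST_HEAD(n);
}

/* Hypothetical, trimmed-down task: only the two nodes under discussion. */
struct task {
	struct list_head group_node;	/* stands in for p->se.group_node */
	struct list_head throttle_node;	/* dedicated node added by the patch */
};

static void task_init(struct task *p)
{
	INIT_LIST_HEAD(&p->group_node);
	INIT_LIST_HEAD(&p->throttle_node);
}

/* Assumed shape: a dedicated node makes the check a one-liner. */
static bool task_is_throttled(const struct task *p)
{
	return !list_empty(&p->throttle_node);
}

/*
 * If group_node were reused, "node is linked" alone could mean either
 * "on rq->cfs_tasks" or "on the limbo list"; extra state would be needed.
 */
static bool group_node_in_use(const struct task *p)
{
	return !list_empty(&p->group_node);
}

/* Walk one task through "runnable" and then "throttled". */
static void check_throttle_states(void)
{
	struct list_head cfs_tasks, limbo;
	struct task p;

	INIT_LIST_HEAD(&cfs_tasks);
	INIT_LIST_HEAD(&limbo);
	task_init(&p);

	list_add(&p.group_node, &cfs_tasks);	/* runnable, not throttled */
	assert(group_node_in_use(&p) && !task_is_throttled(&p));

	list_del_init(&p.group_node);		/* fully dequeued ... */
	list_add(&p.throttle_node, &limbo);	/* ... then parked */
	assert(!group_node_in_use(&p) && task_is_throttled(&p));
}
```

With a shared node, the second assertion could not be written without
consulting some other piece of state, which is the extra complexity being
avoided here.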

If we really want some space savings, declaring an "unsigned char
sched_throttled" in the hole next to "sched_delayed" would be cleaner
but I'd wait on Valentin and Peter's comments before going down that
path.
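
A rough userspace sketch of why a flag placed in an existing padding hole
is free, under the assumption that the byte flags are followed by a more
strictly aligned member. The structs below are hypothetical and much
smaller than the real "struct sched_entity"; only the layout idea carries
over, and pahole on a real build is the authoritative check.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout before adding the flag. */
struct ent_before {
	unsigned long	load;
	unsigned char	on_rq;
	unsigned char	sched_delayed;
	/* padding hole here on common ABIs */
	unsigned long	exec_start;
};

/* Same layout with the proposed one-byte flag dropped into the hole. */
struct ent_after {
	unsigned long	load;
	unsigned char	on_rq;
	unsigned char	sched_delayed;
	unsigned char	sched_throttled;	/* proposed flag */
	unsigned long	exec_start;
};

/* Bytes added by the flag; expected to be 0 when it fits in the hole. */
static size_t growth_from_flag(void)
{
	return sizeof(struct ent_after) - sizeof(struct ent_before);
}

/* For comparison: a list_head-style node costs two pointers. */
static size_t list_node_cost(void)
{
	return 2 * sizeof(void *);
}

static void check_layout(void)
{
	assert(growth_from_flag() == 0);
	assert(offsetof(struct ent_after, exec_start) ==
	       offsetof(struct ent_before, exec_start));
	assert(list_node_cost() > growth_from_flag());
}
```

Note that the flag would only replace the emptiness check in
task_is_throttled(); the task still needs some node to sit on the limbo
list, so this only saves space when combined with reusing an existing node.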


Thanks,
Aaron

--
Thanks and Regards,
Prateek