Re: [PATCH v2 01/12] sched/psi: Optimize psi_group_change() cpu_clock() usage

From: K Prateek Nayak
Date: Wed Jul 16 2025 - 02:06:51 EST


(+ Ayush, Srikanth)

Hello Chris,

On 7/16/2025 12:41 AM, Chris Mason wrote:
>> @@ -186,7 +208,7 @@ static void group_init(struct psi_group *group)
>>
>> group->enabled = true;
>> for_each_possible_cpu(cpu)
>> - seqcount_init(&per_cpu_ptr(group->pcpu, cpu)->seq);
>> + seqcount_init(per_cpu_ptr(&psi_seq, cpu));
>> group->avg_last_update = sched_clock();
>> group->avg_next_update = group->avg_last_update + psi_period;
>> mutex_init(&group->avgs_lock);
>
> I'm not sure if someone mentioned this already, but testing the
> series I got a bunch of softlockups in get_recent_times()
> that randomly jumped from CPU to CPU.

Ayush, Srikanth, and I ran into this yesterday when testing different
trees (next, queue:sched/core) with similar signatures. I suspect that
with psi_seq now being a single global per-CPU seqcount, every
group_init() re-initializes it underneath concurrent readers, so under
heavy cgroup creation a reader in get_recent_times() can spin
indefinitely in the read_seqcount_retry() loop:

watchdog: BUG: soft lockup - CPU#0 stuck for 21s! [kworker/0:3:2140]
Modules linked in: ...
CPU: 0 UID: 0 PID: 2140 Comm: kworker/0:3 Not tainted 6.16.0-rc1-test+ #20 PREEMPT(voluntary)
Hardware name: Dell Inc. PowerEdge R6525/024PW1, BIOS 2.7.3 03/30/2022
Workqueue: events psi_avgs_work
RIP: 0010:collect_percpu_times+0x3a0/0x670
Code: 65 48 2b 05 4a 79 d2 02 0f 85 dc 02 00 00 48 8d 65 d8 5b 41 5c 41 5d 41 5e 41 5f 5d e9 34 ba d2 ff f3 90 49 81 ff ff 1f 00 00 <0f> 86 73 fd ff ff 4c 89 fe 48 c7 c7 80 9d 29 bb e8 cb 92 73 00 e9
RSP: 0018:ffffcda753383d10 EFLAGS: 00000297
RAX: ffff8be86fadcd40 RBX: ffffeda7308d4580 RCX: 000000000000006b
RDX: 000000000000002b RSI: 0000000000000100 RDI: ffffffffbab3f400
RBP: ffffcda753383e30 R08: 000000000000006b R09: 0000000000000000
R10: 0000008cca6be372 R11: 0000000000000006 R12: 000000000000006b
R13: ffffeda7308d4594 R14: 00000000000037e5 R15: 000000000000006b
FS: 0000000000000000(0000) GS:ffff8ba8c1118000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055b3cf990c3c CR3: 000000807dc40006 CR4: 0000000000f70ef0
PKRU: 55555554
Call Trace:
<TASK>
? srso_alias_return_thunk+0x5/0xfbef5
? psi_group_change+0x1ff/0x460
? add_timer_on+0x10a/0x160
psi_avgs_work+0x4c/0xd0
? queue_delayed_work_on+0x6d/0x80
process_one_work+0x193/0x3c0
worker_thread+0x29d/0x3c0
? __pfx_worker_thread+0x10/0x10
kthread+0xff/0x210
? __pfx_kthread+0x10/0x10
? __pfx_kthread+0x10/0x10
ret_from_fork+0x171/0x1a0
? __pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1a/0x30
</TASK>

I was able to reproduce this reliably by running 100 copies of an
infinite loop doing - cgroup create, move a task in, move the task back
to root, remove the cgroup - alongside hackbench running in a separate
cgroup; I hit the lockup within ~5-10min. A rough sketch of the
reproducer loop is below.
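
(Hypothetical sketch, not the exact script I used - it assumes cgroup
v2 mounted at /sys/fs/cgroup, must run as root, and hackbench is
started separately in its own cgroup:)

#include <errno.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>

#define NR_LOOPERS	100

/* One looper: create a cgroup, move self in, move self back, remove. */
static void loop_one(int id)
{
	char dir[64], procs[96], pid[16];
	FILE *f;

	snprintf(dir, sizeof(dir), "/sys/fs/cgroup/lockup-%d", id);
	snprintf(procs, sizeof(procs), "%s/cgroup.procs", dir);
	snprintf(pid, sizeof(pid), "%d\n", getpid());

	for (;;) {
		if (mkdir(dir, 0755) && errno != EEXIST)
			continue;

		/* move self into the new cgroup */
		f = fopen(procs, "w");
		if (f) {
			fputs(pid, f);
			fclose(f);
		}

		/* ... and back to the root cgroup */
		f = fopen("/sys/fs/cgroup/cgroup.procs", "w");
		if (f) {
			fputs(pid, f);
			fclose(f);
		}

		rmdir(dir);
	}
}

int main(void)
{
	int i;

	for (i = 0; i < NR_LOOPERS; i++)
		if (!fork())
			loop_one(i);

	while (wait(NULL) > 0)
		;
	return 0;
}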

I have been running the same test with your fix and haven't run into
this for over 30min now. Feel free to add:

Reviewed-and-tested-by: K Prateek Nayak <kprateek.nayak@xxxxxxx>

>
> This fixed it for me, but reading it now I'm wondering
> if we want to seqcount_init() unconditionally even when PSI
> is off.

Looking at "psi_enable", it can only be toggled via the "psi=" kernel
parameter, and I don't see anything that does a
"static_branch_disable(&psi_disabled)" at runtime, so I think your fix
should be good: initializing the seqcounts only when PSI is enabled is
safe because nothing can flip PSI back on later.
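
For reference, the only place "psi_enable" is written is the early-boot
parameter hook which, from memory (so roughly), looks like:

static int __init setup_psi(char *str)
{
	return kstrtobool(str, &psi_enable) == 0;
}
__setup("psi=", setup_psi);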

>
> diff --git a/kernel/sched/psi.c b/kernel/sched/psi.c
> index 2024c1d36402d..979a447bc281f 100644
> --- a/kernel/sched/psi.c
> +++ b/kernel/sched/psi.c
> @@ -207,8 +207,6 @@ static void group_init(struct psi_group *group)
> int cpu;

"cpu" variable can be removed too from group_init() now.

>
> group->enabled = true;
> - for_each_possible_cpu(cpu)
> - seqcount_init(per_cpu_ptr(&psi_seq, cpu));
> group->avg_last_update = sched_clock();
> group->avg_next_update = group->avg_last_update + psi_period;
> mutex_init(&group->avgs_lock);
> @@ -231,6 +229,7 @@ static void group_init(struct psi_group *group)
>
> void __init psi_init(void)
> {
> + int cpu;

nit. newline after declaration.

> if (!psi_enable) {
> static_branch_enable(&psi_disabled);
> static_branch_disable(&psi_cgroups_enabled);
> @@ -241,6 +240,8 @@ void __init psi_init(void)
> static_branch_disable(&psi_cgroups_enabled);
>
> psi_period = jiffies_to_nsecs(PSI_FREQ);
> + for_each_possible_cpu(cpu)
> + seqcount_init(per_cpu_ptr(&psi_seq, cpu));
> group_init(&psi_system);
> }
>
>
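
With the two nits above folded in (and group_init() losing its
now-unused "cpu" local), the end result should look something like:

void __init psi_init(void)
{
	int cpu;

	if (!psi_enable) {
		static_branch_enable(&psi_disabled);
		static_branch_disable(&psi_cgroups_enabled);
		return;
	}

	if (!cgroup_psi_enabled())
		static_branch_disable(&psi_cgroups_enabled);

	psi_period = jiffies_to_nsecs(PSI_FREQ);

	for_each_possible_cpu(cpu)
		seqcount_init(per_cpu_ptr(&psi_seq, cpu));
	group_init(&psi_system);
}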

--
Thanks and Regards,
Prateek