[PATCH] sched/rt: optimize cpupri_vec layout

From: Pan Deng
Date: Wed Jun 11 2025 - 23:08:08 EST

When running a multi-instance ffmpeg transcoding workload that uses RT
threads on a high-core-count system, updates to cpupri_vec->count contend
with reads of mask, which sits in the same cache line, in
cpupri_find_fitness() and cpupri_set().

This change places count and mask in separate cache lines by marking mask
with the ____cacheline_aligned attribute, which avoids the false sharing.

Tested on a 2-socket machine with 240 physical cores and 480 logical
cores, running 60 ffmpeg transcoding instances. With the change, kernel
cycles% drops from ~20% to ~12% and the fps metric improves by ~11%.

The side effect of this change is that the size of struct cpupri
increases from 26 cache lines to 203 cache lines.

Signed-off-by: Pan Deng <pan.deng@xxxxxxxxx>
Signed-off-by: Tianyou Li <tianyou.li@xxxxxxxxx>
Reviewed-by: Tim Chen <tim.c.chen@xxxxxxxxxxxxxxx>
---
kernel/sched/cpupri.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/cpupri.h b/kernel/sched/cpupri.h
index d6cba0020064..245b0fa626be 100644
--- a/kernel/sched/cpupri.h
+++ b/kernel/sched/cpupri.h
@@ -9,7 +9,7 @@

 struct cpupri_vec {
 	atomic_t count;
-	cpumask_var_t mask;
+	cpumask_var_t mask ____cacheline_aligned;
 };
 
 struct cpupri {
--
2.43.5
