Re: [PATCH v2 00/12] sched: Address schbench regression
From: Shrikanth Hegde
Date: Mon Jul 21 2025 - 15:37:43 EST
On 7/9/25 00:32, Peter Zijlstra wrote:
On Mon, Jul 07, 2025 at 11:49:17PM +0530, Shrikanth Hegde wrote:
Git bisect points to
# first bad commit: [dc968ba0544889883d0912360dd72d90f674c140] sched: Add ttwu_queue support for delayed tasks
Moo.. Are IPIs particularly expensive on your platform?
It seems like the cost of IPIs is likely hurting here.
IPI latency really depends on whether the CPU was busy, in a shallow idle state, or in a deep idle state.
When it is in a deep idle state, the numbers show close to 5-8us on average on this small system.
When the system is busy (it could be running another schbench thread), it is around 1-2us.
I measured the time it took to take the remote rq lock in the baseline; that is only around 1-1.5us.
Also, the LLC here is small (one SMT4 core), so quite often the series would choose to send an IPI.
I did one more experiment: pin the worker and message threads such that an IPI is always sent.
NO_TTWU_QUEUE_DELAYED
./schbench -L -m 4 -M auto -t 64 -n 0 -r 5 -i 5
average rps: 1549224.72
./schbench -L -m 4 -M 0-3 -W 4-39 -t 64 -n 0 -r 5 -i 5
average rps: 1560839.00
TTWU_QUEUE_DELAYED
./schbench -L -m 4 -M auto -t 64 -n 0 -r 5 -i 5 << IPI could be sent quite often ***
average rps: 959522.31
./schbench -L -m 4 -M 0-3 -W 4-39 -t 64 -n 0 -r 5 -i 5 << IPIs are always sent. (M,W) don't share cache.
average rps: 470865.00 << rps goes even lower
=================================
*** issues/observations in schbench.
Chris,
When one passes -W auto or -M auto, I think the code is meant to run n message threads on the first n CPUs and the worker threads
on the remaining CPUs?
I don't see that happening. The above behavior can be achieved only with -M <cpus> -W <cpus>.
int i = 0;
CPU_ZERO(m_cpus);
for (int i = 0; i < m_threads; ++i) {
CPU_SET(i, m_cpus);
CPU_CLR(i, w_cpus);
}
for (; i < CPU_SETSIZE; i++) { << here i refers to the outer i, which is still 0, because the first loop declares its own i. Hence w_cpus is set for all CPUs,
and workers end up running on all CPUs even with -W auto.
CPU_SET(i, w_cpus);
}
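To illustrate the scoping problem above, here is a minimal sketch of how the two loops could share one index. This is not a tested schbench patch: split_cpus() is a hypothetical helper name, and zeroing w_cpus up front is my assumption about the intended behavior (the snippet above clears bits in w_cpus instead).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>

/*
 * Hypothetical helper, not the actual schbench code: the first m_threads
 * CPUs go to the message threads, every remaining CPU goes to the workers.
 */
static void split_cpus(int m_threads, cpu_set_t *m_cpus, cpu_set_t *w_cpus)
{
	int i;

	assert(m_threads >= 0 && m_threads <= CPU_SETSIZE);

	CPU_ZERO(m_cpus);
	CPU_ZERO(w_cpus);	/* assumption: start from an empty worker set */

	/* No "int i" in the for header, so this loop advances the outer i. */
	for (i = 0; i < m_threads; i++)
		CPU_SET(i, m_cpus);

	/* Continues from i == m_threads instead of restarting at 0. */
	for (; i < CPU_SETSIZE; i++)
		CPU_SET(i, w_cpus);
}
```

With m_threads = 4, this leaves CPUs 0-3 only in m_cpus and CPUs 4 and above only in w_cpus.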
Another issue is that if CPU0 is offline, then auto pinning fails. Maybe no one cares about that case?
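If that case does matter, one possible approach is to pick from the set of CPUs the process is actually allowed to run on, rather than assuming CPU numbers start at 0. A rough sketch, with split_allowed_cpus() being a hypothetical helper and not anything that exists in schbench today:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>

/*
 * Hypothetical helper: walk the allowed set in CPU order, give the first
 * m_threads allowed CPUs to the message threads and the remaining allowed
 * CPUs to the workers. Returns how many message CPUs were actually found.
 */
static int split_allowed_cpus(const cpu_set_t *allowed, int m_threads,
			      cpu_set_t *m_cpus, cpu_set_t *w_cpus)
{
	int picked = 0;

	assert(m_threads >= 0);

	CPU_ZERO(m_cpus);
	CPU_ZERO(w_cpus);

	for (int cpu = 0; cpu < CPU_SETSIZE; cpu++) {
		if (!CPU_ISSET(cpu, allowed))
			continue;	/* skip CPUs we cannot run on */
		if (picked < m_threads) {
			CPU_SET(cpu, m_cpus);
			picked++;
		} else {
			CPU_SET(cpu, w_cpus);
		}
	}
	return picked;
}
```

The allowed set could be filled with sched_getaffinity(0, sizeof(cpu_set_t), &allowed). I am assuming here that the returned mask leaves out offline CPUs; I have not verified that on this system.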