On 27/05/22 17:05, Tianchen Ding wrote:
> The main idea of the wakelist is to avoid cache bouncing. However,
> commit 518cd6234178 ("sched: Only queue remote wakeups when
> crossing cache boundaries") disabled queueing tasks on the wakelist
> when the CPUs share an LLC. This is because, at that time, the
> scheduler had to send IPIs to do ttwu_queue_wakelist(). Nowadays,
> ttwu_queue_wakelist() also supports TIF_POLLING, so this is no longer
> a problem when the wakee CPU is polling in idle.
> [...]
> Our patch shows improvements on schbench, hackbench and the
> Pipe-based Context Switching test of unixbench when idle CPUs exist,
> and no obvious regression on the other unixbench tests. This can help
> improve real-time performance in scenarios where wakeups happen
> frequently.
>
> Signed-off-by: Tianchen Ding <dtcccc@xxxxxxxxxxxxxxxxx>
This feels a bit like a generalization of
2ebb17717550 ("sched/core: Offload wakee task activation if it the wakee is descheduling")
Given rq->curr is updated before prev->on_cpu is cleared, the waker
executing ttwu_queue_cond() can observe:

  p->on_rq=0
  p->on_cpu=1
  rq->curr=swapper/x (aka the idle task)
So your addition of available_idle_cpu() in ttwu_queue_cond() (sort of)
matches that when invoked via:
	if (smp_load_acquire(&p->on_cpu) &&
	    ttwu_queue_wakelist(p, task_cpu(p), wake_flags | WF_ON_CPU))
		goto unlock;
but it also affects
	ttwu_queue(p, cpu, wake_flags);
at the tail end of try_to_wake_up().
With all that in mind, I'm curious whether your patch is functionally
close to the below.
---
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 66c4e5922fe1..ffd43264722a 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3836,7 +3836,7 @@ static inline bool ttwu_queue_cond(int cpu, int wake_flags)
 	 * the soon-to-be-idle CPU as the current CPU is likely busy.
 	 * nr_running is checked to avoid unnecessary task stacking.
 	 */
-	if ((wake_flags & WF_ON_CPU) && cpu_rq(cpu)->nr_running <= 1)
+	if (cpu_rq(cpu)->nr_running <= 1)
 		return true;
 
 	return false;