Re: [PATCH v2] sched: Queue task on wakelist in the same llc if the wakee cpu is idle

From: Tianchen Ding
Date: Tue May 31 2022 - 03:20:53 EST


On 2022/5/31 00:24, Valentin Schneider wrote:
On 27/05/22 17:05, Tianchen Ding wrote:
The main idea of wakelist is to avoid cache bouncing. However,
commit 518cd6234178 ("sched: Only queue remote wakeups when
crossing cache boundaries") disabled queuing tasks on wakelist when
the cpus share llc. This is because, at that time, the scheduler must
send IPIs to do ttwu_queue_wakelist. Nowadays, ttwu_queue_wakelist also
supports TIF_POLLING, so this is not a problem now when the wakee cpu is
in idle polling.

[...]

Our patch has improvement on schbench, hackbench
and Pipe-based Context Switching of unixbench
when there exists idle cpus,
and no obvious regression on other tests of unixbench.
This can help improve rt in scenes where wakeup happens frequently.

Signed-off-by: Tianchen Ding <dtcccc@xxxxxxxxxxxxxxxxx>

This feels a bit like a generalization of

2ebb17717550 ("sched/core: Offload wakee task activation if it the wakee is descheduling")

Given rq->curr is updated before prev->on_cpu is cleared, the waker
executing ttwu_queue_cond() can observe:

p->on_rq=0
p->on_cpu=1
rq->curr=swapper/x (aka idle task)

So your addition of available_idle_cpu() in ttwu_queue_cond() (sort of)
matches that when invoked via:

if (smp_load_acquire(&p->on_cpu) &&
ttwu_queue_wakelist(p, task_cpu(p), wake_flags | WF_ON_CPU))
goto unlock;

but it also affects

ttwu_queue(p, cpu, wake_flags);

at the tail end of try_to_wake_up().

Yes. This part is what we mainly want to affect. The above WF_ON_CPU is not our point.


With all that in mind, I'm curious whether your patch is functionaly close
to the below.

---
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 66c4e5922fe1..ffd43264722a 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3836,7 +3836,7 @@ static inline bool ttwu_queue_cond(int cpu, int wake_flags)
* the soon-to-be-idle CPU as the current CPU is likely busy.
* nr_running is checked to avoid unnecessary task stacking.
*/
- if ((wake_flags & WF_ON_CPU) && cpu_rq(cpu)->nr_running <= 1)
+ if (cpu_rq(cpu)->nr_running <= 1)
return true;
return false;

It's a little different. This may bring extra IPIs when nr_running == 1 and the current task on wakee cpu is not the target wakeup task (i.e., rq->curr == another_task && rq->curr != p). Then this another_task may be disturbed by IPI which is not expected. So IMO the promise by WF_ON_CPU is necessary.