Re: [PATCH v4 6/7] sched/fair: skip busy cores in SIS search

From: Abel Wu
Date: Wed Jul 13 2022 - 06:26:10 EST


On 7/11/22 8:02 PM, Chen Yu wrote:
...
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index d3e2c5a7c1b7..452eb63ee6f6 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5395,6 +5395,7 @@ void scheduler_tick(void)
resched_latency = cpu_resched_latency(rq);
calc_global_load_tick(rq);
sched_core_tick(rq);
+ update_overloaded_rq(rq);

I didn't see this update in the idle path. Is this intended?

It is intended to exclude the idle path. My thought was that, since
avg_util already contains the historic activity, checking the cpu
status on each idle entry seems to make little difference...

I presume the avg_util is used to decide how many cpus to scan, while
the update determines which cpus to scan.
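
To make that split concrete, the two mechanisms would compose in
select_idle_cpu() roughly like below (sketch only; nr_from_avg_util()
is a made-up placeholder for SIS_UTIL's scan-depth calculation, not
the actual code):

	cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr);

	nr = nr_from_avg_util(sd);                        /* how many */
	cpumask_andnot(cpus, cpus, sdo_mask(sd->shared)); /* which */

	for_each_cpu_wrap(cpu, cpus, target + 1) {
		if (!--nr)
			return -1;
		if (available_idle_cpu(cpu))
			return cpu;
	}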

rq_unlock(rq, &rf);
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index f80ae86bb404..34b1650f85f6 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6323,6 +6323,50 @@ static inline int select_idle_smt(struct task_struct *p, struct sched_domain *sd
#endif /* CONFIG_SCHED_SMT */
+/* derived from group_is_overloaded() */
+static inline bool rq_overloaded(struct rq *rq, int cpu, unsigned int imbalance_pct)
+{
+ if (rq->nr_running - rq->cfs.idle_h_nr_running <= 1)
+ return false;
+
+ if ((SCHED_CAPACITY_SCALE * 100) <
+ (cpu_util_cfs(cpu) * imbalance_pct))
+ return true;
+
+ if ((SCHED_CAPACITY_SCALE * imbalance_pct) <
+ (cpu_runnable(rq) * 100))
+ return true;

So the filter now contains cpus that are over-utilized or overloaded.
This goes a step further in making the filter reliable, at the cost
of some scan efficiency.
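
For concreteness, with SCHED_CAPACITY_SCALE = 1024 and assuming the
LLC default imbalance_pct of 117, the two checks above work out to:

	util check:     1024 * 100 < cpu_util_cfs(cpu) * 117
	                => util     > ~875  (~85% of capacity)
	runnable check: 1024 * 117 < cpu_runnable(rq) * 100
	                => runnable > ~1198 (~117% of capacity)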

Right. Ideally, if there were a 'realtime' idle cpumask for SIS, the
scan would be quite accurate. The issue is how to maintain this
cpumask at low cost.
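
A naive 'realtime' mask would look something like this (hypothetical
sketch, idle_mask() is a made-up accessor), which shows where the
cost comes from:

	/*
	 * Precise, but every idle entry/exit does an atomic RMW on
	 * a cacheline shared by the whole LLC, which can bounce
	 * heavily under high wakeup rates.
	 */
	static void cpu_enter_idle(int cpu, struct sched_domain_shared *sds)
	{
		cpumask_set_cpu(cpu, idle_mask(sds));	/* atomic set */
	}

	static void cpu_exit_idle(int cpu, struct sched_domain_shared *sds)
	{
		cpumask_clear_cpu(cpu, idle_mask(sds));	/* atomic clear */
	}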

Yes indeed.

The idea behind my recent patches is to keep the filter radical,
but use it conservatively.

Do you mean, update the per-core idle filter frequently, but only
propagate the filter to the LLC cpumask when the system is overloaded?

Not exactly. I want to update the filter (BTW there is only the LLC
filter, no core filters :)) whenever core state changes, and apply it
in the SIS domain scan only if the domain is busy enough.
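
In sketch form (llc_filter_mask() and llc_busy_enough() are made-up
names, not from any posted patch):

	/* Update radically: on every core busy<->idle transition. */
	static void update_llc_filter(int cpu, bool core_busy,
				      struct sched_domain_shared *sds)
	{
		if (core_busy)
			cpumask_set_cpu(cpu, llc_filter_mask(sds));
		else
			cpumask_clear_cpu(cpu, llc_filter_mask(sds));
	}

	/* Use conservatively: consult it only when the domain is busy. */
	if (llc_busy_enough(sd))
		cpumask_andnot(cpus, cpus, llc_filter_mask(sd->shared));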

+
+ return false;
+}
+
+void update_overloaded_rq(struct rq *rq)
+{
+ struct sched_domain_shared *sds;
+ struct sched_domain *sd;
+ int cpu;
+
+ if (!sched_feat(SIS_FILTER))
+ return;
+
+ cpu = cpu_of(rq);
+ sd = rcu_dereference(per_cpu(sd_llc, cpu));
+ if (unlikely(!sd))
+ return;
+
+ sds = rcu_dereference(per_cpu(sd_llc_shared, cpu));
+ if (unlikely(!sds))
+ return;
+
+ if (rq_overloaded(rq, cpu, sd->imbalance_pct)) {

I'm not sure whether it is appropriate to use the LLC's imbalance_pct
here, because we are comparing within the LLC rather than between LLCs.

Right, the imbalance_pct should not be the LLC domain's; it could be
the core domain's imbalance_pct.
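
One way to fetch that would be via the existing topology helpers,
e.g. (untested sketch; the fallback to the LLC's value when there is
no SMT level is my assumption):

	struct sched_domain *core_sd;
	unsigned int pct = sd->imbalance_pct;	/* LLC fallback */

	/*
	 * The lowest domain with SD_SHARE_CPUCAPACITY is the SMT/core
	 * level; must be called under rcu_read_lock().
	 */
	core_sd = lowest_flag_domain(cpu, SD_SHARE_CPUCAPACITY);
	if (core_sd)
		pct = core_sd->imbalance_pct;

and then pass pct to rq_overloaded() instead of sd->imbalance_pct.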
+ /* avoid duplicated write, mitigate cache contention */
+ if (!cpumask_test_cpu(cpu, sdo_mask(sds)))
+ cpumask_set_cpu(cpu, sdo_mask(sds));
+ } else {
+ if (cpumask_test_cpu(cpu, sdo_mask(sds)))
+ cpumask_clear_cpu(cpu, sdo_mask(sds));
+ }
+}
/*
* Scan the LLC domain for idle CPUs; this is dynamically regulated by
* comparing the average scan cost (tracked in sd->avg_scan_cost) against the
@@ -6383,6 +6427,9 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, bool
}
}
+ if (sched_feat(SIS_FILTER) && !has_idle_core && sd->shared)
+ cpumask_andnot(cpus, cpus, sdo_mask(sd->shared));
+
for_each_cpu_wrap(cpu, cpus, target + 1) {
if (has_idle_core) {
i = select_idle_core(p, cpu, cpus, &idle_cpu);
diff --git a/kernel/sched/features.h b/kernel/sched/features.h
index ee7f23c76bd3..1bebdb87c2f4 100644
--- a/kernel/sched/features.h
+++ b/kernel/sched/features.h
@@ -62,6 +62,7 @@ SCHED_FEAT(TTWU_QUEUE, true)
*/
SCHED_FEAT(SIS_PROP, false)
SCHED_FEAT(SIS_UTIL, true)
+SCHED_FEAT(SIS_FILTER, true)

The filter should be enabled only when there is a need. If the system
is idle enough, I don't think it's a good idea to clear the overloaded
cpus out of the domain scan. Making the filter a sched-feat won't help
with that problem.

My latest patch will only apply the filter when nr is less than the
LLC size.

Do you mean only update the filter (idle cpu mask), or only use the
filter in SIS when the system meets: nr_running < LLC size?


In the SIS domain search, apply the filter when nr < LLC_size. But I
haven't tested this with SIS_UTIL, and in the SIS_UTIL case this
condition seems to always hold.
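
Roughly (sketch only; using sd->span_weight as the LLC size):

	/*
	 * Apply the filter only when the domain is busy enough that
	 * SIS_UTIL has clamped the scan depth below the LLC size.
	 */
	if (sched_feat(SIS_FILTER) && !has_idle_core && sd->shared &&
	    nr < sd->span_weight)
		cpumask_andnot(cpus, cpus, sdo_mask(sd->shared));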

Thanks,
Abel