It is intended to exclude the idle path. My thought was that, since...
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index d3e2c5a7c1b7..452eb63ee6f6 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5395,6 +5395,7 @@ void scheduler_tick(void)
resched_latency = cpu_resched_latency(rq);
calc_global_load_tick(rq);
sched_core_tick(rq);
+ update_overloaded_rq(rq);
I didn't see this update in the idle path. Is this intended?
the avg_util already contains the historic activity, checking the cpu
status in each idle path seems to make little difference...
Right. Ideally if there is a 'realtime' idle cpumask for SIS, the
 rq_unlock(rq, &rf);
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index f80ae86bb404..34b1650f85f6 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6323,6 +6323,50 @@ static inline int select_idle_smt(struct task_struct *p, struct sched_domain *sd
#endif /* CONFIG_SCHED_SMT */
+/* derived from group_is_overloaded() */
+static inline bool rq_overloaded(struct rq *rq, int cpu, unsigned int imbalance_pct)
+{
+ if (rq->nr_running - rq->cfs.idle_h_nr_running <= 1)
+ return false;
+
+ if ((SCHED_CAPACITY_SCALE * 100) <
+ (cpu_util_cfs(cpu) * imbalance_pct))
+ return true;
+
+ if ((SCHED_CAPACITY_SCALE * imbalance_pct) <
+ (cpu_runnable(rq) * 100))
+ return true;
So the filter now contains cpus that are over-utilized or overloaded.
This goes a step further in making the filter reliable, at the cost
of scan efficiency.
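
To make the two thresholds concrete, here is a minimal userspace sketch of the same comparisons. The struct and function names are hypothetical stand-ins (the kernel reads these values via cpu_util_cfs() and cpu_runnable()), and the imbalance_pct values mentioned below are only illustrative.

```c
#include <stdbool.h>

#define SCHED_CAPACITY_SCALE 1024UL

/* Hypothetical stand-in for the per-runqueue inputs used by the patch. */
struct rq_sample {
	unsigned int nr_running;        /* all runnable tasks */
	unsigned int idle_h_nr_running; /* SCHED_IDLE tasks among them */
	unsigned long util;             /* stands in for cpu_util_cfs() */
	unsigned long runnable;         /* stands in for cpu_runnable() */
};

/* Same three checks as rq_overloaded() in the patch, on plain values. */
static bool rq_overloaded_sketch(const struct rq_sample *s,
				 unsigned int imbalance_pct)
{
	/* At most one non-idle task: never considered overloaded. */
	if (s->nr_running - s->idle_h_nr_running <= 1)
		return false;

	/* Over-utilized: util * pct exceeds capacity * 100. */
	if (SCHED_CAPACITY_SCALE * 100 < s->util * imbalance_pct)
		return true;

	/* Overloaded: runnable * 100 exceeds capacity * pct. */
	if (SCHED_CAPACITY_SCALE * imbalance_pct < s->runnable * 100)
		return true;

	return false;
}
```

With imbalance_pct = 117 the utilization check trips at util >= 876 and the runnable check at runnable >= 1199. With imbalance_pct = 110 the boundaries move to util >= 931 and runnable >= 1127, so a smaller percentage flags a cpu later on utilization but earlier on runnable.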
scan would be quite accurate. The issue is how to maintain this
cpumask at low cost.
The idea behind my recent patches is to keep the filter radical,
but use it conservatively.
Do you mean, update the per-core idle filter frequently, but only
propagate the filter to LLC-cpumask when the system is overloaded?
Right, the imbalance_pct should not be the LLC's, it could be the core domain's
+
+ return false;
+}
+
+void update_overloaded_rq(struct rq *rq)
+{
+ struct sched_domain_shared *sds;
+ struct sched_domain *sd;
+ int cpu;
+
+ if (!sched_feat(SIS_FILTER))
+ return;
+
+ cpu = cpu_of(rq);
+ sd = rcu_dereference(per_cpu(sd_llc, cpu));
+ if (unlikely(!sd))
+ return;
+
+ sds = rcu_dereference(per_cpu(sd_llc_shared, cpu));
+ if (unlikely(!sds))
+ return;
+
+ if (rq_overloaded(rq, cpu, sd->imbalance_pct)) {
I'm not sure whether it is appropriate to use LLC imbalance pct here,
because we are comparing inside the LLC rather than between the LLCs.
imbalance_pct.
Do you mean only update the filter (idle cpu mask), or only use the
+ /* avoid duplicated write, mitigate cache contention */
+ if (!cpumask_test_cpu(cpu, sdo_mask(sds)))
+ cpumask_set_cpu(cpu, sdo_mask(sds));
+ } else {
+ if (cpumask_test_cpu(cpu, sdo_mask(sds)))
+ cpumask_clear_cpu(cpu, sdo_mask(sds));
+ }
+}
/*
* Scan the LLC domain for idle CPUs; this is dynamically regulated by
* comparing the average scan cost (tracked in sd->avg_scan_cost) against the
@@ -6383,6 +6427,9 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, bool
}
}
+ if (sched_feat(SIS_FILTER) && !has_idle_core && sd->shared)
+ cpumask_andnot(cpus, cpus, sdo_mask(sd->shared));
+
for_each_cpu_wrap(cpu, cpus, target + 1) {
if (has_idle_core) {
i = select_idle_core(p, cpu, cpus, &idle_cpu);
diff --git a/kernel/sched/features.h b/kernel/sched/features.h
index ee7f23c76bd3..1bebdb87c2f4 100644
--- a/kernel/sched/features.h
+++ b/kernel/sched/features.h
@@ -62,6 +62,7 @@ SCHED_FEAT(TTWU_QUEUE, true)
*/
SCHED_FEAT(SIS_PROP, false)
SCHED_FEAT(SIS_UTIL, true)
+SCHED_FEAT(SIS_FILTER, true)
The filter should be enabled when there is a need. If the system
is idle enough, I don't think it's a good idea to clear out the
overloaded cpus from domain scan. Making the filter a sched-feat
won't help the problem.
My latest patch will only apply the filter when nr is less than
the LLC size.
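
As a rough illustration of "keep the filter radical, but use it conservatively", the sketch below always maintains the overloaded mask but only subtracts it from the scan set when nr is below the LLC size. Everything here is hypothetical: the 64-bit mask type and both helper names are invented for the example, and what exactly nr counts is taken from the thread as-is.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: one bit per CPU, for an LLC of at most 64 CPUs. */
typedef uint64_t cpumask64_t;

/* Update side: always track state, but skip the write if nothing changes. */
static void mark_overloaded(cpumask64_t *overloaded, unsigned int cpu,
			    bool is_overloaded)
{
	cpumask64_t bit = (cpumask64_t)1 << cpu;

	if (is_overloaded) {
		if (!(*overloaded & bit))
			*overloaded |= bit;
	} else {
		if (*overloaded & bit)
			*overloaded &= ~bit;
	}
}

/* Use side: only drop overloaded CPUs from the scan when nr < llc_size. */
static cpumask64_t scan_candidates(cpumask64_t allowed,
				   cpumask64_t overloaded,
				   unsigned int nr, unsigned int llc_size)
{
	if (nr < llc_size)
		return allowed & ~overloaded;

	return allowed;
}
```

For a 4-cpu LLC with cpus 1 and 2 marked overloaded, nr = 2 leaves cpus 0 and 3 as candidates, while nr = 4 scans all four cpus regardless of the mask.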
filter in SIS when the system meets: nr_running < LLC size?