Re: [PATCH v2] sched/cache: Reduce the overhead of task_cache_work by only scan the visisted cpus.

From: Chen, Yu C

Date: Mon Apr 20 2026 - 03:54:19 EST

On 4/18/2026 5:01 PM, Luo Gengkun wrote:

On 2026/4/15 11:10, Chen, Yu C wrote:

Hi Gengkun,

[ ... ]

@@ -1736,8 +1746,17 @@ static void task_cache_work(struct callback_head *work)
                  continue;
              for_each_cpu(i, sched_domain_span(sd)) {
-                occ = fraction_mm_sched(cpu_rq(i),
-                            per_cpu_ptr(mm->sc_stat.pcpu_sched, i));
+                struct rq *rq = cpu_rq(i);
+                struct sched_cache_time *pcpu_sched = per_cpu_ptr(mm->sc_stat.pcpu_sched, i);
+                /* Skip the rq that has not been hit for a long time */
+                if (sched_cache_timeout_enabled() &&
+                    cpumask_test_cpu(cpu_of(rq), &mm->sc_stat.visited_cpus) &&

cpumask_test_cpu(i) should be fine. The rq access above doesn't hold cpu_epoch_lock.
I wonder if we can safely calculate rq->cpu_epoch - pcpu_sched->epoch
inside fraction_mm_sched while holding the lock?

Do we really need to access rq->cpu_epoch under the lock for read scenarios?
I noticed task_tick_cache accesses it directly. Plus, moving this access outside
the lock would help reduce lock contention.

Good question. task_tick_cache() accesses the local rq->cpu_epoch with rq->lock held
and IRQs disabled, while task_cache_work() runs with IRQs enabled and without
any rq->lock held, and might not run on the local rq: see __exit_to_user_mode_loop(),
which checks _TIF_NEED_RESCHED before _TIF_NOTIFY_RESUME, so p could be switched out,
then woken up and run task_cache_work() on a different CPU.
That is to say, I just wonder whether there could be a race window
that makes two reads of rq->cpu_epoch - pcpu_sched->epoch inconsistent
(not necessarily a critical issue, though).
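
To make the concern concrete, here is a minimal user-space sketch, not kernel
code. Every name in it (model_rq, model_pcpu_sched, model_epoch_delta,
model_is_stale) is hypothetical, and a pthread mutex merely stands in for
cpu_epoch_lock. It assumes the epoch distance is read once inside the critical
section and that the same snapshot then feeds both the "skip this rq" decision
and any later calculation, so the two cannot disagree:

```c
/* User-space model only; all names are hypothetical, not kernel API. */
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

struct model_rq {
	pthread_mutex_t epoch_lock;	/* stands in for cpu_epoch_lock */
	uint64_t cpu_epoch;		/* stands in for rq->cpu_epoch */
};

struct model_pcpu_sched {
	uint64_t epoch;			/* stands in for pcpu_sched->epoch */
};

/* Read both epochs in one critical section and return their distance. */
static uint64_t model_epoch_delta(struct model_rq *rq,
				  struct model_pcpu_sched *pcpu)
{
	uint64_t delta;

	pthread_mutex_lock(&rq->epoch_lock);
	delta = rq->cpu_epoch - pcpu->epoch;
	pthread_mutex_unlock(&rq->epoch_lock);

	return delta;
}

/* Decide staleness from an already-taken snapshot, not from a fresh read. */
static bool model_is_stale(uint64_t delta, uint64_t timeout)
{
	return delta > timeout;
}
```

If the caller instead computed the distance a second time without the lock, the
epoch could advance between the two reads and the timeout check could disagree
with the value used afterwards, which is the window described above.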

thanks,
Chenyu