Re: [RFC PATCH V1 1/1] sched/numa: Enhance vma scanning logic

From: Raghavendra K T
Date: Wed Jan 18 2023 - 00:48:35 EST


On 1/17/2023 11:15 PM, Raghavendra K T wrote:
On 1/17/2023 8:29 PM, Mel Gorman wrote:
Note that the cc list is excessive for the topic.

[...]

  struct kioctx_table;
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index e4a0b8bd941c..944d2e3b0b3c 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2916,6 +2916,35 @@ static void reset_ptenuma_scan(struct task_struct *p)
      p->mm->numa_scan_offset = 0;
  }
+static bool vma_is_accessed(struct vm_area_struct *vma)
+{
+    int i;
+    bool more_pids_exist;
+    unsigned long pid, max_pids;
+    unsigned long current_pid = current->pid & LAST__PID_MASK;
+
+    max_pids = sizeof(unsigned int) * BITS_PER_BYTE / LAST__PID_SHIFT;
+
+    /* By default we assume >= max_pids exist */
+    more_pids_exist = true;
+
+    if (READ_ONCE(current->mm->numa_scan_seq) < 2)
+        return true;
+
+    for (i = 0; i < max_pids; i++) {
+        pid = (vma->accessing_pids >> i * LAST__PID_SHIFT) &
+            LAST__PID_MASK;
+        if (pid == current_pid)
+            return true;
+        if (pid == 0) {
+            more_pids_exist = false;
+            break;
+        }
+    }
+
+    return more_pids_exist;
+}

I get the intent is to avoid PIDs scanning VMAs that it has never faulted
within but it seems unnecessarily complex to search on every fault to track
just 4 pids with no recent access information. The pid modulo BITS_PER_WORD
couls be used to set a bit on an unsigned long to track approximate recent
acceses and skip VMAs that do not have the bit set. That would allow more
recent PIDs to be tracked although false positives would still exist. It
would be necessary to reset the mask periodically.

Got the idea but I lost you on pid modulo BITS_PER_WORD, (is it
extracting last 5 or 8 bits of PID?) OR
Do you intend to say we can just do

vma->accessing_pids | = current_pid..

so that later we can just check
if (vma->accessing_pids | current_pid) == vma->accessing_pids then it is
a hit..
This becomes simple and we avoid iteration, duplicate tracking etc


Did more brainstorming/thought on this, I see that you meant

active_bit = (current_pid % BITS_PER_LONG);
accessing_pids |= (1UL << active_bit);

In scan path:
active_bit = (current_pid % BITS_PER_LONG);
if (!(accessing_pids & (1UL << active_bit))
      goto skip_scanning;

My approach above would perhaps give more false positive, this seems better thing to..

Thanks, will come up with numbers for this patch + your vma scan delay patch.


Even tracking 4 pids, a reset is periodically needed. Otherwise it'll
be vulnerable to changes in phase behaviour causing all pids to scan all
VMAs again.


Agree. Yes this will be the key thing to do. On a related note I saw
huge increment in numa_scan_seq because we frequently visit scanning
after the patch

[...]