A major problem with threading the Memory Management code (not
just with moving pages) is the bits in a pte which can be changed
asynchronously by the hardware. On most archs, these attributes are
'referenced' and 'dirty'. No amount of kernel locking can avoid
these races, because the bits are changed without ever entering
kernel space.
Really? No locking can avoid these problems, surely you jest.
I was able to fix both of these problems in 15 minutes of work; it
changed very little of the kernel and did not require cross-calls in
95% of the cases, even under high load. Worked like this in
pseudocode:
swap_out() {
	int allow_others = 0;

again:
	for_each_candidate_task(tsk) {
		int others_dirty;

		spin_lock(&scheduler_lock);

		/* Has any other cpu touched this mm? */
		others_dirty = (tsk->mm->cpu_vm_mask !=
				(1UL << smp_processor_id()));

		/* First pass: skip tasks that would need a cross-call. */
		if (!allow_others && others_dirty)
			goto next;
		swap_out_task(tsk, others_dirty); /* drops scheduler_lock */
		continue;
	next:
		spin_unlock(&scheduler_lock);
	}
	if (no_progress_made) {
		/* Second pass: now pay for the cross-calls. */
		allow_others = 1;
		goto again;
	}
}
swap_out_task(tsk, others_dirty) {
	if (others_dirty)
		smp_capture(); /* corral the other cpus */
	if (good_idea_to_swap(pte)) {
		flush_cache...();
		set_pte(...);
		flush_tlb...();
		if (others_dirty)
			smp_release();
		spin_unlock(&scheduler_lock);
		free_up_the_page(); /* could sleep */
		return;
	}
out:
	if (others_dirty)
		smp_release();
	spin_unlock(&scheduler_lock);
}
All straightforward, maybe 15 or 16 lines of changes to the generic
code. Don't rewrite what ain't broke.
Later,
David "Sparc" Miller
davem@caip.rutgers.edu