Re: [PATCH 23/35] autonuma: core

From: Andrea Arcangeli
Date: Tue May 29 2012 - 20:04:31 EST


On Tue, May 29, 2012 at 02:45:54PM +0300, Kirill A. Shutemov wrote:
> On Fri, May 25, 2012 at 07:02:27PM +0200, Andrea Arcangeli wrote:
>
> > +static int knumad_do_scan(void)
> > +{
>
> ...
>
> > + if (knumad_test_exit(mm) || !vma) {
> > + mm_autonuma = mm->mm_autonuma;
> > + if (mm_autonuma->mm_node.next != &knumad_scan.mm_head) {
> > + mm_autonuma = list_entry(mm_autonuma->mm_node.next,
> > + struct mm_autonuma, mm_node);
> > + knumad_scan.mm = mm_autonuma->mm;
> > + atomic_inc(&knumad_scan.mm->mm_count);
> > + knumad_scan.address = 0;
> > + knumad_scan.mm->mm_autonuma->numa_fault_pass++;
> > + } else
> > + knumad_scan.mm = NULL;
>
> knumad_scan.mm should be nulled only after list_del otherwise you will
> have race with autonuma_exit():

Thanks for noticing I managed to reproduce it by setting
knuma_scand/scan_sleep_millisecs and
knuma_scand/scan_sleep_pass_millisecs both to 0 and running a loop of
"while :; do memhog -r10 10m &>/dev/null; done".

So the problem was that if knuma_scand would change the knumad_scan.mm
after the mm->mm_users was 0 but before autonuma_exit run,
autonuma_exit wouldn't notice that the mm->mm_auotnuma was already
unlinked and it would unlink again.

autonuma_exit itself doesn't need to tell anything to knuma_scand,
because if it notices knuma_scand.mm == mm, it will do nothing and it
_always_ relies on knumad_scan to unlink it.

And if instead knuma_scand.mm is != mm, then autonuma_exit knows the
knuma_scand daemon will never have a chance to see the "mm" in the
list again if it arrived first (setting mm_autonuma->mm = NULL there
is just a debug tweak according to the comment).

The "serialize" event is there only to wait the knuma_scand main loop
before taking down the mm (it's not related to the list management).

The mm_autonuma->mm is useless after the "mm_autonuma" has been
unlinked so it's ok to use that to track if knuma_scand arrives first.

The exit path of the kernel daemon also forgot to check for
knumad_test_exit(mm) before unlinking, but that only runs if
kthread_should_stop() is true, and nobody calls kthread_stop so it's
only a theoretical improvement.

So this seems to fix it.

diff --git a/mm/autonuma.c b/mm/autonuma.c
index c2a5a82..768250a 100644
--- a/mm/autonuma.c
+++ b/mm/autonuma.c
@@ -679,9 +679,12 @@ static int knumad_do_scan(void)
} else
knumad_scan.mm = NULL;

- if (knumad_test_exit(mm))
+ if (knumad_test_exit(mm)) {
list_del(&mm->mm_autonuma->mm_node);
- else
+ /* tell autonuma_exit not to list_del */
+ VM_BUG_ON(mm->mm_autonuma->mm != mm);
+ mm->mm_autonuma->mm = NULL;
+ } else
mm_numa_fault_flush(mm);

mmdrop(mm);
@@ -770,8 +773,12 @@ static int knuma_scand(void *none)

mm = knumad_scan.mm;
knumad_scan.mm = NULL;
- if (mm)
+ if (mm && knumad_test_exit(mm)) {
list_del(&mm->mm_autonuma->mm_node);
+ /* tell autonuma_exit not to list_del */
+ VM_BUG_ON(mm->mm_autonuma->mm != mm);
+ mm->mm_autonuma->mm = NULL;
+ }
mutex_unlock(&knumad_mm_mutex);

if (mm)
@@ -996,11 +1003,15 @@ void autonuma_exit(struct mm_struct *mm)
mutex_lock(&knumad_mm_mutex);
if (knumad_scan.mm == mm)
serialize = true;
- else
+ else if (mm->mm_autonuma->mm) {
+ VM_BUG_ON(mm->mm_autonuma->mm != mm);
+ mm->mm_autonuma->mm = NULL; /* debug */
list_del(&mm->mm_autonuma->mm_node);
+ }
mutex_unlock(&knumad_mm_mutex);

if (serialize) {
+ /* prevent the mm to go away under knumad_do_scan main loop */
down_write(&mm->mmap_sem);
up_write(&mm->mmap_sem);
}
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/