Re: [PATCH] get rid of lru_add_drain_all() in munlock path

From: Kamalesh Babulal
Date: Thu Nov 06 2008 - 11:41:50 EST


* KOSAKI Motohiro <kosaki.motohiro@xxxxxxxxxxxxxx> [2008-11-06 09:14:07]:

> > > > Now, in the current upstream version of the unevictable mlocked pages
> > > > patches, we just count any mlocked pages [vmstat] that make their way to
> > > > free*page() instead of BUGging out, as we were doing earlier during
> > > > development. So, maybe we can drop the lru_add_drain()s in the
> > > > unevictable mlocked pages work and live with the occasional freed
> > > > mlocked page, or mlocked page on the active/inactive lists to be dealt
> > > > with by vmscan.
> > >
> > > hm, okay.
> > > maybe, I was wrong.
> > >
> > > I'll make a "dropping lru_add_drain_all()" patch soon.
> > > I expect I need a few days.
> > > make the patch: 1 day
> > > confirm by stress workload: 2-3 days
> > >
> > > because rik's original problem only happened under heavy workload, I think.
> >
> > Indeed. It was an ad hoc test program [2 versions attached] written
> > specifically to beat on COW of shared pages mlocked by parent then COWed
> > by parent or child and unmapped explicitly or via exit. We were trying
> > to find all the ways that we could end up freeing mlocked pages--and
> > there were several. Most of these turned out to be genuine
> > coding/design defects [as difficult as that may be to believe :-)], so
> > tracking them down was worthwhile. And, I think that, in general,
> > clearing a page's mlocked state and rescuing from the unevictable lru
> > list on COW--to prevent the mlocked page from ending up mapped into some
> > task's non-VM_LOCKED vma--is a good thing to strive for.
>
>
>
> > Now, looking at the current code [28-rc1] in [__]clear_page_mlock():
> > We've already cleared the PG_mlocked flag, we've decremented the mlocked
> > pages stats, and we're just trying to rescue the page from the
> > unevictable list to the in/active list. If we fail to isolate the page,
> > then either some other task has it isolated and will return it to an
> > appropriate lru or it resides in a pagevec heading for an in/active lru
> > list. We don't use pagevec for unevictable list. Any other cases? If
> > not, then we can probably dispense with the "try harder" logic--the
> > lru_add_drain()--in __clear_page_mlock().
> >
> > Do you agree? Or have I missed something?
>
> Yup.
> you are perfectly right.
>
> Honestly, I thought lazy rescue isn't so good because it causes a statistics difference between
> # of mlocked pages and # of unevictable pages in past time.
> and, I thought I could avoid it.
>
> but it is wrong.
>
> I actually made such a patch, but it introduces a lot of unnecessary messiness.
> So, I believe a simple patch dropping lru_add_drain_all() is better.
>
> Again, you are right.
>
>
> Over the last few days, I've run a stress workload and I confirmed that my patch doesn't
> cause mlocked page leaks.
>
> this patch also solves Heiko's and Kamalesh's rtnl
> circular dependency problem (I think).
> http://marc.info/?l=linux-kernel&m=122460208308785&w=2
> http://marc.info/?l=linux-netdev&m=122586921407698&w=2
>
>
> -------------------------------------------------------------------------
> lockdep warns with the following message at boot time on one of my test machines.
> So schedule_on_each_cpu() shouldn't be called while the task holds mmap_sem.
>
> Actually, lru_add_drain_all() exists to prevent unevictable pages from staying on the reclaimable lru list.
> but the current unevictable code can rescue unevictable pages even if they stay on the reclaimable list.
>
> So removing it is better.
>
> In addition, this patch adds lru_add_drain_all() to sys_mlock() and sys_mlockall().
> it isn't a must.
> but it reduces the failures of moving pages to the unevictable list.
> such failures can be rescued in vmscan later. but reducing them is better.
>
>
> Note, if the above rescuing happens, the Mlocked and the Unevictable fields in /proc/meminfo will mismatch.
> but it doesn't cause any real trouble.
>
>
<snip warning>

Hi Kosaki-san,

Thanks, the patch fixes the circular locking dependency warning seen
while booting up.
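
For anyone who wants to watch the Mlocked/Unevictable mismatch that the
changelog mentions, here is a minimal userspace sketch. It is not part of
the patch; meminfo_kb() is a hypothetical helper name, and it only assumes
the usual "Key:   <n> kB" layout of /proc/meminfo.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Look up "Key:   <n> kB" in meminfo-style text and return <n>,
 * or -1 if the key is not present.
 * Hypothetical userspace helper, not kernel code.
 */
static long meminfo_kb(const char *text, const char *key)
{
	size_t klen;
	const char *line = text;

	assert(text && key);
	klen = strlen(key);

	while (*line) {
		/* A key starts at the beginning of a line and ends with ':' */
		if (strncmp(line, key, klen) == 0 && line[klen] == ':')
			return strtol(line + klen + 1, NULL, 10);

		/* Advance to the start of the next line, if there is one */
		line = strchr(line, '\n');
		if (!line)
			break;
		line++;
	}
	return -1;
}
```

Read /proc/meminfo into a NUL-terminated buffer, then compare
meminfo_kb(buf, "Mlocked") with meminfo_kb(buf, "Unevictable") before and
after a stress run. Note that Unevictable can also count pages that are
unevictable for reasons other than mlock, so the two are not expected to be
equal in general.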

Tested-by: Kamalesh Babulal <kamalesh@xxxxxxxxxxxxxxxxxx>
> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@xxxxxxxxxxxxxx>
> ---
> mm/mlock.c | 16 ++++++----------
> 1 file changed, 6 insertions(+), 10 deletions(-)
>
> Index: b/mm/mlock.c
> ===================================================================
> --- a/mm/mlock.c 2008-11-02 20:23:38.000000000 +0900
> +++ b/mm/mlock.c 2008-11-02 21:00:21.000000000 +0900
> @@ -66,14 +66,10 @@ void __clear_page_mlock(struct page *pag
> putback_lru_page(page);
> } else {
> /*
> - * Page not on the LRU yet. Flush all pagevecs and retry.
> + * We lost the race. the page already moved to evictable list.
> */
> - lru_add_drain_all();
> - if (!isolate_lru_page(page))
> - putback_lru_page(page);
> - else if (PageUnevictable(page))
> + if (PageUnevictable(page))
> count_vm_event(UNEVICTABLE_PGSTRANDED);
> -
> }
> }
>
> @@ -187,8 +183,6 @@ static long __mlock_vma_pages_range(stru
> if (vma->vm_flags & VM_WRITE)
> gup_flags |= GUP_FLAGS_WRITE;
>
> - lru_add_drain_all(); /* push cached pages to LRU */
> -
> while (nr_pages > 0) {
> int i;
>
> @@ -251,8 +245,6 @@ static long __mlock_vma_pages_range(stru
> ret = 0;
> }
>
> - lru_add_drain_all(); /* to update stats */
> -
> return ret; /* count entire vma as locked_vm */
> }
>
> @@ -546,6 +538,8 @@ asmlinkage long sys_mlock(unsigned long
> if (!can_do_mlock())
> return -EPERM;
>
> + lru_add_drain_all(); /* flush pagevec */
> +
> down_write(&current->mm->mmap_sem);
> len = PAGE_ALIGN(len + (start & ~PAGE_MASK));
> start &= PAGE_MASK;
> @@ -612,6 +606,8 @@ asmlinkage long sys_mlockall(int flags)
> if (!can_do_mlock())
> goto out;
>
> + lru_add_drain_all(); /* flush pagevec */
> +
> down_write(&current->mm->mmap_sem);
>
> lock_limit = current->signal->rlim[RLIMIT_MEMLOCK].rlim_cur;
>
>
>

--
Thanks & Regards,
Kamalesh Babulal,
Linux Technology Center,
IBM, ISTL.