Re: mm: pages are not freed from lru_add_pvecs after process termination

From: Vlastimil Babka
Date: Fri May 13 2016 - 07:29:25 EST

Next message: Rafael J. Wysocki: "Re: [PATCH V7 08/11] pci, acpi: Support for ACPI based generic PCI host controller"
Previous message: Andy Shevchenko: "Re: [PATCH 2/2] serial: 8250_mid: Read RX buffer on RX DMA timeout for DNV"
In reply to: Michal Hocko: "Re: mm: pages are not freed from lru_add_pvecs after process termination"
Next in thread: Odzioba, Lukasz: "RE: mm: pages are not freed from lru_add_pvecs after process termination"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On 05/11/2016 09:53 AM, Michal Hocko wrote:

On Fri 06-05-16 09:04:34, Dave Hansen wrote:

On 05/06/2016 08:10 AM, Odzioba, Lukasz wrote:

On Thu 05-05-16 09:21:00, Michal Hocko wrote:

Or maybe the async nature of flushing turns
out to be just impractical and unreliable and we will end up skipping
THP (or all compound pages) for pcp LRU add cache. Let's see...

What if we simply skip lru_add pvecs for compound pages?
That way we still have compound pages on LRU's, but the problem goes
away. It is not quite what this naïve patch does, but it works nice for me.

diff --git a/mm/swap.c b/mm/swap.c
index 03aacbc..c75d5e1 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -392,7 +392,9 @@ static void __lru_cache_add(struct page *page)
get_page(page);
if (!pagevec_space(pvec))
__pagevec_lru_add(pvec);
pagevec_add(pvec, page);
+ if (PageCompound(page))
+ __pagevec_lru_add(pvec);
put_cpu_var(lru_add_pvec);
}

That's not _quite_ what I had in mind since that drains the entire pvec
every time a large page is encountered. But I'm conflicted about what
the right behavior _is_.

We'd taking the LRU lock for 'page' anyway, so we might as well drain
the pvec.

Note that pages in the pagevec can come from different zones, so this is not universally true.

Yes I think this makes sense. The only case where it would be suboptimal
is when the pagevec was already full and then we just created a single
page pvec to drain it. This can be handled better though by:

diff --git a/mm/swap.c b/mm/swap.c
index 95916142fc46..3fe4f180e8bf 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -391,9 +391,8 @@ static void __lru_cache_add(struct page *page)
struct pagevec *pvec = &get_cpu_var(lru_add_pvec);

get_page(page);
- if (!pagevec_space(pvec))
+ if (!pagevec_add(pvec, page) || PageCompound(page))
__pagevec_lru_add(pvec);
- pagevec_add(pvec, page);
put_cpu_var(lru_add_pvec);
}

Yeah that could work. There might be more complex solutions at the level
of lru_cache_add_active_or_unevictable() where we call it either from
base page code (mm/memory.c) or functions in mm/huge_memory.c. We could
redirect it at that point, but likely not worth the trouble unless this
simple solution doesn't show some performance regression...

Or, does the additional work to put the page on to a pvec and then
immediately drain it overwhelm that advantage?

pagevec_add is quite trivial so I would be really surprised if it
mattered.

Next message: Rafael J. Wysocki: "Re: [PATCH V7 08/11] pci, acpi: Support for ACPI based generic PCI host controller"
Previous message: Andy Shevchenko: "Re: [PATCH 2/2] serial: 8250_mid: Read RX buffer on RX DMA timeout for DNV"
In reply to: Michal Hocko: "Re: mm: pages are not freed from lru_add_pvecs after process termination"
Next in thread: Odzioba, Lukasz: "RE: mm: pages are not freed from lru_add_pvecs after process termination"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]