Re: 4.1.28: memory leak introduced by "mm/swap.c: flush lru pvecs on compound page arrival"

From: Minchan Kim
Date: Sat Jul 16 2016 - 10:47:47 EST


On Fri, Jul 15, 2016 at 09:27:55PM +0200, Jens Rottmann wrote:
> Hi,
>
> 4.1.y stable commit c5ad33184354260be6d05de57e46a5498692f6d6 (Upstream
> commit 8f182270dfec432e93fae14f9208a6b9af01009f) "mm/swap.c: flush lru
> pvecs on compound page arrival" in 4.1.28 introduces a memory leak.
>
> Simply running
>
> while sleep 0.1; do clear; free; done
>
> shows mem continuously going down, eventually system panics with no
> killable processes left. Using "unxz -t some.xz" instead of sleep brings
> system down within minutes.
>
> Kmemleak did not report anything. Bisect ended at named commit, and
> reverting only this commit is indeed sufficient to fix the leak. Swap
> partition on/off makes no difference.
>
> My set-up:
> i.MX6 (ARM Cortex-A9) dual-core, 2 GB RAM. Kernel sources are from
> git.freescale.com i.e. heavily modified by Freescale for i.MX SoCs,
> kernel.org stable patches up to 4.1.28 manually added.
>
> I tried to reproduce with vanilla 4.1.28, but that wouldn't boot at all
> on my hardware, hangs immediately after "Starting kernel", sorry.
> However there is not a single difference between Freescale and vanilla
> in the whole mm/ subdirectory, so I don't think it's i.MX-specific. I
> didn't cross-check with an x86 system (yet).

I didn't have 4.1 stable tree in my local so just looked at git web
and found __lru_cache_add has a bug.

Please change

static void __lru_cache_add(struct page *page)
{
struct pagevec *pvec = &get_cpu_var(lru_add_pvec);

page_cache_get(page);
if (!pagevec_space(pvec) || PageCompound(page)) <==
__pagevec_lru_add(pvec);
put_cpu_var(lru_add_pvec);
}

with

static void __lru_cache_add(struct page *page)
{
struct pagevec *pvec = &get_cpu_var(lru_add_pvec);

page_cache_get(page);
if (!pagevec_add(pvec, page) || PageCompound(page)) <==
__pagevec_lru_add(pvec);
put_cpu_var(lru_add_pvec);
}