Re: [PATCH] mm/hwpoison: fix incorrect call put_hwpoison_page() when isolate_huge_page() return false

From: Naoya Horiguchi
Date: Tue Nov 13 2018 - 02:47:25 EST


On Tue, Nov 13, 2018 at 03:00:09PM +0800, Yongkai Wu wrote:
> when isolate_huge_page() return false,it won't takes a refcount of page,
> if we call put_hwpoison_page() in that case,we may hit the VM_BUG_ON_PAGE!
>
> Signed-off-by: Yongkai Wu <nic_w@xxxxxxx>
> ---
> mm/memory-failure.c | 13 +++++++------
> 1 file changed, 7 insertions(+), 6 deletions(-)
>
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index 0cd3de3..ed09f56 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -1699,12 +1699,13 @@ static int soft_offline_huge_page(struct page *page,
> int flags)
> unlock_page(hpage);
>
> ret = isolate_huge_page(hpage, &pagelist);
> - /*
> - * get_any_page() and isolate_huge_page() takes a refcount each,
> - * so need to drop one here.
> - */
> - put_hwpoison_page(hpage);
> - if (!ret) {
> + if (ret) {
> + /*
> + * get_any_page() and isolate_huge_page() takes a refcount each,
> + * so need to drop one here.
> + */
> + put_hwpoison_page(hpage);
> + } else {

Hi Yongkai,

Although the current code might look odd, it's OK. We have to release
one refcount whether this isolate_huge_page() succeeds or not, because
the put_hwpoison_page() is cancelling the refcount from get_any_page()
which always succeeds when we enter soft_offline_huge_page().

Let's consider that the isolate_huge_page() fails with your patch applied,
then the refcount taken by get_any_page() is never released after returning
from soft_offline_page(). That will lead to memory leak.

I think that current code comment doesn't explaing it well, so if you
like, you can fix the comment. (If you do that, please check coding style.
scripts/checkpatch.pl will help you.)

Thanks,
Naoya Horiguchi