Re: [RFC] linux-next panic in hugepage_subpool_put_pages()

From: Andrew Morton
Date: Tue Feb 23 2021 - 19:56:38 EST


On Tue, 23 Feb 2021 10:06:12 -0800 Mike Kravetz <mike.kravetz@xxxxxxxxxx> wrote:

> On 2/23/21 6:57 AM, Gerald Schaefer wrote:
> > Hi,
> >
> > LTP triggered a panic on s390 in hugepage_subpool_put_pages() with
> > linux-next 5.12.0-20210222, see below.
> >
> > It crashes on the spin_lock(&spool->lock) at the beginning, because the
> > passed-in *spool points to 0000004e00000000, which is not addressable
> > memory. It rather looks like some flags and not a proper address. I suspect
> > some relation to the recent rework in that area, e.g. commit f1280272ae4d
> > ("hugetlb: use page.private for hugetlb specific page flags").
> >
> > __free_huge_page() calls hugepage_subpool_put_pages() and takes *spool from
> > hugetlb_page_subpool(page), which was changed by that commit to use
> > page[1]->private now.
> >
>
> Thanks Gerald,
>
> Yes, I believe f1280272ae4d is the root cause of this issue. In that
> commit, the subpool pointer was moved from page->private of the head
> page to page->private of the first subpage. The page allocator will
> initialize (zero) the private field of the head page, but not that of
> subpages. So, that bad subpool pointer is likely an old page->private
> value for the page.
>
> That strange call path from set_max_huge_pages to __free_huge_page is
> actually how the code puts newly allocated pages on its internal free
> list.
>
> I will do a bit more verification and put together a patch (it should
> be simple).

There's also Michel's documentation request:
https://lkml.kernel.org/r/20210127102645.GH827@xxxxxxxxxxxxxx
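
To make the failure mode described above concrete, here is a minimal
userspace sketch. It is not kernel code: struct model_page,
struct model_subpool and every model_* function are hypothetical
stand-ins invented for illustration, and the stale value 0x4e is only
loosely modelled on the bad pointer in the report. The explicit clear
in model_init_subpool() is one plausible shape for a fix, assumed here
and not taken from any posted patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for struct page; only ->private is modelled. */
struct model_page {
	uintptr_t private;
};

/* Hypothetical stand-in for struct hugepage_subpool. */
struct model_subpool {
	int lock;
	long used_hpages;
};

/*
 * Model of an allocator that zeroes ->private of the head page only.
 * Subpages keep whatever value a previous user left behind.
 */
static void model_alloc_compound(struct model_page *pages, size_t nr)
{
	(void)nr;
	pages[0].private = 0;
}

/* After the rework: the subpool pointer is read from page[1].private. */
static struct model_subpool *model_page_subpool(struct model_page *head)
{
	return (struct model_subpool *)head[1].private;
}

/* Assumed fix: clear the subpool slot before the page is first freed. */
static void model_init_subpool(struct model_page *head)
{
	head[1].private = 0;
}

/* Walks through the bug and the assumed fix; returns 0 on success. */
static int model_demo(void)
{
	struct model_page pages[4];

	memset(pages, 0, sizeof(pages));
	pages[1].private = (uintptr_t)0x4e;	/* residue from an earlier user */

	model_alloc_compound(pages, 4);

	/* Without the fix, the free path sees a non-NULL garbage pointer. */
	assert(model_page_subpool(pages) != NULL);

	model_init_subpool(pages);

	/* With the slot cleared, the free path sees "no subpool". */
	assert(model_page_subpool(pages) == NULL);
	return 0;
}
```

Calling model_demo() from a small main() runs both checks. In this
model, a free path that takes the lock whenever the subpool pointer is
non-NULL would dereference the stale value, which matches the reported
symptom.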