Re: [RFC] mm/page_isolation: Fix an infinite loop in isolate_single_pageblock()

From: Anshuman Khandual
Date: Mon May 30 2022 - 22:22:25 EST

On 5/30/22 19:23, Zi Yan wrote:
> On 30 May 2022, at 7:50, Anshuman Khandual wrote:
>
> >> HugeTLB allocation (32MB pages on 4K base page) via sysfs on an arm64 platform
> >> is getting stuck in isolate_single_pageblock() because of an infinite loop:
> >> head_pfn always evaluates to the same value, and so does pfn, so the outer
> >> loop never exits. Dropping the relevant code block, which seems redundant,
> >> makes the problem go away.
>
> Thanks for the report.
>
>>
>> Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
>> Cc: Zi Yan <ziy@xxxxxxxxxx>
>> Cc: linux-mm@xxxxxxxxx
>> Cc: linux-kernel@xxxxxxxxxxxxxxx
>> Fixes: b2c9e2fbba32 ("mm: make alloc_contig_range work at pageblock granularity")
>> Signed-off-by: Anshuman Khandual <anshuman.khandual@xxxxxxx>
>> ---
> >> I am not sure about this fix, and also did not find much time today to
> >> debug any further. There have been many code changes around this function
> >> recently. This problem is present on the latest mainline kernel.
>>
>> - Anshuman
>>
>> mm/page_isolation.c | 4 ----
>> 1 file changed, 4 deletions(-)
>>
>> diff --git a/mm/page_isolation.c b/mm/page_isolation.c
>> index 6021f8444b5a..b0922fee75c1 100644
>> --- a/mm/page_isolation.c
>> +++ b/mm/page_isolation.c
>> @@ -389,10 +389,6 @@ static int isolate_single_pageblock(unsigned long boundary_pfn, int flags,
>> struct page *head = compound_head(page);
>> unsigned long head_pfn = page_to_pfn(head);
>>
>> - if (head_pfn + nr_pages <= boundary_pfn) {
>> - pfn = head_pfn + nr_pages;
>> - continue;
>> - }
>> #if defined CONFIG_COMPACTION || defined CONFIG_CMA
>> /*
>> * hugetlb, lru compound (THP), and movable compound pages
>> --
>> 2.20.1
>
> Can you try the patch below to see if it fixes the issue? Thanks.
>
> diff --git a/mm/page_isolation.c b/mm/page_isolation.c
> index 6021f8444b5a..d200d41ad0d3 100644
> --- a/mm/page_isolation.c
> +++ b/mm/page_isolation.c
> @@ -385,9 +385,9 @@ static int isolate_single_pageblock(unsigned long boundary_pfn, int flags,
> * above do the rest. If migration is not possible, just fail.
> */
> if (PageCompound(page)) {
> - unsigned long nr_pages = compound_nr(page);
> struct page *head = compound_head(page);
> unsigned long head_pfn = page_to_pfn(head);
> + unsigned long nr_pages = compound_nr(head);
>
> if (head_pfn + nr_pages <= boundary_pfn) {
> pfn = head_pfn + nr_pages;
>
>

Yes, this does solve the problem. I guess nr_pages should have been derived
from the compound head itself for it to be meaningful (i.e. > 1). I assume you
will send a fix patch with an appropriate write-up that describes this problem.

- Anshuman