Re: [PATCH v11 0/6] Use pageblock_order for cma and alloc_contig_range alignment.

From: Zi Yan
Date: Thu May 19 2022 - 19:24:59 EST


On 19 May 2022, at 17:35, Zi Yan wrote:

> On 19 May 2022, at 16:57, Qian Cai wrote:
>
>> On Thu, Apr 28, 2022 at 08:39:06AM -0400, Zi Yan wrote:
>>> How about the one attached? I can apply it to next-20220428. Let me know
>>> if you are using a different branch. Thanks.
>>
>> Zi, it turns out that the endless loop in isolate_single_pageblock() can
>> still be reproduced on today's linux-next tree by running the reproducer a
>> few times. With this debug patch applied, it keeps printing the same
>> values.
>>
>> --- a/mm/page_isolation.c
>> +++ b/mm/page_isolation.c
>> @@ -399,6 +399,8 @@ static int isolate_single_pageblock(unsigned long boundary_pfn, gfp_t gfp_flags,
>> };
>> INIT_LIST_HEAD(&cc.migratepages);
>>
>> + printk_ratelimited("KK stucked pfn=%lu head_pfn=%lu nr_pages=%lu boundary_pfn=%lu\n", pfn, head_pfn, nr_pages, boundary_pfn);
>> ret = __alloc_contig_migrate_range(&cc, head_pfn,
>> head_pfn + nr_pages);
>>
>> isolate_single_pageblock: 179 callbacks suppressed
>> KK stucked pfn=2151120384 head_pfn=2151120384 nr_pages=512 boundary_pfn=2151120896
>> KK stucked pfn=2151120384 head_pfn=2151120384 nr_pages=512 boundary_pfn=2151120896
>> KK stucked pfn=2151120384 head_pfn=2151120384 nr_pages=512 boundary_pfn=2151120896
>> KK stucked pfn=2151120384 head_pfn=2151120384 nr_pages=512 boundary_pfn=2151120896
>> KK stucked pfn=2151120384 head_pfn=2151120384 nr_pages=512 boundary_pfn=2151120896
>> KK stucked pfn=2151120384 head_pfn=2151120384 nr_pages=512 boundary_pfn=2151120896
>> KK stucked pfn=2151120384 head_pfn=2151120384 nr_pages=512 boundary_pfn=2151120896
>> KK stucked pfn=2151120384 head_pfn=2151120384 nr_pages=512 boundary_pfn=2151120896
>> KK stucked pfn=2151120384 head_pfn=2151120384 nr_pages=512 boundary_pfn=2151120896
>> KK stucked pfn=2151120384 head_pfn=2151120384 nr_pages=512 boundary_pfn=2151120896
>
> Hi Qian,
>
> Thanks for your testing.
>
> Do you have a complete reproducer? From your printout, it is clear that a 512-page compound
> page caused the infinite loop: the page was not migrated and the code kept
> retrying. But __alloc_contig_migrate_range() is supposed to return non-zero to tell the
> caller that the page cannot be migrated, and the code should then goto failed without
> retrying. It would be great if you could share exactly what was run after boot, so that I
> can reproduce it locally and identify what makes __alloc_contig_migrate_range() return 0
> without migrating the page.
>
> Can you also try the patch below to see if it fixes the infinite loop?

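I also found an off-by-one error in the code: a compound page that ends exactly at
boundary_pfn was not skipped, so the code made an unnecessary attempt to migrate it.
Your endless loop seems to be caused by this. In your printout, head_pfn + nr_pages
(2151120384 + 512) is equal to boundary_pfn (2151120896).
Could you try the patch below instead? Thanks.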
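
To make the boundary case concrete, here is a minimal userspace sketch of the two
comparisons. The helper names are made up for illustration and do not exist in the
kernel; only the comparison itself is taken from the patch.

```c
#include <stdbool.h>

/*
 * Hypothetical helpers (not kernel functions). Each one answers: does the
 * compound page covering [head_pfn, head_pfn + nr_pages) lie entirely
 * before boundary_pfn, so that it can be skipped without migration?
 */

/* Comparison before the patch: misses a page ending exactly at the boundary. */
static bool skip_compound_old(unsigned long head_pfn, unsigned long nr_pages,
			      unsigned long boundary_pfn)
{
	return head_pfn + nr_pages < boundary_pfn;
}

/* Comparison after the patch: the end of the range is exclusive, so <= is right. */
static bool skip_compound_fixed(unsigned long head_pfn, unsigned long nr_pages,
				unsigned long boundary_pfn)
{
	return head_pfn + nr_pages <= boundary_pfn;
}
```

With the values from the printout (head_pfn=2151120384, nr_pages=512,
boundary_pfn=2151120896), the old check returns false and the page goes down the
migration path, while the fixed check returns true and the page is skipped.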

diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index b3f074d1682e..5c8099bb822f 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -374,7 +374,7 @@ static int isolate_single_pageblock(unsigned long boundary_pfn, gfp_t gfp_flags,
 			struct page *head = compound_head(page);
 			unsigned long head_pfn = page_to_pfn(head);

-			if (head_pfn + nr_pages < boundary_pfn) {
+			if (head_pfn + nr_pages <= boundary_pfn) {
 				pfn = head_pfn + nr_pages;
 				continue;
 			}
@@ -417,10 +417,8 @@ static int isolate_single_pageblock(unsigned long boundary_pfn, gfp_t gfp_flags,
 				order = 0;
 				outer_pfn = pfn;
 				while (!PageBuddy(pfn_to_page(outer_pfn))) {
-					if (++order >= MAX_ORDER) {
-						outer_pfn = pfn;
-						break;
-					}
+					if (++order >= MAX_ORDER)
+						goto failed;
 					outer_pfn &= ~0UL << order;
 				}
 				pfn = outer_pfn;
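
For reference, the second hunk's search for the head of the free page can be sketched
in userspace as below. This is only an illustration under assumptions: SKETCH_MAX_ORDER,
find_free_page_head() and the two predicates are hypothetical names, the predicate
stands in for PageBuddy(pfn_to_page(pfn)), and 11 is assumed as the order limit.

```c
#include <stdbool.h>

#define SKETCH_MAX_ORDER 11	/* assumption: stand-in for MAX_ORDER */

/* Stand-in for PageBuddy(pfn_to_page(pfn)); supplied by the caller. */
typedef bool (*is_buddy_fn)(unsigned long pfn);

/*
 * Round pfn down to successively larger power-of-two alignments until a
 * free page head is found. Returns 0 and stores the head in *outer on
 * success, or -1 when no head is found below SKETCH_MAX_ORDER, which
 * mirrors the "goto failed" added by the patch.
 */
static int find_free_page_head(unsigned long pfn, is_buddy_fn is_buddy,
			       unsigned long *outer)
{
	unsigned long outer_pfn = pfn;
	unsigned int order = 0;

	while (!is_buddy(outer_pfn)) {
		if (++order >= SKETCH_MAX_ORDER)
			return -1;
		/* clear the low 'order' bits: align down to 1 << order */
		outer_pfn &= ~0UL << order;
	}
	*outer = outer_pfn;
	return 0;
}

/* Example predicates, for illustration only. */
static bool head_at_1024(unsigned long pfn) { return pfn == 1024; }
static bool never_free(unsigned long pfn) { (void)pfn; return false; }
```

Starting from pfn 1030 with head_at_1024, the candidates are 1030, 1030, 1028 and 1024,
so the search stops at 1024. With never_free the search gives up and returns -1, where
the old code reset outer_pfn to pfn and presumably revisited the same pfn.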

--
Best Regards,
Yan, Zi
