Re: Stuck looping on list_empty(list) in free_pcppages_bulk()

From: Vlastimil Babka
Date: Tue Aug 31 2021 - 12:51:34 EST


On 8/31/21 01:12, Sultan Alsawaf wrote:
> With some more gdb digging, I found that the `count` variable was stored in %ESI
> at the time of the stall. According to register dump in the splat, %ESI was 7.
>
> It looks like, for some reason, the pcp count was 7 higher than the number of
> pages actually present in the pcp lists.
>
> I tried to find some way that this could happen, but the only thing I could
> think of was that maybe an allocation had both __GFP_RECLAIMABLE and
> __GFP_MOVABLE set in its gfp mask, in which case the rmqueue() call in
> get_page_from_freelist() would pass in a migratetype equal to MIGRATE_PCPTYPES
> and then pages could be added to an out-of-bounds pcp list while still
> incrementing the overall pcp count. This seems pretty unlikely though. As
> another side note, it looks like there's nothing stopping this from occurring;
> there's only a VM_WARN_ON() in gfp_migratetype() that checks if both bits are
> set.
>
> Any ideas on what may have happened here, or a link to a commit that may have
> fixed this issue in newer kernels, would be much appreciated.

Does the kernel have commit 88e8ac11d2ea3 backported? If not, and there were
memory hotplug operations happening, the infinite loop could happen. If that
commit was not backported, and instead 5c3ad2eb7104 was backported, possibly
there are more scenarios outside hotplug that can cause trouble.

> Thanks,
> Sultan
>