Re: [patch for-4.20] Revert "mm, thp: consolidate THP gfp handling into alloc_hugepage_direct_gfpmask"

From: David Rientjes
Date: Fri Dec 07 2018 - 18:05:35 EST


On Fri, 7 Dec 2018, Michal Hocko wrote:

> > This reverts commit 89c83fb539f95491be80cdd5158e6f0ce329e317.
> >
> > There are a couple of issues with 89c83fb539f9 independent of its partial
> > revert in 2f0799a0ffc0 ("mm, thp: restore node-local hugepage
> > allocations"):
> >
> > Firstly, the interaction between alloc_hugepage_direct_gfpmask() and
> > alloc_pages_vma() is racy wrt __GFP_THISNODE and MPOL_BIND.
> > alloc_hugepage_direct_gfpmask() makes sure not to set __GFP_THISNODE for
> > an MPOL_BIND policy but the policy used in alloc_pages_vma() may not be
> > the same for shared vma policies, triggering the WARN_ON_ONCE() in
> > policy_node().
>
> Could you share a test case?
>

Sorry, as Vlastimil pointed out, this race no longer exists since commit
2f0799a0ffc0 ("mm, thp: restore node-local hugepage allocations"),
because that commit removed the restructuring of
alloc_hugepage_direct_gfpmask(). Before that commit, the race existed for
shared vma policies: the policy could be modified between
alloc_hugepage_direct_gfpmask() and alloc_pages_vma(), and the warning
triggered if the policy switched to MPOL_BIND in that window.

> > Secondly, prior to 89c83fb539f9, alloc_pages_vma() implemented a somewhat
> > different policy for hugepage allocations, which were allocated through
> > alloc_hugepage_vma(). For hugepage allocations, if the allocating
> > process's node is in the set of allowed nodes, allocate with
> > __GFP_THISNODE for that node (for MPOL_PREFERRED, use that node with
> > __GFP_THISNODE instead).
>
> Why is it wrong to fallback to an explicitly configured mbind mask?
>

The new_page() case is similar to the shmem_alloc_hugepage() case. Prior
to 89c83fb539f9 ("mm, thp: consolidate THP gfp handling into
alloc_hugepage_direct_gfpmask"), shmem_alloc_hugepage() called
alloc_pages_vma() with hugepage == true, which effected a different
allocation policy: if the node that current is running on is allowed by
the policy, use __GFP_THISNODE (assuming ac5b2c18911ff is reverted, as it
is in Linus's tree).

After 89c83fb539f9, we lose that behavior and can fall back to remote
memory. Since the discussion of the NUMA aspects of hugepage allocations
is ongoing, it's better to keep the 4.20 tree stable while that is worked
out; the change likely deserves separate patches for both new_page() and
shmem_alloc_hugepage(). For the latter specifically, I assume it would be
nice to get an Acked-by from Kirill, who implemented shmem_alloc_hugepage()
with hugepage == true back in 4.8, which also had the __GFP_THISNODE
behavior, before the allocation policy is suddenly changed.