Re: [patch 1/2] mm: remove GFP_THISNODE

From: Vlastimil Babka
Date: Thu Feb 26 2015 - 03:30:43 EST


On 02/26/2015 01:23 AM, David Rientjes wrote:
> NOTE: this is not about __GFP_THISNODE, this is only about GFP_THISNODE.
>
> GFP_THISNODE is a secret combination of gfp bits that have different
> behavior than expected. It is a combination of __GFP_THISNODE,
> __GFP_NORETRY, and __GFP_NOWARN and is special-cased in the page allocator
> slowpath to fail without trying reclaim even though it may be used in
> combination with __GFP_WAIT.
>
> An example of the problem this creates: commit e97ca8e5b864 ("mm: fix
> GFP_THISNODE callers and clarify") fixed up many users of GFP_THISNODE
> that really just wanted __GFP_THISNODE. The problem doesn't end there,
> however, because even it was a no-op for alloc_misplaced_dst_page(),
> which also sets __GFP_NORETRY and __GFP_NOWARN, and
> migrate_misplaced_transhuge_page(), where __GFP_NORETRY and __GFP_NOWAIT
> is set in GFP_TRANSHUGE. Converting GFP_THISNODE to __GFP_THISNODE is
> a no-op in these cases since the page allocator special-cases
> __GFP_THISNODE && __GFP_NORETRY && __GFP_NOWARN.
>
> It's time to just remove GFP_TRANSHUGE entirely. We leave __GFP_THISNODE

^THISNODE :) Although yes, it would be nice if we
could replace the GFP_TRANSHUGE magic checks as well.

> to restrict an allocation to a local node, but remove GFP_TRANSHUGE and
> it's obscurity. Instead, we require that a caller clear __GFP_WAIT if it
> wants to avoid reclaim.
>
> This allows the aforementioned functions to actually reclaim as they
> should. It also enables any future callers that want to do
> __GFP_THISNODE but also __GFP_NORETRY && __GFP_NOWARN to reclaim. The
> rule is simple: if you don't want to reclaim, then don't set __GFP_WAIT.

So, I agree with the intention, but this has some subtle implications that
should be mentioned/decided. The check for GFP_THISNODE in
__alloc_pages_slowpath() comes earlier than the check for __GFP_WAIT. So the
differences will be:

1) We will now call wake_all_kswapds(), unless __GFP_NO_KSWAPD is passed, which
is only done for hugepages and some type of i915 allocation. Do we want the
opportunistic attempts from slab to wake up kswapds or do we pass the flag?

2) There will be another attempt on get_page_from_freelist() with different
alloc_flags than in the fast path attempt. Without __GFP_WAIT (and also, again,
__GFP_KSWAPD, since your commit b104a35d32, which is another subtle check for
hugepage allocations btw), it will consider the allocation atomic and add
ALLOC_HARDER flag, unless __GFP_NOMEMALLOC is in __gfp_flags - it seems it's
generally not. It will also clear ALLOC_CPUSET, which was the concern of
b104a35d32. However, if I look at __cpuset_node_allowed(), I see that it's
always true for __GFP_THISNODE, which makes me question commit b104a35d32 in
light of your patch 2/2 and generally the sanity of all these flags and my
career choice.

Ugh :)

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/