Re: [RFC PATCH 2/2] xfs: map KM_MAYFAIL to __GFP_RETRY_HARD

From: Vlastimil Babka
Date: Tue Jun 21 2016 - 05:29:16 EST


On 06/21/2016 06:22 AM, Johannes Weiner wrote:
I think whether the best-effort behavior should be opt-in or opt-out,
or how fine-grained the latency/success control over the allocator
should be is a different topic. I'd prefer defaulting to reliability
and annotating low-latency requirements, but I can see TRY_HARD work
too. It just shouldn't imply MAY_FAIL.

It is always hard to change the default behavior without breaking
anything. Up to now we had opt-in and as you can see there are not that
many users who really wanted to have higher reliability. I guess this is
because they just do not care and didn't see too many failures.
Opt-out also has the disadvantage that we would need to provide a flag
that tells the allocator to try less hard, and all we have is NORETRY,
which gives up way too easily. So to me it sounds like opt-in fits
better with the current usage.

For costly allocations, the presence of __GFP_NORETRY is exactly the
same as the absence of __GFP_REPEAT. So if we made __GFP_REPEAT the
default (and deleted the flag), the opt-outs would use __GFP_NORETRY
to restore their original behavior.
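
The equivalence claimed above can be sketched as a small user-space
model. This is not kernel code: the MODEL_* names and bit values are
made up for illustration, and the retry decision is reduced to what this
thread describes (only the costly-order threshold of 3 is taken from the
kernel's PAGE_ALLOC_COSTLY_ORDER).

```c
#include <stdbool.h>

/* Illustrative bit values only; NOT the kernel's real flag definitions. */
#define MODEL_GFP_REPEAT   0x1u
#define MODEL_GFP_NORETRY  0x2u
#define MODEL_COSTLY_ORDER 3u   /* mirrors PAGE_ALLOC_COSTLY_ORDER */

/* Simplified model of the current behavior: does the slow path keep retrying? */
static bool model_retries_today(unsigned int order, unsigned int flags)
{
	if (flags & MODEL_GFP_NORETRY)
		return false;           /* caller asked not to loop */
	if (order <= MODEL_COSTLY_ORDER)
		return true;            /* !costly requests keep retrying */
	return (flags & MODEL_GFP_REPEAT) != 0; /* costly: only with REPEAT */
}

/* Model of the proposal: REPEAT is the default, NORETRY is the opt-out. */
static bool model_retries_proposed(unsigned int order, unsigned int flags)
{
	(void)order;                    /* order no longer changes the answer */
	return !(flags & MODEL_GFP_NORETRY);
}
```

In this model an order-4 request with no flags and one with
MODEL_GFP_NORETRY both do not retry today, while under the proposal only
the NORETRY one gives up early.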

Just FYI, this argument conflicts with my idea for getting rid of the
hacky checks for GFP_TRANSHUGE and PF_KTHREAD (patches 05 and 06 in
[1]), where I observed the same lack of difference between the presence
of __GFP_NORETRY and the absence of __GFP_REPEAT, and made use of it.
Without __GFP_REPEAT I'd have two options for khugepaged and
madvise(MADV_HUGEPAGE) allocations: either pass __GFP_NORETRY and make
them fail more, or don't, and then they become much more disruptive (if
the default becomes best-effort, i.e. what __GFP_REPEAT used to do).

[1] http://thread.gmane.org/gmane.linux.kernel.mm/152313

As for changing the default - remember that we currently warn about
allocation failures as if they were bugs, unless they are explicitly
allocated with the __GFP_NOWARN flag. We can assume that the current
__GFP_NOWARN sites 1) commonly fail but 2) prefer to fall back
rather than incur latency (otherwise they would have added the
__GFP_REPEAT flag). These sites would be a good list of candidates to
annotate with __GFP_NORETRY. If we then made __GFP_REPEAT the default,
the sites that would then try harder are the same sites that would now
emit page allocation failure warnings. These are rare, and the only
times I have seen them is under enough load that latency is shot to
hell anyway. So I'm not really convinced by the regression argument.

But that would *actually* clean up the flags, not make them even more
confusing:

Allocations that can't ever handle failure would use __GFP_NOFAIL.

Callers like XFS would use __GFP_MAYFAIL specifically to disable the
implicit __GFP_NOFAIL of !costly allocations.

Callers that would prefer falling back over killing and looping would
use __GFP_NORETRY.

Wouldn't that cover all use cases and be much more intuitive, both in
the default behavior as well as in the names of the flags?
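
One way to read the three-flag scheme proposed above is as a mapping
from flags to a failure policy. The sketch below is a user-space model
only: __GFP_MAYFAIL is a flag proposed in this thread, the MODEL_* names
and bit values are invented for illustration, and the precedence between
flags (NOFAIL first, then NORETRY, then MAYFAIL) is an assumption that
the thread does not spell out.

```c
/* Illustrative bit values only; NOT the kernel's real flag definitions. */
#define MODEL_GFP_NOFAIL   0x1u
#define MODEL_GFP_MAYFAIL  0x2u   /* hypothetical flag from the proposal */
#define MODEL_GFP_NORETRY  0x4u
#define MODEL_COSTLY_ORDER 3u     /* mirrors PAGE_ALLOC_COSTLY_ORDER */

enum model_policy {
	MODEL_NEVER_FAILS,        /* loop until the allocation succeeds */
	MODEL_TRY_HARD_MAY_FAIL,  /* best effort, failure is returned */
	MODEL_FALL_BACK_EARLY     /* no killing or looping, fail quickly */
};

/* Assumed reading of the proposal; the flag precedence is a guess. */
static enum model_policy model_policy_for(unsigned int order,
					  unsigned int flags)
{
	if (flags & MODEL_GFP_NOFAIL)
		return MODEL_NEVER_FAILS;
	if (flags & MODEL_GFP_NORETRY)
		return MODEL_FALL_BACK_EARLY;
	if (flags & MODEL_GFP_MAYFAIL)
		return MODEL_TRY_HARD_MAY_FAIL;
	/* No flags: !costly requests are implicitly NOFAIL. */
	return order <= MODEL_COSTLY_ORDER ? MODEL_NEVER_FAILS
					   : MODEL_TRY_HARD_MAY_FAIL;
}
```

Under this reading, an XFS-style caller passing MODEL_GFP_MAYFAIL for an
order-0 request gets MODEL_TRY_HARD_MAY_FAIL instead of the implicit
MODEL_NEVER_FAILS.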