Re: [RFC 1/6] mm, page_alloc: fix more premature OOM due to race with cpuset update

From: Vlastimil Babka
Date: Tue Apr 11 2017 - 15:00:26 EST


+CC linux-api

On 11.4.2017 19:24, Christoph Lameter wrote:
> On Tue, 11 Apr 2017, Vlastimil Babka wrote:
>
>> The root of the problem is that the cpuset's mems_allowed and mempolicy's
>> nodemask can temporarily have no intersection, thus get_page_from_freelist()
>> cannot find any usable zone. The current semantic for empty intersection is to
>> ignore mempolicy's nodemask and honour cpuset restrictions. This is checked in
>> node_zonelist(), but the racy update can happen after we already passed the
>
> The fallback was only intended for a cpuset on which boundaries are not enforced
> in critical conditions (softwall). A hardwall cpuset (CS_MEM_HARDWALL)
> should fail the allocation.

Hmm just to clarify - I'm talking about ignoring the *mempolicy's* nodemask on
the basis of cpuset having higher priority, while you seem to be talking about
ignoring a (softwall) cpuset nodemask, right? man set_mempolicy says "... if
required nodemask contains no nodes that are allowed by the process's current
cpuset context, the memory policy reverts to local allocation" which does come
down to ignoring mempolicy's nodemask.

>> This patch fixes the issue by having __alloc_pages_slowpath() check for empty
>> intersection of cpuset and ac->nodemask before OOM or allocation failure. If
>> it's indeed empty, the nodemask is ignored and allocation retried, which mimics
>> node_zonelist(). This works fine, because almost all callers of
>
> Well that would need to be subject to the hardwall flag. Allocation needs
> to fail for a hardwall cpuset.

They still do, if no hardwall cpuset node can satisfy the allocation with
mempolicy ignored.