Re: [RFC 1/4] mm, page_alloc: fix check for NULL preferred_zone

From: Michal Hocko
Date: Wed Jan 18 2017 - 05:14:44 EST


On Wed 18-01-17 10:45:33, Vlastimil Babka wrote:
> On 01/18/2017 10:31 AM, Michal Hocko wrote:
> > On Tue 17-01-17 23:16:07, Vlastimil Babka wrote:
> > > Since commit c33d6c06f60f ("mm, page_alloc: avoid looking up the first zone in
> > > a zonelist twice") we have a wrong check for NULL preferred_zone, which can
> > > theoretically happen due to concurrent cpuset modification. We check the
> > > zoneref pointer which is never NULL and we should check the zone pointer.
> > >
> > > Fixes: c33d6c06f60f ("mm, page_alloc: avoid looking up the first zone in a zonelist twice")
> > > Signed-off-by: Vlastimil Babka <vbabka@xxxxxxx>
> > > ---
> > > mm/page_alloc.c | 2 +-
> > > 1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > > index 34ada718ef47..593a11d8bc6b 100644
> > > --- a/mm/page_alloc.c
> > > +++ b/mm/page_alloc.c
> > > @@ -3763,7 +3763,7 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order,
> > >  	 */
> > >  	ac.preferred_zoneref = first_zones_zonelist(ac.zonelist,
> > >  					ac.high_zoneidx, ac.nodemask);
> > > -	if (!ac.preferred_zoneref) {
> > > +	if (!ac.preferred_zoneref->zone) {
> >
> > When can the ->zone be NULL?
>
> Either we get a genuinely screwed nodemask, or there's a concurrent cpuset
> update and nodes in zonelist are ordered in such a way that we see all of
> them as not being available to us in the nodemask/current->mems_allowed, when
> we iterate the zonelist, so we reach the end of zonelist. The zonelists are
> terminated with a zoneref with NULL zone pointer.

Thanks for the clarification. Please add a big fat comment in
first_zones_zonelist about this potential case.
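
To illustrate the case for such a comment, here is a minimal user-space
sketch of a NULL-terminated zonelist walk. The struct layouts, the
allowed_nodes bitmask and both helper names are simplified assumptions
for illustration only, not the kernel's definitions, and the
high_zoneidx filtering is left out:

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for illustration; NOT the kernel's definitions. */
struct zone { int node; };
struct zoneref { struct zone *zone; };

/*
 * Hypothetical helper modelled on the behaviour described above: walk the
 * list until an entry's node is set in allowed_nodes, or the NULL-zone
 * terminator is reached. The returned pointer is never NULL.
 */
struct zoneref *first_allowed_zoneref(struct zoneref *z,
				      unsigned long allowed_nodes)
{
	while (z->zone && !(allowed_nodes & (1UL << z->zone->node)))
		z++;
	return z;
}

/* Returns 1 when no zone is usable, i.e. only the terminator was found. */
int no_usable_zone(struct zoneref *zonelist, unsigned long allowed_nodes)
{
	struct zoneref *pref = first_allowed_zoneref(zonelist, allowed_nodes);

	assert(pref != NULL);		/* why a "!pref" check can never fire */
	return pref->zone == NULL;	/* the check the patch switches to */
}
```

With an empty mask the walk runs off the end and lands on the
terminator, so only the ->zone test can detect the failure.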

--
Michal Hocko
SUSE Labs