Re: [PATCH] mm: page_alloc: don't allocate page from memoryless nodes

From: Mike Rapoport
Date: Wed Feb 15 2023 - 04:30:43 EST


On Tue, Feb 14, 2023 at 02:38:44PM +0100, Michal Hocko wrote:
> On Tue 14-02-23 12:58:39, David Hildenbrand wrote:
> > On 14.02.23 12:48, David Hildenbrand wrote:
> > > On 14.02.23 12:44, Mike Rapoport wrote:
> > > > (added x86 folks)
> > > >
> > > > On Tue, Feb 14, 2023 at 12:29:42PM +0100, David Hildenbrand wrote:
> > > > > On 14.02.23 12:26, Qi Zheng wrote:
> > > > > > On 2023/2/14 19:22, David Hildenbrand wrote:
> > > > > > >
> > > > > > > TBH, this is the first time I hear of NODE_MIN_SIZE and it seems to be a
> > > > > > > pretty x86 specific thing.
> > > > > > >
> > > > > > > Are we sure we want to get NODE_MIN_SIZE involved?
> > > > > >
> > > > > > Maybe add an arch_xxx() to handle it?
> > > > >
> > > > > I still haven't figured out what we want to achieve with NODE_MIN_SIZE at
> > > > > all. It smells like an arch-specific hack looking at
> > > > >
> > > > > "Don't confuse VM with a node that doesn't have the minimum amount of
> > > > > memory"
> > > > >
> > > > > Why shouldn't mm-core deal with that?
> > > >
> > > > Well, a node with <4M RAM is not very useful and bears all the overhead of
> > > > an extra live node.
> > >
> > > And totally not with 4.1M, haha.
> > >
> > > I really like the "Might fix boot" in the commit description.
> > >
> > > >
> > > > But, hey, why won't we just drop that '< NODE_MIN_SIZE' and let people with
> > > > weird HW configurations just live with this?
> > >
> > >
> > > ;)
> > >
> >
> > Actually, remembering 09f49dca570a ("mm: handle uninitialized numa nodes
> > gracefully"), this might be the right thing to do. That commit assumes that
> > all offline nodes would get the pgdat allocated in free_area_init(). So that
> > we end up with an allocated pgdat for all possible nodes. The reasoning IIRC
> > was that we don't care about wasting memory in weird VM setups.
>
> Yes, that is the case indeed. I suspect the NODE_MIN_SIZE is a relic of
> the past when some PXM entries were incorrect or fishy. I would just
> drop the check and see whether something breaks. Or make those involved
> back then remember whether this is addressing something that is relevant
> these days. Even a 5MB node makes very little sense if you ask me, as the
> memmap is allocated for the whole memory section anyway and that is 128MB.
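
For scale, here is a rough userspace sketch of the arithmetic behind the
memmap remark. It assumes 4K base pages and a 64-byte struct page, which is
typical for x86-64 but is not stated anywhere in this thread; only the 128MB
section size comes from the text above. The macro and function names are
made up for illustration and are not kernel symbols.

```c
/* Assumed values, typical for x86-64; adjust for other configurations. */
#define ASSUMED_PAGE_SIZE        4096UL         /* 4K base pages (assumption) */
#define ASSUMED_STRUCT_PAGE_SIZE 64UL           /* sizeof(struct page) (assumption) */
#define ASSUMED_SECTION_SIZE     (128UL << 20)  /* 128MB section, as quoted above */

/* Bytes of memmap needed to describe one full memory section. */
static unsigned long memmap_bytes_per_section(void)
{
	return ASSUMED_SECTION_SIZE / ASSUMED_PAGE_SIZE * ASSUMED_STRUCT_PAGE_SIZE;
}

/*
 * memmap cost of a single section as a percentage of the node's own
 * memory; node_bytes must be non-zero.
 */
static unsigned long memmap_overhead_percent(unsigned long node_bytes)
{
	return memmap_bytes_per_section() * 100 / node_bytes;
}
```

Under these assumptions the memmap for one section is 2MB, so
memmap_overhead_percent(5UL << 20) comes out at 40 for the 5MB node in the
example, before counting any other per-node overhead.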

How about we try this: