Re: [RFC] mm: Allow ZONE_DMA32 to be disabled via kernel command line

From: Chris Goldsworthy
Date: Tue Jan 31 2023 - 23:10:05 EST


On Fri, Jan 27, 2023 at 04:55:53PM +0800, Hillf Danton wrote:
> On Thu, 26 Jan 2023 18:20:26 -0800 Chris Goldsworthy <quic_cgoldswo@xxxxxxxxxxx>
> > On Thu, Jan 26, 2023 at 07:15:26PM +0000, Robin Murphy wrote:
> > > However, I'm just going to take a step back and read the commit message a
> > > few more times... Given what it claims, I can't help but ask why wouldn't we
> > > want a parameter to control kswapd's behaviour and address that issue
> > > directly, rather than a massive hammer that breaks everyone allocating
> > > explicitly or implicitly with __GFP_DMA32 (especially on systems where it
> > > doesn't normally matter because all memory is below 4GB anyway), just to
> > > achieve one rather niche side-effect?
> > >
> > > Thanks,
> > > Robin.
> >
> > Hi Robin,
> >
> > The commit text doesn't spell out the scenario we want to avoid, so I
> > will do that for clarity. We use a kernel binary compiled for us, which
> > by default has CONFIG_ZONE_DMA32 enabled (and it can't be disabled for
> > now as another party needs it). Our higher-end SoCs are usually used with
> > 8-12 GB of DDR, so using a 12 GB device as an example, we would have 8
> > GB of ZONE_NORMAL memory and 4 GB of ZONE_MOVABLE memory with the
> > feature, and 4 GB of ZONE_DMA32, 4 GB of ZONE_NORMAL and 4 GB of
> > ZONE_MOVABLE otherwise.
> >
> > Without the feature enabled, consider a GFP_KERNEL allocation that
> > causes a low watermark breach in ZONE_NORMAL, such that
> > ZONE_DMA32 is almost full. This will cause kswapd to start reclaiming
> > memory, despite the fact that we might have gigabytes of free
> > memory in ZONE_DMA32 that can be used by anyone (since GFP_MOVABLE and
> > GFP_NORMAL can fall back to using ZONE_DMA32).
>
> If kswapd is busy reclaiming pages even given gigabytes of free memory
> in the DMA32 zone then it is a CPU hog.
>
> Feel free to check pgdat_balanced() and prepare_kswapd_sleep().

Thanks for pointing out this gap in my understanding - I'm taking a closer look
at these paths to see whether there is room for what Robin suggested.
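
As a rough illustration of the check Hillf refers to, here is a simplified
userspace model of how pgdat_balanced() behaves as I understand it in recent
kernels: the node counts as balanced as soon as any eligible zone is above its
high watermark. This is a sketch, not kernel code. The struct and function
names (model_zone, model_pgdat_balanced) are hypothetical, and it leaves out
lowmem_reserve, the allocation order and everything else that
zone_watermark_ok_safe() takes into account.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for struct zone (units: pages). */
struct model_zone {
	const char *name;
	unsigned long free_pages;
	unsigned long high_wmark;
};

/*
 * Simplified model: walk the zones from index 0 up to and including
 * highest_zoneidx and report the node as balanced if at least one of
 * them has more free pages than its high watermark.
 *
 * Assumption: lowmem_reserve and allocation order are ignored here.
 */
bool model_pgdat_balanced(const struct model_zone *zones,
			  size_t highest_zoneidx)
{
	for (size_t i = 0; i <= highest_zoneidx; i++) {
		if (zones[i].free_pages > zones[i].high_wmark)
			return true;
	}
	return false;
}
```

Under this model, a node with a nearly exhausted ZONE_NORMAL but plenty of
free pages in ZONE_DMA32 is still reported as balanced, which is the case
Hillf's comment is about.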