Re: [PATCH 1/2] mm: page allocator: Initialise ZLC for first zone eligible for zone_reclaim

From: Christoph Lameter
Date: Thu Jul 21 2011 - 11:25:01 EST

On Wed, 20 Jul 2011, Mel Gorman wrote:

> On Wed, Jul 20, 2011 at 04:17:41PM -0500, Christoph Lameter wrote:
> > Hmmm... Maybe we can bypass the checks?
> >
> Maybe we should not.
> Watermarks should not just be ignored. They prevent the system
> deadlocking due to an inability to allocate a page needed to free more
> memory. This patch allows allocations that are not high priority
> or atomic to succeed when the buddy lists are at the min watermark
> and would normally be throttled. Minimally, this patch increases
> the risk of locking up due to memory exhaustion. For example,
> a GFP_ATOMIC allocation can refill the per-cpu list with pages that
> are then consumed by GFP_KERNEL allocations, the next GFP_ATOMIC
> allocation refills it again, those pages get consumed, and so on.
> It's even worse if it's PF_MEMALLOC allocations that are refilling
> the lists, as they ignore watermarks. If this is happening on enough
> CPUs, it will cause trouble.

Hmmm... True. This allocation complexity prevents effective use of caches.
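
The refill cycle described in the quoted paragraph can be made concrete
with a toy userspace model. This is only a sketch under stated
assumptions: struct toy_zone, toy_watermark_ok(), toy_alloc(), toy_run()
and all the numbers are invented for illustration, the "atomic may dip to
half the min watermark" rule is a simplification, and none of this is the
kernel's code or the patch under discussion.

```c
#include <stdbool.h>

/* Hypothetical toy model for illustration only; not kernel code. */
struct toy_zone {
    long free_pages; /* pages on the buddy lists */
    long min_wmark;  /* min watermark, in pages */
    long pcp_count;  /* pages sitting on the per-cpu list */
    long pcp_batch;  /* pages moved from buddy lists per refill */
};

/* Assumption: atomic requests may dip to half the min watermark. */
static bool toy_watermark_ok(const struct toy_zone *z, bool atomic)
{
    long mark = atomic ? z->min_wmark / 2 : z->min_wmark;
    return z->free_pages > mark;
}

/*
 * Allocate one page. With bypass set, a hit on the per-cpu list skips
 * the watermark check entirely, which is the behaviour being debated.
 */
static bool toy_alloc(struct toy_zone *z, bool atomic, bool bypass)
{
    if (!(bypass && z->pcp_count > 0) && !toy_watermark_ok(z, atomic))
        return false;

    if (z->pcp_count == 0) {
        /* Refill the per-cpu list from the buddy lists. */
        long n = z->free_pages < z->pcp_batch ? z->free_pages
                                              : z->pcp_batch;
        if (n == 0)
            return false;
        z->free_pages -= n;
        z->pcp_count += n;
    }
    z->pcp_count--;
    return true;
}

/*
 * Each round: one atomic allocation, then low-priority allocations
 * until one fails. Returns how many low-priority requests succeeded
 * and stores the remaining buddy-list pages in *free_left.
 */
static long toy_run(bool bypass, int rounds, long *free_left)
{
    /* Zone starts exactly at its min watermark, per-cpu list empty. */
    struct toy_zone z = { 100, 100, 0, 8 };
    long lowprio = 0;

    for (int i = 0; i < rounds; i++) {
        toy_alloc(&z, true, bypass);
        while (toy_alloc(&z, false, bypass))
            lowprio++;
    }
    *free_left = z.free_pages;
    return lowprio;
}
```

With these made-up numbers, toy_run(false, 16, &left) lets no
low-priority allocation through and leaves 84 pages on the buddy lists.
toy_run(true, 16, &left) lets 49 low-priority allocations through and
leaves 44 pages, below the level this model reserves for atomic requests,
so later atomic allocations start failing.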

> At the very least, the performance benefit of such a change should
> be illustrated. Even if it's faster (and I'd expect it to be,
> watermark checks particularly at low memory are expensive), it may
> just mean the system occasionally runs very fast into a wall. Hence,
> the patch should be accompanied by tests showing that it does not
> lock up even under very high stress for a long period of time,
> and the changelog should include a *very* convincing description
> of why PF_MEMALLOC refilling the per-cpu lists to be consumed by
> low-priority users is not a problem.

The performance of the page allocator is extremely bad at this point, and
it is so because of all these checks in the critical paths. Subsystems
have worked around this in numerous ways in the past, and I would think
there is no question that removing expensive checks from the fastpath
improves performance.

Maybe the only solution is to build a consistent second layer of
caching around the page allocator that is usable by various subsystems?

SLAB has in the past provided such a caching layer. The problem is that
people are now trying to build similar complexity into the fast path of
those allocators as well (e.g. the NFS swap patch with its ways of reserving
objects to fix the issue of objects being taken for the wrong reasons that
you mentioned above). We need some solution that allows the implementation
of fast object allocation, and that means reducing the complexity of what
is going on during page alloc and free.
