Re: [RFC] Make balance_dirty_pages zone aware (1/2)

From: Martin J. Bligh
Date: Mon Nov 24 2003 - 18:47:37 EST


>> >> Currently the VM decides to start doing background writeback of pages if
>> >> 10% of the systems pages are dirty, and starts doing synchronous
>> >> writeback of pages if 40% are dirty. This is great for smaller memory
>> >> systems, but in larger memory systems (>2GB or so), a process can dirty
>> >> ALL of lowmem (ZONE_NORMAL, 896MB) without hitting the 40% dirty page
>> >> ratio needed to force the process to do writeback.
>> >
>> > Yes, it has been that way for a year or so. I was wondering if anyone
>> > would hit any problems in practice. Have you hit any problem in practice?
>> >
>> > I agree that the per-zonification of this part of the VM/VFS makes some
>> > sense, although not _complete_ sense, because as you've seen, we need to
>> > perform writeout against all zones' pages if _any_ zone exceeds dirty
>> > limits. This could do nasty things on a 1G highmem machine, due to the
>> > tiny highmem zone. So maybe that zone should not trigger writeback.
>> >
>> > However the simplest fix is of course to decrease the default value of the
>> > dirty thresholds - put them back to the 2.4 levels. It all depends upon
>> > the nature of the problems which you have been observing?
>>
>> I'm not sure that'll fix the problem for NUMA boxes, which is where we
>> started.
>
> What problems?
>
>> When any node fills up completely with dirty pages (which would
>> only require one process doing a streaming write (eg an ftp download),
>> it seems we'll get into trouble.
>
> What trouble?

Well ... not so sure of this as I once was ... so be gentle with me ;-)
But if the system has been running for a while, memory is full of pagecache,
etc. We try to allocate from the local node, fail, and fall back to the
other nodes, which are all full as well. Then we wake up kswapd, but all
pages in this node are dirty, so we block for ages on writeout, making
mem allocate really latent and slow (which was presumably what
balance_dirty_pages was there to solve in the first place).

> If we make the dirty threshold a proportion of the initial amount of free
> memory in ZONE_NORMAL, as is done in 2.4 it will not be possible to fill
> any node with dirty pages.

True. But that seems a bit extreme for a system with 64GB of RAM, and only
896Mb in ZONE_NORMAL ;-) Doesn't really seem like the right way to fix it.

M.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/