Re: [PATCH 11/12] mm: accurate pageout congestion wait

From: Andrew Morton
Date: Thu Apr 05 2007 - 19:18:42 EST


On Thu, 05 Apr 2007 19:42:20 +0200
root@xxxxxxxxxxxxxxxxxxxxxxxxx wrote:

> Only do the congestion wait when we actually encountered congestion.

The name congestion_wait() was accurate back in 2002, but it isn't accurate
any more, and you got misled. It does not only wait for a queue to become
uncongested.

See clear_bdi_congested()'s callers. As long as the queue is in an
uncongested state, we deliver wakeups to congestion_wait() blockers on
every IO completion. As I said before, it is so that the MM's polling
operations poll at a higher frequency when the IO system is working faster.
(It is also to synchronise with end_page_writeback()'s feeding of clean
pages to us via rotate_reclaimable_page()).
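
To make the wakeup behaviour concrete, here is a toy userspace model. It is
not kernel code: struct io_model, model_congestion_wait() and the abstract
"ticks" are all made up for illustration. The only thing it tries to show is
that, on an uncongested queue, the waiter sleeps until the next IO completion
or the timeout, whichever comes first.

```c
#include <assert.h>

/* Hypothetical model of a device's IO activity; time is in abstract ticks. */
struct io_model {
    int completion_interval; /* ticks between IO completions, 0 = no IO completing */
    int congested_until;     /* tick at which the queue leaves the congested
                                state, 0 = never congested */
};

/*
 * Model of a congestion_wait()-style sleep.  Returns the number of ticks the
 * waiter sleeps: it is woken by the first IO completion that happens while
 * the queue is uncongested, or by the timeout if no such completion arrives.
 */
static int model_congestion_wait(const struct io_model *io, int timeout)
{
    assert(timeout > 0);
    assert(io->completion_interval >= 0);

    for (int t = 1; t <= timeout; t++) {
        int completion = io->completion_interval > 0 &&
                         t % io->completion_interval == 0;

        /* Completions only deliver a wakeup once the queue is uncongested. */
        if (completion && t >= io->congested_until)
            return t;
    }
    return timeout; /* nobody woke us: sleep for the full timeout */
}
```

In this model, with a 100-tick timeout, a device completing an IO every 2
ticks wakes the waiter after 2 ticks, one completing every 25 ticks wakes it
after 25, and an idle device leaves it asleep for the full 100. That is the
"poll faster when the IO system is faster" property.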

Page reclaim can get into trouble without any request queue having entered
a congested state. For example, think about a machine which has a single
disk, and the operator has increased that disk's request queue size to
100,000. With your patch, all of the VM's throttling would be bypassed: we
would go into a busy loop and declare OOM instantly.
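
A rough way to see the failure mode is the toy model below. Everything in it
is an assumption made for illustration (the struct, the number of passes, the
tick costs); it is not how the reclaim code is actually structured. It only
compares "always sleep between passes" with "sleep only if the queue is
congested" when the queue is too large to ever congest.

```c
#include <assert.h>
#include <stdbool.h>

enum { MODEL_PASSES = 12 }; /* hypothetical number of reclaim passes before giving up */

struct reclaim_model {
    long queue_limit;     /* request queue size */
    long submit_per_pass; /* pages submitted for writeback on each pass */
    int io_latency;       /* ticks until the first writeback completes */
    int sleep_ticks;      /* length of one throttling sleep */
    bool wait_only_if_congested;
};

/*
 * Returns true if reclaim is still running when writeback completes (clean
 * pages become available), false if it runs out of passes first (model OOM).
 */
static bool model_reclaim(const struct reclaim_model *m)
{
    long inflight = 0;
    int now = 0;

    for (int pass = 0; pass < MODEL_PASSES; pass++) {
        if (now >= m->io_latency)
            return true; /* writeback finished, clean pages available */

        inflight += m->submit_per_pass;
        if (inflight > m->queue_limit)
            inflight = m->queue_limit;
        bool congested = inflight >= m->queue_limit;

        now += 1; /* cost of scanning on this pass */
        if (!m->wait_only_if_congested || congested)
            now += m->sleep_ticks; /* throttle */
    }
    return false; /* burned through every pass without throttling enough */
}
```

With queue_limit = 100000, submit_per_pass = 32, io_latency = 50 and
sleep_ticks = 10, the congestion-only variant never sleeps and gives up after
12 ticks, while the unconditional variant is still around when the IO
completes. With queue_limit = 128 both variants survive, which is why the
problem is easy to miss on default settings.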

There are probably other situations in which page reclaim gets into trouble
without a request queue being congested.

Minor point: bdi_congested() can be arbitrarily expensive - for DM stackups
it is roughly proportional to the number of subdevices in the device. We
need to be careful about how frequently we call it.
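
As an illustration of the cost, here is a toy model of a stacked device whose
congestion query has to ask every subdevice. The names (struct model_dev,
model_congested()) are invented for this sketch and do not correspond to the
DM code; the visit counter exists only to show how the work grows.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical device: either a leaf with its own state, or a stack of subdevices. */
struct model_dev {
    int congested;                  /* only meaningful for leaf devices */
    const struct model_dev *subdevs;
    size_t nr_subdevs;              /* 0 for a leaf device */
};

/*
 * Returns nonzero if the device or any subdevice is congested.  *visits is
 * incremented once per device examined, so the caller can see the cost.
 */
static int model_congested(const struct model_dev *dev, unsigned long *visits)
{
    int ret = 0;

    (*visits)++;
    if (dev->nr_subdevs == 0)
        return dev->congested;

    /* A stacked device has no state of its own: ask every subdevice. */
    for (size_t i = 0; i < dev->nr_subdevs; i++)
        ret |= model_congested(&dev->subdevs[i], visits);
    return ret;
}
```

In this model a single query against a stack of 8 leaf devices examines 9
devices, so calling it once per page rather than once per reclaim pass
multiplies that cost by the number of pages scanned.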
