Re: RFC for 2.6: avoid OOM at bounce buffer storm

From: Andrew Morton
Date: Wed Jun 08 2005 - 16:49:44 EST


Martin Wilck <martin.wilck@xxxxxxxxxxxxxxxxxxx> wrote:
>
> Hello Andrew,
>
> > The semaphore is initialised with the limit level, so once it has been
> > down()ed more than `limit' times, processes will block until someone does
> > up().
>
> Oh - of course. Neat.
>
> >>It appears to run much more
> >> smoothly now, perhaps because wakeup_bdflush() isn't called any more.
> >> Are you still interested in more data?
> >
> > Perhaps the newer kernel has writeback thresholding fixes so it's not
> > possible to dirty as much memory with write().
>
> I have collected more data and the behavior with 2.6.12-rc5-mm2 is
> flawless, there is a continuous writeback flow close to the maximum rate
> possible, and the bounce buffer usage never gets anywhere near the limit
> where it'd become dangerous. At least not in my test setup. The latest
> Fedora kernel 2.6.11-1.27 also behaves ok, although it doesn't adapt to
> changing io load as smoothly as 2.6.12-rc5-mm2 does, and the writeback
> rate is oscillating more strongly.
>
> The kernels where I observe the problem are 2.6.9 kernels from Red Hat
> EL4. I have posted this here because I saw that the highmem bounce
> buffer/memory pool implementation was identical between the 2.6.9 kernel
> and all but the very latest development kernels, and I concluded
> prematurely that the behavior under my scenario must also be the same --
> which it wasn't. I apologize for not having looked more closely.
>
> Many thanks for looking into this anyway. From a theoretical point of
> view, I still think I had a valid point :-/.
>
> Your patch sure looks good to me.
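
For reference, the counting behaviour of the semaphore discussed at the top
of the quoted exchange can be modelled in a few lines. This is only a toy,
single-threaded sketch: struct toy_sem and the toy_* helpers are hypothetical
names, not the kernel's sema_init()/down()/up(), and where a real down()
would put the caller to sleep the model just reports failure.

```c
#include <assert.h>

/* Toy model of a counting semaphore (hypothetical, not kernel code). */
struct toy_sem {
    int count;              /* remaining slots before callers must wait */
};

/* Initialise with the limit level, as described in the quoted text. */
void toy_sema_init(struct toy_sem *s, int limit)
{
    assert(limit >= 0);
    s->count = limit;
}

/* Returns 1 if a slot was taken, 0 where a real down() would block. */
int toy_down_try(struct toy_sem *s)
{
    if (s->count == 0)
        return 0;
    s->count--;
    return 1;
}

/* Release one slot, allowing one waiter to proceed. */
void toy_up(struct toy_sem *s)
{
    s->count++;
}

/* Count how many of `attempts` callers get through without an up(). */
int toy_admitted(int limit, int attempts)
{
    struct toy_sem s;
    int i, n = 0;

    toy_sema_init(&s, limit);
    for (i = 0; i < attempts; i++)
        n += toy_down_try(&s);
    return n;
}
```

With a limit of 4 and 10 attempts, only the first 4 get through; the rest
would sleep until someone calls the equivalent of up().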

Well. As I said, I think what you're seeing here is the effect of recent
changes to mm/page-writeback.c which reduce the amount of memory that we'll
permit to be dirtied by write() calls. You'll probably find that the bounce
buffer problem is also fixable in 2.6.9 by reducing
/proc/sys/vm/dirty_ratio, for the same reason.
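
A rough sketch of what the dirty_ratio knob means in numbers, for anyone
wanting to check their own box. read_percent() and dirty_limit_bytes() are
hypothetical helpers, and the arithmetic assumes the limit is simply a
percentage of a given memory size; the kernel's real calculation in
mm/page-writeback.c is more involved.

```c
#include <assert.h>
#include <stdio.h>

/* Read a 0..100 integer from a sysctl-style file; returns -1 on failure. */
int read_percent(const char *path)
{
    FILE *f;
    int value, fields;

    assert(path != NULL);
    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    fields = fscanf(f, "%d", &value);
    fclose(f);
    if (fields != 1 || value < 0 || value > 100)
        return -1;
    return value;
}

/*
 * Approximate dirty limit for a given memory size, rounded down.
 * Dividing first avoids overflow for very large mem_bytes.
 */
unsigned long long dirty_limit_bytes(unsigned long long mem_bytes, int percent)
{
    return mem_bytes / 100 * (unsigned long long)percent;
}
```

Calling read_percent("/proc/sys/vm/dirty_ratio") gives the current setting;
writing a smaller number to that file as root lowers it.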

What concerns me is that there are other ways of dirtying lots of memory
apart from write(): namely mmap(MAP_SHARED). If someone dirties 90% of all
memory via mmap() then we might again get into bounce buffer starvation.
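
To make the mmap(MAP_SHARED) case concrete, here is a minimal userspace
sketch that dirties file-backed pages with plain stores and no write() call
at all. dirty_pages_via_mmap() is a hypothetical helper and the file name is
whatever the caller passes in; it is not a reproducer for the bounce buffer
problem, just an illustration of the dirtying path.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Map `len` bytes of a file MAP_SHARED and store one byte into each page.
 * Returns the number of pages touched, or -1 on error.
 */
long dirty_pages_via_mmap(const char *path, size_t len)
{
    long page = sysconf(_SC_PAGESIZE);
    unsigned char *p;
    long touched = 0;
    size_t off;
    int fd;

    assert(path != NULL);
    if (page <= 0 || len == 0)
        return -1;

    fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, (off_t)len) != 0) {   /* size the file to the mapping */
        close(fd);
        return -1;
    }

    p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        close(fd);
        return -1;
    }

    /* Each store dirties a page that must eventually be written back. */
    for (off = 0; off < len; off += (size_t)page) {
        p[off] = 1;
        touched++;
    }

    munmap(p, len);
    close(fd);
    return touched;
}
```

Scale `len` up towards the size of RAM and the pages are dirtied without
going through the write() path at all.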
