Re: [2.4.17/18pre] VM and swap - it's really unusable

From: M. Edward (Ed) Borasky (znmeb@aracnet.com)
Date: Sat Jan 05 2002 - 07:59:23 EST


On Sat, 5 Jan 2002, Andreas Hartmann wrote:

> I don't like special test-programs. They seldom show up the reality.
> What we need is a kernel that behaves fine in reality - not in
> testcases. And before starting the test, take care, that most of ram
> is already used for cache or buffers or applications.

OK, here's some pseudo-code for a real-world test case. I haven't had a
chance to code it up, but I'm guessing I know what it's going to do. I'd
*love* to be proved wrong :).

# build and boot a kernel with "Magic SysRq" turned on
# echo 1 > /proc/sys/kernel/sysrq
# fire up "nice --19 top" as "root"
# read "MemTotal" from /proc/meminfo

# now start the next two jobs concurrently (see the C sketch below)

# 1. write a disk file with "MemTotal" bytes of data or more in it

# 2. perform a 2D in-place FFT of total size at least "MemTotal/2" but
#    less than "MemTotal"

Watch the "top" window like a hawk. "Cached" will grow because of the
disk write and "free" will drop because the page cache is growing and
the 2D FFT is using *its* memory. Eventually the two will start
competing for the last bits of free memory. "kswapd" and "kupdated" will
start working furiously, bringing the system CPU utilization to 99+
percent. At this point the system will appear highly unresponsive.
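
If "top" itself starts missing updates, even a dumb little watcher that
just rereads /proc/meminfo every few seconds is enough to see the trend.
A quick sketch, nothing clever:

/* Minimal /proc/meminfo watcher: print the MemFree and Cached lines
 * every 5 seconds so the trend stays visible even when top struggles. */
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    char line[256];

    for (;;) {
        FILE *fp = fopen("/proc/meminfo", "r");

        if (!fp) { perror("/proc/meminfo"); return 1; }
        while (fgets(line, sizeof(line), fp))
            if (!strncmp(line, "MemFree:", 8) || !strncmp(line, "Cached:", 7))
                fputs(line, stdout);
        fclose(fp);
        fflush(stdout);
        sleep(5);
    }
}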

Even with the "nice --19" setting, "top" is going to have a hard time
keeping its five-second screen updates going. You will quite possibly
end up going to the console and doing alt-sysrq-m, which dumps the
memory status on the console and into /var/log/messages. Then if you do
alt-sysrq-i, which kills everything but "init", you should be able to
log on again.

I'm going to try this on my 512 MB machine just to see what happens, but
I'd like to see what someone with a larger machine, say 4 GB, gets when
they do this. I think attempting to write a large file and do a 2D FFT
concurrently is a perfectly reasonable thing to expect an image
processing system to do in the real world. A "traditional" UNIX would do
the I/O of the file write and the compute/memory processing of the FFT
together with little or no problem. But because the 2.4 kernel insists
on keeping all those buffers around, the 2D FFT is going to have
difficulty, because it has to have its data in core.

What's worse is if the page cache gets so big that the FFT has to start
swapping. For those who aren't familiar with 2D FFTs, they take two
passes over the data. The first pass will be unit strides -- sequential
addresses. But the second pass will be large strides -- a power of two.
That second pass is going to be brutal if every page it hits has to be
swapped in!
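
To make that concrete, here is the shape of the two passes over an N x N
row-major array -- loop structure only, no butterflies, and N here is just
an example size:

/* Access pattern of the two passes of a 2D in-place FFT over an N x N
 * row-major array (loop structure only, no actual butterflies). */
#define N 4096                           /* a power of two */

void two_passes(double (*a)[N])
{
    int row, col;

    /* pass 1: FFT each row -- unit stride, at worst one fault per page */
    for (row = 0; row < N; row++)
        for (col = 0; col < N; col++)
            a[row][col] *= 2.0;          /* stand-in for the row butterflies */

    /* pass 2: FFT each column -- stride of N doubles (N * 8 bytes), so
     * every access can land on a different page; if those pages have been
     * pushed out to swap, each one is a major fault */
    for (col = 0; col < N; col++)
        for (row = 0; row < N; row++)
            a[row][col] *= 2.0;          /* stand-in for the column butterflies */
}

With N = 4096 the column pass jumps 32 KB between consecutive touches, so
every single reference hits a different 4 KB page.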

The solution is to limit page cache size to, say, 1/4 of "MemTotal",
which I'm guessing will have a *negligible* impact on the performance of
the file write. I used to work in an image processing lab, which is
where I learned this little trick for bringing a VM to its knees, and
which is probably where the designers of other UNIX systems learned that
the memory used for buffering I/O needs to be limited :). There's
probably a VAX or two out there still that shudders when it remembers
what I did to it. :))

-- 
M. Edward Borasky

znmeb@borasky-research.net http://www.borasky-research.net



