Re: compressed swap performance issues

Paul R. Wilson (wilson@cs.utexas.edu)
Mon, 21 Dec 1998 13:38:50 -0600


>From vonbrand@pincoya.inf.utfsm.cl Mon Dec 21 06:10:39 1998
>To: "Paul R. Wilson" <wilson@cs.utexas.edu>
>cc: linux-kernel@vger.rutgers.edu
>Subject: Re: compressed swap performance issues
>Date: Mon, 21 Dec 1998 09:10:21 -0300
>From: Horst von Brand <vonbrand@inf.utfsm.cl>
>
>Just a couple of points:
>
>- This makes sense for _one_ user machines _only_. For multiuser machines,
> you have something better for the CPU to do than to compress/decompress
> pages for swapping

As I understand it, when machines start paging, they are usually I/O
bound. Surely this is the case for single-user machines, but it is
the classic situation for timeshared mainframes as well. This is
the whole reason for Working Set scheduling and all that.

Suppose we have a classic timeshared mainframe situation, with big
batch jobs in the job mix, for example. The Working Set strategy
works primarily by suspending jobs and swapping them out, to free
up memory so that other processes can run without paging much. If
this happens often enough, you'll be I/O bound just in process
swapping. More importantly, if you don't have big batch jobs,
or can't reliably tell what's a batch job, you have to start
process-swapping jobs that may in fact be interactive, or be
processes that interactive processes may depend on, causing
blocking. This has always been a problem for UNIX, and doing
mainframe-style scheduling and replacement (e.g., Working Set)
introduces a lot of complexity and administrative hassle.

The hidden scandal of Working Set *replacement* policies is that
they basically don't work right. The effectiveness of Working
Set depends mostly on process suspension, which is a brute-force
way of avoiding thrashing. In a modern computing environment,
this is usually a lose because you either end up having to
suspend jobs you can't safely suspend, or you have large
VM footprints that you have to swap in and out. You'll end up
I/O bound just in the process swapping.

>- As you say, most Linux users (even personal machines) don't use swap very
> much, they have got the RAM

I'm not so sure about this. My machines page, and I'm sure many
other people's do, too---witness the recent jubilation about
the effectiveness of prefetching. The trend seems to be that "normal"
jobs don't page most of the time, but you lose on big jobs doing
interesting real work, and on re-activation of applications that have
been idle. Ideally, compressed VM would be great for such situations;
you'd be able to handle bigger jobs without hitting the disk at all.

Many PC users' machines page when they run some app like Photoshop.
There's usually a binary distinction between programs that page and
programs that don't, and people have to buy enough RAM for the
programs that page to keep them from paging horribly. Most of the
time, that extra RAM is wasted.

>- With today's RAM and disk prices, the diskless machines that used to swap
> over the net are long gone history

I don't know about this. In principle, I think computing should
evolve toward two situations: networked computers with no
disks at all, paging off a server, and laptops where the disks are optimized
for capacity rather than speed. (Fast disks are a lose on a laptop because
they eat batteries. With compressed VM, you would be able to leave the
disk spun down more often.)

As powerful CPUs, RAM, and disk all get cheaper, eventually a pretty
good PC will cost only a few hundred dollars. At that point, more
and more people will switch to PDAs and laptops, and the same problems
rear their ugly heads. The VM problem won't go away.

For multiuser systems (a big part of Linux's loyal base), compressed
paging is increasingly attractive. Consider a system with a bunch
of users logged in, or a bunch of server processes and/or threads.
Much of the time, most of the server processes and/or threads are
likely to be idle, due to user think time or app switching on
the client end. Compressing the idle processes' pages can be
done whenever the system is disk-bound, and not when it's not.
You can always use a null compression algorithm in any situation
where the compression isn't worth the CPU cost.
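
To make that concrete, here is a rough user-level sketch of the kind
of policy I mean. Every helper in it is a made-up placeholder for
whatever the VM layer would really measure; only the decision in
store_swap_page() is the point.

/* Adaptive "compress only when it pays" policy -- a sketch, not
 * kernel code.  disk_queue_depth(), cpu_idle_fraction() and
 * compress_page() are stand-ins. */
#include <stdio.h>
#include <string.h>
#include <stddef.h>

static int disk_queue_depth(void)     { return 4; }    /* placeholder */
static double cpu_idle_fraction(void) { return 0.5; }  /* placeholder */

/* Placeholder compressor; a real one would be a fast in-memory
 * page compressor.  Here it just copies and reports no gain. */
static size_t compress_page(const void *src, size_t len, void *dst)
{
    memcpy(dst, src, len);
    return len;
}

/* Returns the stored size: < len if compressed, == len if raw. */
static size_t store_swap_page(const void *page, size_t len, void *slot)
{
    /* Only burn cycles compressing when the disk is the bottleneck
     * and there is idle CPU to spare. */
    if (disk_queue_depth() > 2 && cpu_idle_fraction() > 0.25) {
        size_t clen = compress_page(page, len, slot);
        if (clen < len)
            return clen;               /* compression paid off */
    }
    memcpy(slot, page, len);           /* null "compressor": store raw */
    return len;
}

int main(void)
{
    static char page[4096], slot[4096];
    memset(page, 'x', sizeof page);
    printf("stored %lu bytes\n",
           (unsigned long)store_swap_page(page, sizeof page, slot));
    return 0;
}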

But disk *bandwidths* only increase 20% a year, while CPU speeds
increase 60% a year. (I misspoke in my last email. The latencies
improve *less* than 20% a year---about 12%.) We've now hit the
point where compression/decompression bandwidth is higher than
disk bandwidth. Even if prefetching is nearly perfect, just the
bandwidth limitations mean that CPUs pull ahead by a factor of
2 every three years. Within a few years, compression will be
so cheap that using it just to compress disk I/O will be a
serious win, even for perfect prefetching. But, as Rik noted,
prefetching isn't perfect, and the big win in the short term is
in decreasing latency. Compression can do that even better.
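
Just to spell out that arithmetic, with the same round growth numbers
as above (nothing here is measured):

/* CPU speed +60%/year vs. disk bandwidth +20%/year: the CPU's
 * advantage grows by 1.60/1.20 = 1.33x per year. */
#include <stdio.h>
#include <math.h>

int main(void)
{
    double cpu_growth  = 1.60;   /* +60% per year */
    double disk_growth = 1.20;   /* +20% per year */
    double ratio = cpu_growth / disk_growth;
    int year;

    printf("advantage grows %.2fx per year\n", ratio);
    printf("it doubles every %.1f years\n", log(2.0) / log(ratio));
    for (year = 1; year <= 6; year++)
        printf("after %d years: %.2fx\n", year, pow(ratio, year));
    return 0;
}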

When we start looking at trends in parallel disks and parallel
processing, the situation is less clear, but I'd bet that compressed
caching is an especially good idea. RAIDs can improve bandwidth,
but they can't improve latency. Compressed caching can. I'm
betting that there will be a faster increase in total CPU speed
than in total disk bandwidth, so that even on bandwidth grounds,
compressed caching is the way to go. With SMP's, you can use
any idle processor to compress pages in the background, or
to help decompress. (Basically, whenever you fault a block
from the compression cache, you can use the idle processor(s)
to prefetch and uncompress other pages from the compression
cache simultaneously.)
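
Here is a toy user-level illustration of that fault-time idea. The
"compression cache" is just an array, decompress_page() is a stand-in
for a real decompressor, and pthreads stand in for kernel threads; the
point is only the division of labor between the faulting CPU and the
idle ones.

/* While one CPU decompresses the page actually faulted on, other
 * CPUs speculatively decompress the next few compressed pages. */
#include <pthread.h>
#include <stdio.h>
#include <string.h>

#define NPAGES         16
#define PAGE_SIZE      4096
#define PREFETCH_AHEAD 3

static char compressed[NPAGES][PAGE_SIZE];  /* fake compressed store */
static char frames[NPAGES][PAGE_SIZE];      /* destination frames */

static void decompress_page(unsigned long pageno)
{
    /* Stand-in for real decompression work. */
    memcpy(frames[pageno], compressed[pageno], PAGE_SIZE);
}

static void *prefetch_worker(void *arg)
{
    unsigned long pageno = (unsigned long)arg;
    if (pageno < NPAGES)
        decompress_page(pageno);  /* ready before anyone faults on it */
    return NULL;
}

static void fault_from_compression_cache(unsigned long pageno)
{
    pthread_t tid[PREFETCH_AHEAD];
    int i;

    /* Hand the next few pages to other (presumably idle) CPUs... */
    for (i = 0; i < PREFETCH_AHEAD; i++)
        pthread_create(&tid[i], NULL, prefetch_worker,
                       (void *)(pageno + 1 + i));

    /* ...while this CPU decompresses the page we actually need. */
    decompress_page(pageno);

    for (i = 0; i < PREFETCH_AHEAD; i++)
        pthread_join(&tid[i], NULL);
}

int main(void)
{
    fault_from_compression_cache(5);
    printf("page 5 plus %d neighbors are uncompressed\n",
           PREFETCH_AHEAD);
    return 0;
}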

In general, I think that compressed caching will be more of
a win for big iron, partly because if you have lots of processes,
you can trade one process's memory needs off against another's:
the aggregate memory pool is larger, and can absorb larger
transients from one or a few processes. If you have a large
enough RAM pool that you *usually* don't page, you'll page
very rarely, because the statistics are on your side---it's
unlikely that everybody's processes will have big footprints
at the same time.

There's also a basic scalability argument here. As memories
get bigger, the range of timescales relevant to the most
recently-touched pages vs. the least recently-touched pages
gets larger. That means that in a very large memory, most
of the pages aren't touched for several *minutes* on average.
(There's something called the "five-minute" rule in database
systems---if your cache replacement time is less than five
minutes, buy more RAM.) This means that compressing and
uncompressing the less-recently-used pages *can't* cost
very much, because you're just not touching them often enough
for it to matter a lot. So on big iron that is at least
roughly in balance on average, the cost of compression and
decompression is proportionally smaller than on a PC or
workstation.
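
A quick sanity check on "can't cost very much", with an assumed
ballpark figure for per-page compression time (the exact number
doesn't matter much; only the ratio does):

/* If a page is only touched every few minutes, even a slowish
 * compressor is lost in the noise. */
#include <stdio.h>

int main(void)
{
    double compress_seconds = 100e-6;    /* assume ~100 us per 4K page */
    double reuse_seconds    = 5.0 * 60;  /* touched once per 5 minutes */

    printf("overhead for a page that cold: %.5f%%\n",
           100.0 * compress_seconds / reuse_seconds);  /* ~0.00003% */
    return 0;
}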

I think the bottom line for big iron is that compression
and decompression don't have to be as much faster than disk
as they do for a PC or workstation. A factor of 2 will do, and
we're clearly there unless prefetching is almost perfect.
And as long as CPU cycles get cheaper faster than disk,
the improvement available from compressed caching will
increase.

I think the current glut of cheap memory and disk is an
anomaly, and will not last. People will find ways to
suck up that memory when the processors get faster, and
the paging problem will recur. They'll do more things
in virtual memory, rather than through carefully-planned
file I/O. (In general, this should make things more
efficient, by avoiding double caching, but the VM system
has to be very good---it has to fetch and cache well,
because it can't rely on the implicit hints about
sequentiality from explicit file I/O.)

And people will buy smaller and smaller computers,
because once good desktops are less than $500, more and
more people will be willing to pay the factor of two or
so extra to get a laptop with equivalent power. Or they'll
have a bunch of network computers that boot over the network,
and may not have any sort of disk at all. For both of
those types of systems, compressed caching is clearly a
big win.

The system administrators where I work do not like PC's with
disks, and until recently would not support Linux. Some
of the major hassles are with disks---there are too many
brands, and PC's are too much of a kludge. If we could
get easily-maintained diskless machines with 128 meg of
RAM, they'd be much happier---they wouldn't have to go
to people's offices and mess around with IRQ's and all that.
With 128 meg of RAM and no disk, you could use half the
RAM as normal memory, and half as a compressed cache for
swap and for remotely-mounted filesystems. If you're
doing things that "don't page", then you'll never hit
the server except to fault in the NFS-mounted directories,
and to write back your file changes. If you're doing things
that *do* page, then you'll page to the server a lot less
often if you can effectively have 50% more memory via
compression---most of the time, when your working set
grows, it'll be absorbed by the compression cache.
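
The arithmetic behind that 50% figure, assuming roughly 2:1
compression on the compressed half (an assumed ratio, not a
measurement):

/* 64 MB of ordinary pages plus 64 MB holding compressed pages at
 * 2:1 behaves like about 192 MB, i.e. 50% more than 128 MB. */
#include <stdio.h>

int main(void)
{
    double ram_mb    = 128.0;
    double normal    = ram_mb / 2;   /* half used as ordinary memory */
    double pool      = ram_mb / 2;   /* half holds compressed pages */
    double ratio     = 2.0;          /* assumed compression ratio */
    double effective = normal + pool * ratio;

    printf("effective memory: %.0f MB (%.0f%% more than %.0f MB)\n",
           effective, 100 * (effective / ram_mb - 1), ram_mb);
    return 0;
}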

In general, I don't think the need for effective caching
is ever going to go away. RAM will never be "too cheap
to meter." It's unlikely that things with moving parts
(disks) will keep pace with CPU speed---they consistently
haven't in the past, and they're not doing so right now,
and there's no reason to think they'll do so in the
future.

In the short term, I think compressed caching is generally
going to be a win in most situations, and with proper adaptation
you can detect the situation and "don't do that, then." In the
longer term, it has to be the way to go in almost every
situation.

The basic fact is that we generally need to add a level to the
memory hierarchy whenever CPU's get an order of magnitude
faster than before. Most adjacent levels of the memory hierarchy
differ in latency by a factor of two to five. But DRAM and
disk latencies differ by five _orders_of_magnitude_. This
is a crazy situation. You can't feasibly make disks much faster
than they are, and probably shouldn't try---that's not what disks
are for. Disks should be big and cheap, not small and fast. If
you want small and fast, use a compression cache. Moving parts
are ultimately a lose speed-wise, reliability-wise, and maintenance-wise.
The current designs of virtual memory systems will not survive
the trend of a doubling of CPU speed every 18 months. If that
trend tops out, you're going to see more CPUs per machine so
that aggregate CPU speed continues to outstrip anything with
moving parts.
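
In round (assumed) numbers, that latency gap looks like this:

/* ~100 ns to DRAM vs. ~10 ms to disk: a factor of 100,000, or
 * five orders of magnitude. */
#include <stdio.h>

int main(void)
{
    double dram_ns = 100.0;                /* ~100 ns DRAM access */
    double disk_ns = 10.0 * 1000 * 1000;   /* ~10 ms disk access */

    printf("disk/DRAM latency ratio: %.0fx\n", disk_ns / dram_ns);
    return 0;
}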

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/