> I was wondering why the VAX might be so much better at running with lots
> of swap than Linux. There are a number of potential reasons, one is that
> the CPU runs at nearly the same speed as the disk. Another is that the
> actual amount of RAM a process is allowed to use is limited to the
> working set size. This causes a lot of paging, but is not a big penalty
> on a system given disk and CPU are running at near identical speeds.
>
> However, this idea could be used in an attempt to limit the actual RAM a
> process is allowed to use, but not the VM. This would limit the case
> where all available RAM is consumed by a process, preventing other
> processes from running (by forcing them to be swapped out; part of the
> current problem). Not sure if this is kernel related or not, I don't
> know the architecture at this level.
Amongst other things, the VAX has a "modified page writer". It works like
this. When a process is allocated memory, the initial memory comes from
a pool of shared zero-filled pages. These pages don't actually get
owned by a specific process until a process actually writes to one.
At this time, the modified page is taken out of the pool and becomes
a permanent part of the process's working set. However, modified pages
stay in memory, on the modified page list, until it is necessary to copy
them to disk because real memory is getting scarce. At this time, the
modified page
writer copies the oldest and least-used pages to disk. The real memory,
thus freed, is then zero-filled and put back into the pool. This process
continues.
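
The cycle above can be sketched as a toy simulation. This is only a
model of the description in this mail, not VMS code: the class name,
the low_water threshold, and the page counts are all made up for
illustration, and it flushes strictly oldest-first rather than also
weighing how little a page is used.

```python
from collections import deque


class ModifiedPageWriterSim:
    """Toy model of a zero-filled page pool plus a modified-page list.

    Hypothetical names throughout; this is not how VMS is implemented.
    """

    def __init__(self, total_pages, low_water):
        self.zero_pool = total_pages  # count of free, zero-filled pages
        self.modified = deque()       # dirty pages still in RAM, oldest first
        self.on_disk = []             # pages the writer has copied out
        self.low_water = low_water    # flush when the pool drops below this

    def write_page(self, pid, vpn):
        """A process writes to a page for the first time and now owns it."""
        if self.zero_pool == 0:
            self.run_writer()
        if self.zero_pool == 0:
            raise MemoryError("no pages available")
        self.zero_pool -= 1
        self.modified.append((pid, vpn))
        if self.zero_pool < self.low_water:
            self.run_writer()

    def run_writer(self):
        """Copy the oldest modified pages out until the pool recovers."""
        while self.modified and self.zero_pool < self.low_water:
            page = self.modified.popleft()
            self.on_disk.append(page)
            self.zero_pool += 1  # frame is zero-filled and returned to the pool


sim = ModifiedPageWriterSim(total_pages=4, low_water=2)
for pid, vpn in [(1, 0), (1, 1), (1, 2), (2, 0)]:
    sim.write_page(pid, vpn)
print(sim.zero_pool, list(sim.modified), sim.on_disk)
```

Nothing is written to "disk" until the pool runs low, which is the
point of keeping the modified list in memory.
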
When memory is getting real scarce, processes that are on timer-queues,
i.e., things that the kernel knows are not going to need RAM for a long
time, are swapped out entirely except for the process header. This
frees more memory.
VAX/VMS has quotas on just about everything. The maximum working-set
size, i.e., the maximum number of pages that a process can keep
resident in physical memory, is set via AUTHORIZE. Further, SYSGEN
parameters also set sizes system-wide.
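
To show what a working-set quota does to a memory hog, here is a
minimal sketch. The Process class and ws_quota field are hypothetical
names, and the oldest-page-first replacement is an assumption made to
keep the example short. The point is only that a process at its quota
pages against itself instead of taking frames from everybody else.

```python
from collections import OrderedDict


class Process:
    """Toy process with a cap on resident pages (hypothetical model)."""

    def __init__(self, name, ws_quota):
        self.name = name
        self.ws_quota = ws_quota       # max pages resident at once
        self.resident = OrderedDict()  # vpn -> True, in arrival order
        self.faults = 0
        self.evictions = 0

    def touch(self, vpn):
        """Reference a virtual page, faulting it in if it is not resident."""
        if vpn in self.resident:
            return
        self.faults += 1
        if len(self.resident) >= self.ws_quota:
            # At quota: give up one of our own pages, oldest first.
            self.resident.popitem(last=False)
            self.evictions += 1
        self.resident[vpn] = True


hog = Process("hog", ws_quota=3)
for vpn in range(10):
    hog.touch(vpn)
print(len(hog.resident), hog.faults, hog.evictions)
```

However many pages the hog touches, it never holds more than its
quota of real memory.
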
VAX/VMS was designed in the days of slow disk drives, and expensive
RAM. Therefore, much effort went into the dynamic memory allocation.
Modern operating systems generally throw hardware at the problem. If
your processes need more RAM, buy more RAM, etc. The bottom line is
that, for the best performance, an operating system should read/write
page and swap files only as a last resort. To help prevent reading and
writing to the swap device, the kernel should steal unwritten pages
from processes whenever possible. In other words, if you allocated
1 megabyte of RAM, but hadn't gotten a chance to write more than
1/2 megabyte before you were preempted, then the kernel can steal the
untouched 1/2 megabyte until you really need it. On the ix86 with 4k
pages, that's 128 pages of RAM that can be used for TCP/IP buffers, etc.
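
The arithmetic for that example, assuming 4k pages and the 1 megabyte
and 1/2 megabyte figures above (the variable names are just for
illustration):

```python
PAGE_SIZE = 4 * 1024  # ix86 page size in bytes


def pages(nbytes):
    """Number of pages needed to hold nbytes, rounded up."""
    return -(-nbytes // PAGE_SIZE)


allocated = 1024 * 1024  # 1 megabyte requested by the process
written = 512 * 1024     # 1/2 megabyte actually written so far

untouched_pages = pages(allocated) - pages(written)
print(pages(allocated), pages(written), untouched_pages)  # 256 128 128
```
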
Now, in Linux, one of the performance enhancing ideas is to allocate
memory, but mark the pages "missing". Any attempt to access those pages
causes a trap to the operating system. The missing page that caused
the trap is allocated by the kernel and then marked "present". This
continues for each missing page and is, in fact, one principle behind
dynamic memory allocation. However, if the kernel doesn't have a page
to allocate, something has got to give. The kernel starts trying to
free pages and, in so doing, can arrive at a condition where much of
the CPU time is devoted to reading and writing pages to the swap device.
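
A toy model of the missing/present scheme, for illustration only. The
AddressSpace class, the free-frame list, and the OutOfFrames exception
are names I made up; the real kernel does this with page tables and
the fault handler, and it tries to reclaim memory rather than simply
failing.

```python
class OutOfFrames(Exception):
    """Raised where a real kernel would have to start freeing pages."""


class AddressSpace:
    """Virtual pages start out 'missing' and get a frame on first access."""

    def __init__(self, n_pages, free_frames):
        self.present = [False] * n_pages  # one present bit per virtual page
        self.frame_of = {}                # vpn -> physical frame number
        self.free_frames = free_frames    # list of free frames, shared
        self.faults = 0

    def access(self, vpn):
        if not self.present[vpn]:
            self.page_fault(vpn)          # the "trap to the operating system"
        return self.frame_of[vpn]

    def page_fault(self, vpn):
        self.faults += 1
        if not self.free_frames:
            raise OutOfFrames("something has got to give")
        self.frame_of[vpn] = self.free_frames.pop()
        self.present[vpn] = True


space = AddressSpace(n_pages=8, free_frames=[0, 1, 2])
for vpn in (0, 1, 1, 2):
    space.access(vpn)
print(space.faults, space.present)
```

Eight pages were "allocated" but only the three that were touched cost
any RAM. Touching a fourth page in this state raises OutOfFrames,
which is where the swapping trouble starts.
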
I think that there can be some improvement by deferring such reads/writes
to see if the peak memory requirements stabilize, i.e., don't always
free up memory right away. Just wait, perhaps even seconds, before
deciding that a swap to disk is necessary. Give the CPU to tasks that
don't yet require more memory.
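
Roughly what I have in mind, as a sketch and not a patch. The
DeferredReclaimer class, the tick counter, and the grace_ticks value
are hypothetical, and a real implementation would have to live in the
kernel's page-reclaim path.

```python
class DeferredReclaimer:
    """Only swap if a memory shortage persists for grace_ticks ticks."""

    def __init__(self, grace_ticks):
        self.grace_ticks = grace_ticks
        self.short_since = None  # tick at which the current shortage began

    def tick(self, now, free_pages, wanted_pages):
        if free_pages >= wanted_pages:
            self.short_since = None  # shortage went away by itself
            return "run"
        if self.short_since is None:
            self.short_since = now
        if now - self.short_since < self.grace_ticks:
            return "defer"           # run tasks that need no new memory
        return "swap"                # shortage persisted; go to disk


r = DeferredReclaimer(grace_ticks=3)
# (tick, free pages, pages wanted); some other task exits at tick 2
history = [(0, 0, 4), (1, 0, 4), (2, 8, 4), (3, 0, 4), (4, 0, 4),
           (5, 0, 4), (6, 0, 4)]
print([r.tick(*h) for h in history])
```

The first shortage clears without touching the swap device because
memory was freed during the wait. Only the second, persistent one
ends in a swap.
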
Under Unix, with separate tasks being created to execute simple
commands, it is quite likely that one or two transient tasks will expire
within the next several seconds, freeing some memory.
Now, what this does is help prevent a runaway task from taking all the
system resources. If your task is a memory hog, it gets slowed down
by this allocation strategy while other tasks end up using CPU time
stolen from the memory hog.