Re: A true story of a crash.

Joel Jaeggli (joelja@darkwing.uoregon.edu)
Sat, 15 Aug 1998 22:09:30 -0700 (PDT)


On Sun, 16 Aug 1998, Albert D. Cahalan wrote:

> Matt Agler writes:
> > On Sat, 15 Aug 1998, Albert D. Cahalan wrote:
>
> >> How do you _not_ let every last page get used? The first obvious
> >> problem is overcommit, which you'd have to disable.
> >
> > No. When you hit 90% (or whatever, make it configurable) utilization of
> > swap, you set a flag. The kernel is overcommitted all over the place, but
>
> That is useful on a large enough machine. The 90% rule needs to
> be a 2 MB rule, and small machines just don't have the spare memory.
>
> I like it, but don't see it as a replacement for the process killer.
> It won't save you from daemons gone wild, many of which will be
> restarted if the kernel kills them. It also won't help a tiny machine
> or a machine with the admin on vacation.

I agree. On my work machines, running out of swap is generally a sign
that something is seriously wrong. When a machine has gone through 128 or
256 MB of RAM and 256 or 384 MB of swap, radical action that doesn't
involve adding more swap is required to deal with the issue.
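
For concreteness, the quoted "set a flag at 90%" idea combined with the
2 MB floor could be sketched roughly as below. This is only a userspace
illustration, not kernel code: the function names, the default
thresholds, and the choice to raise the flag when either condition holds
are my assumptions, and the parser assumes the "Key:  value kB" layout of
/proc/meminfo found on current Linux systems.

```python
def parse_meminfo(text):
    """Parse 'Key:   12345 kB' lines into a dict of integer values."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def swap_is_low(total_kb, free_kb, max_used_pct=90, min_free_kb=2048):
    """Return True when the low-swap flag should be set.

    Assumption: the flag is raised when EITHER the percentage rule
    (default 90% used) OR the absolute floor (default 2 MB free) trips.
    """
    if total_kb <= 0:
        # Assumption: a machine with no swap is treated as already low.
        return True
    used_pct = 100 * (total_kb - free_kb) / total_kb
    return used_pct >= max_used_pct or free_kb < min_free_kb


if __name__ == "__main__":
    with open("/proc/meminfo") as f:
        info = parse_meminfo(f.read())
    low = swap_is_low(info.get("SwapTotal", 0), info.get("SwapFree", 0))
    print("swap low" if low else "swap ok")
```

On a box with 384 MB of swap the percentage rule trips first (about
38 MB still free); on a small machine with the percentage set higher,
the 2 MB floor is what catches it.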

As you say, when things get really out of hand, the idea should be to make
the machine come back as gracefully as possible, not band-aid it until it
tips over for good. Beyond a certain point, adding more swap is just asking
for things to get slower and slower. The dynamically growing swapfile in
Win95 is a particularly good example of this. It will keep growing swap
until you either run out of disk space, get sick of it grinding and reboot,
or kill a bunch of apps. Generally, if you manage to keep a Windows box
running for a couple of weeks, you end up rebooting it because the swapfile
has become so huge.

> How about the following steps, in order?
>
> 1. optional SIGWARN, or whatever it is AIX sends
> 2. optional removal from scheduler
> 3. optional (highly recommended) process killer
> 4. mandatory oops, kill everything, and reboot

Switching runlevels to single user and then coming back might suffice in
place of a full reboot. Checking memory and scanning for and resetting
SCSI devices takes a not insignificant amount of time on a number of
bigger machines.
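
The ordering in the quoted list can be written down as a small sketch,
assuming each optional step is just a switch the admin turns on or off
and the last step is always present. The names Step, escalation_plan and
next_step are made up for illustration; nothing here sends real signals
or touches the scheduler, and the final step could as easily be the
runlevel switch as a reboot.

```python
from enum import Enum


class Step(Enum):
    WARN = 1        # step 1: optional warning signal to processes
    DESCHEDULE = 2  # step 2: optional removal from the scheduler
    KILL = 3        # step 3: optional (recommended) process killer
    REBOOT = 4      # step 4: mandatory last resort


OPTIONAL = {Step.WARN, Step.DESCHEDULE, Step.KILL}


def escalation_plan(enabled):
    """Return the steps to try, in order; REBOOT is always last."""
    plan = [s for s in Step if s in OPTIONAL and s in enabled]
    plan.append(Step.REBOOT)
    return plan


def next_step(enabled, last_tried=None):
    """Return the step to try after last_tried (None means start)."""
    plan = escalation_plan(enabled)
    if last_tried is None:
        return plan[0]
    later = [s for s in plan if s.value > last_tried.value]
    return later[0] if later else Step.REBOOT


if __name__ == "__main__":
    for step in escalation_plan({Step.WARN, Step.KILL}):
        print(step.name)
```

With nothing optional enabled, the plan collapses to the mandatory last
step, which is the case the process killer is meant to avoid.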

> For the SIGWARN, glibc could return freed memory to the OS.
> (would need to be disabled for init and other critical things)
>
> The process killer alone is simple though, and already written.

--------------------------------------------------------------------------
Joel Jaeggli joelja@darkwing.uoregon.edu
Academic User Services consult@gladstone.uoregon.edu
PGP Key Fingerprint: 1DE9 8FCA 51FB 4195 B42A 9C32 A30D 121E
--------------------------------------------------------------------------
It is clear that the arm of criticism cannot replace the criticism of
arms. Karl Marx -- Introduction to the critique of Hegel's Philosophy of
the right, 1843.
