Re: Very bad swap bug -- 2.0, 2.1 at least

Simon Kirby (sim@netnation.com)
Wed, 16 Sep 1998 12:09:02 -0700 (PDT)


On Wed, 16 Sep 1998, Rik van Riel wrote:

> On Tue, 15 Sep 1998, Simon Kirby wrote:
>
> > This swap bug that I mentioned a while back is still happening, and this
> > time seems to be much worse than before. In this particular case it is
> > happening on a medium-loaded web server running 2.0.35.
> [SNIP]
> > procs memory swap io system cpu
> > r b w swpd free buff cache si so bi bo in cs us sy id
> > 1 0 0 5376 18236 10200 52244 380 0 174 0 292 305 2 9 89
> [SNIP 13 lines]
> > 0 0 0 5376 16028 10200 53596 264 0 125 0 275 471 13 9 79
>
> Strange... There is more free memory than there is stuff to
> swap in. This does indeed suggest a bug, unless of course
> you use perl for web forms (4 simultaneous forks which glob
> the machine will have left some 16 megs).

Some large perl script is probably what cleared the 16MB free. Still, it
wasn't executing when I was running the vmstat (I was also running a
"top"...).

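The pattern in the quoted vmstat samples can be checked mechanically. The sketch below is only an illustration: it assumes the column order shown in the quoted header, and the is_suspicious() rule (swap-in with no swap-out while free memory exceeds the amount held in swap) is a hypothetical heuristic, not anything vmstat or the kernel provides.

```python
# Column order follows the vmstat header quoted above.
COLUMNS = ["r", "b", "w", "swpd", "free", "buff", "cache",
           "si", "so", "bi", "bo", "in", "cs", "us", "sy", "id"]


def parse_row(line):
    """Turn one vmstat data line into a dict keyed by column name."""
    values = [int(field) for field in line.split()]
    if len(values) != len(COLUMNS):
        raise ValueError("expected %d fields, got %d"
                         % (len(COLUMNS), len(values)))
    return dict(zip(COLUMNS, values))


def is_suspicious(row):
    """Hypothetical heuristic: pages are being swapped in, nothing is
    being swapped out, and free memory exceeds everything held in swap."""
    return row["si"] > 0 and row["so"] == 0 and row["free"] > row["swpd"]


# The two data lines that survived the [SNIP]s in the quoted output.
samples = [
    "1 0 0 5376 18236 10200 52244 380 0 174 0 292 305 2 9 89",
    "0 0 0 5376 16028 10200 53596 264 0 125 0 275 471 13 9 79",
]
rows = [parse_row(s) for s in samples]
flagged = [r for r in rows if is_suspicious(r)]
print(len(flagged), "of", len(rows), "samples show swap-in with free > swpd")
```

Both quoted samples match the rule: swpd stays at 5376 while si is 380 and 264, and free memory is well above 5376 in each.
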
> OTOH, if you have the values for freepages (/proc/sys/vm/freepages)
> set to a ridiculously high value, the swapping and paging might be
> for real. Setting the values in this file to something like
> "128 256 384" will leave you with some 1.5 MB of free memory
> (which is a must when heavy forking is going on).

"For real"? I'm very sure it is actually reading from disk -- I can hear
the drive working every second, and if I run my "stail"
(continuously-updating cat) program on the /proc/scsi/aic7xxx/0 file, I can
see the disk being read as well.
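
The "stail" program itself is not shown here. As a rough idea of what a continuously-updating cat might look like, here is a minimal sketch; the function names, the one-second default interval and the print-only-on-change behaviour are all assumptions, not a description of the real tool.

```python
import sys
import time


def read_snapshot(path):
    """Return the current contents of a file as text."""
    with open(path, "r") as handle:
        return handle.read()


def watch(path, interval=1.0, iterations=None, out=sys.stdout):
    """Re-read `path` every `interval` seconds, printing it when it changes.

    iterations=None loops until interrupted; a number limits the loop.
    Returns how many times the contents were printed.
    """
    previous = None
    printed = 0
    count = 0
    while iterations is None or count < iterations:
        current = read_snapshot(path)
        if current != previous:
            # Only emit output when the file contents actually differ.
            out.write(current)
            out.flush()
            previous = current
            printed += 1
        count += 1
        if iterations is None or count < iterations:
            time.sleep(interval)
    return printed
```

Something like watch("/proc/scsi/aic7xxx/0") would then re-print the file whenever its counters change, until interrupted.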

[sroot@elmo:/root]# cat /proc/sys/vm/freepages
280 420 560

This is high, I know... Should I try lowering it?
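
For comparison, the freepages values are counts of pages, so they can be converted to kilobytes. This is a small sketch assuming 4 KB pages (as on i386), which is what the quoted "128 256 384 gives some 1.5 MB" figure implies; the helper name is made up for illustration.

```python
PAGE_SIZE_KB = 4  # assumption: 4 KB pages, as on i386


def freepages_to_kb(values, page_size_kb=PAGE_SIZE_KB):
    """Convert the three freepages thresholds from pages to kilobytes."""
    return [int(v) * page_size_kb for v in values.split()]


suggested = freepages_to_kb("128 256 384")  # values suggested in the quote
current = freepages_to_kb("280 420 560")    # values from this machine
print("suggested:", suggested)
print("current:  ", current)
```

Under that assumption the suggested top value is 1536 KB (1.5 MB), while the current setting works out to 2240 KB, roughly 2.2 MB.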

> > As you can see, the amount of stuff swapped out is staying the same, but
> > it is constantly "swapping in" something. There is obviously a lot of
> > memory available to have this data swapped back in and out of swap space,
> > but it isn't happening.
>
> The MM code in 2.0 was mostly introduced in the 1.2 (!)
> days and is very well tested. Apart from the freepages
> thingy or some hardware fault (very improbable, but then
> again you're the only one to ever report this) I can't
> think of many other errors.
>
> Does the error also occur with a different kernel? Or
> with a kernel compiled with a different compiler?

All kernels were compiled with gcc 2.7.2.3. It seems to be happening on
2.0.33 and 2.0.35, and even on 2.1.120. I will try to write a C program
that will reproduce it.

> > From the tests I did before, it seems what is being "swapped in" is the
> > executable code of a program which has been previously swapped out. I had
>
> Executable code is mmap()ed from the executable file, and is
> not swapped in. Faulting pages from executables are shown with
> the PageIn (pi) stats...

"pi"? This does not appear in vmstat's output... Where is it?

> > Issuing a "swapoff -a ; swapon -a" causes the problem to go away, until
> > something else forces some programs out to swap. (In this case, it seems
> > to be some customer's run-away CGI script).
>
> Running those is always a bad idea, but the system should be
> resistant against that...
>
> You might want to try 2.1.117 (running stably here) if
> nobody has had very bad experiences with that version...
> Or maybe the 2.0.36 prepatch (that one should be stable
> too).

I believe 2.1.117 also had the same problem. We're using 2.1 on a web
statistics compilation server because it runs so much faster, and I
noticed the swap problem happening there as well. I believe it was
running 2.1.117 when I first noticed it.

Thanks for your help,

Simon-

| Simon Kirby | Systems Administration |
| mailto:sim@netnation.com | NetNation Communications |
| http://www.netnation.com/ | Tech: (604) 684-6892 |
