OOM?

From: Jason Gunthorpe (jgg@ualberta.ca)
Date: Mon Jun 26 2000 - 12:24:00 EST


Hi all,

Are there any known bugs with the 2.2.14->2.2.16 OOM handlers? I have just had
another box (stock 2.2.16) blow itself away by running out of memory :<

What happens is some script/program has a bug and manages to consume
memory in a loop, fairly slowly over time. Today it was a perl script, a
few months ago it was wml, and I think I've seen rsync+ftpds do it too.

The apparent end result is that the box just seizes up: it still
responds to pings and whatnot, but is unable to do anything (even at the
console). There are no log messages; I presume klogd was blown away by the
OOM handler before it could log anything?

[Aside: wml OOM'd the box twice, about 24 hours apart, a few months
ago. The first time there were no log messages; the second time there
were a few "VM: killing foo" messages - hence my theory that klogd
dies early on.]

Through careful observation I have confirmed that two of these instances
were OOM; the other has not happened again.

The third instance, which I think was simply lots of ftpd+rsync processes,
was very strange: all services on the box were killed (cron, logging,
inetd, ssh, etc.) but apache continued to run. I could browse the files it
was serving with no problem. This is why I suspect it was OOM, but I
cannot be sure.

Frankly, this sucks. I would much rather the kernel panic and reboot
itself than die like this; it takes days for someone to get on site and
restart the box by hand :<

Thanks,
Jason
