Fix for wrong OOM killer trigger?

From: Roland Bless
Date: Fri Sep 19 2003 - 12:16:58 EST


Hi Miquel,

I read your e-mail http://www.cs.helsinki.fi/linux/linux-kernel/2003-27/1274.html
in the archive, but was not able to find a solution.
We have a similar problem:
HW: 4x 2,4GHz Xeon, 4GB Ram, 3ware 7000-series ATA-RAID
SW: Kernel 2.4.22 (also seen on 2.4.21, 2.4.22-ac3), lvm, software raid,
reiserfs, SuSE 8.1. Swap turned off (see later).

**Symptom: programs that heavily search the whole filesystem
(e.g., rsync, ssync, TSM backup client dsmc) cause to trigger the
OOM killer procedure (not very funny if NIS or NFS gets killed).

Sep 17 21:49:07 fs1 kernel: Out of Memory: Killed process 1384 (lmgrd).
Sep 17 21:49:12 fs1 kernel: Out of Memory: Killed process 1617 (exim).
Sep 17 21:49:18 fs1 kernel: Out of Memory: Killed process 1402 (ntpd).
Sep 17 21:49:23 fs1 kernel: Out of Memory: Killed process 1278 (portmap).
Sep 17 21:49:29 fs1 kernel: Out of Memory: Killed process 2715 (dsmc).
Sep 17 21:49:29 fs1 kernel: Out of Memory: Killed process 2716 (dsmc).
Sep 17 21:49:29 fs1 kernel: Out of Memory: Killed process 2717 (dsmc).
Sep 17 21:49:35 fs1 kernel: Out of Memory: Killed process 1600 (nscd).
Sep 17 21:49:35 fs1 kernel: Out of Memory: Killed process 1601 (nscd).
Sep 17 21:49:35 fs1 kernel: Out of Memory: Killed process 1602 (nscd).
Sep 17 21:49:35 fs1 kernel: Out of Memory: Killed process 1603 (nscd).
Sep 17 21:49:35 fs1 kernel: Out of Memory: Killed process 1604 (nscd).
Sep 17 21:49:35 fs1 kernel: Out of Memory: Killed process 1605 (nscd).
Sep 17 21:49:35 fs1 kernel: Out of Memory: Killed process 1606 (nscd).
Sep 17 21:49:40 fs1 kernel: Out of Memory: Killed process 1602 (nscd).
Sep 17 21:49:46 fs1 kernel: Out of Memory: Killed process 1421 (ypbind).
Sep 17 21:49:46 fs1 kernel: Out of Memory: Killed process 1422 (ypbind).
Sep 17 21:49:46 fs1 kernel: Out of Memory: Killed process 1423 (ypbind).
Sep 17 21:49:46 fs1 kernel: Out of Memory: Killed process 1424 (ypbind).
Sep 17 21:49:51 fs1 kernel: Out of Memory: Killed process 1584 (atd).
Sep 17 21:49:57 fs1 kernel: Out of Memory: Killed process 1329 (ypserv).

The OOM kill occured also when the cache memory didn't exhausted the
available memory (total mem usage was around 1.8GB).
echo 2>/proc/sys/vm/overcommit_memory did not solve the problem either.
In my understanding it has to do something with the fs cache/vm. We have some
files that are larger than 2GB, but usually the killing process starts at
different points in time.

However, I also saw that kswapd used a lot CPU though swap was not active.
With swap space activated, the load on the cpu increases dramatically
so that the system becomes unusable, too. This is our file server and
I'm currently not able to make a backup to other systems. That's really
frustrating.

Anyone any ideas? Please Cc: to me in your replies since I'm not on the lkml.
Cheers,
Roland

--
Roland Bless -- e-Mail: bless@xxxxxxxxx WWW: http://www.tm.uka.de/~bless
Institute of Telematics, University of Karlsruhe, Germany
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/