But I probably soon will. I've been having problems for months with one
of my servers which has been crashing regular as clockwork every five
days. I have recently narrowed the suspects down to a kernel or other
memory leak of about 10MB a day (64MB machine).
Because I know that nobody will take any notice if I say that this is
one of my carefully nurtured and patched 2.0.25 machines, I took it on
myself to upgrade it over the weekend to 2.0.33. At the moment the
result looks worse. I have a distinct feeling from the memory map that
this thing isn't going to make it to morning. Let's wait and see ...
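A figure like the 10MB a day above can be estimated by logging used memory at intervals and fitting a straight line through the readings. What follows is only a rough Python sketch of that idea, not the method actually used here: the function name is illustrative, and the sample readings are made up purely to show the calculation.

```python
def leak_rate_kb_per_day(samples):
    """Least-squares slope of (uptime_days, used_kb) samples, in kB/day."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_u = sum(u for _, u in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        raise ValueError("samples must be taken at different times")
    cov = sum((t - mean_t) * (u - mean_u) for t, u in samples)
    return cov / var_t


# HYPOTHETICAL readings (uptime in days, used memory in kB),
# invented for illustration; they are not measurements from this machine.
samples = [(0.0, 20000), (1.0, 30000), (2.0, 40000)]
print(leak_rate_kb_per_day(samples))
```

Feeding it the "used" figure with buffers and cache already subtracted should keep ordinary disk caching from being counted as a leak.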
This is therefore a request for comment. Has anybody seen anything like
this? I have 100 other machines with identical (as near as possible)
kernels and no such symptoms. I'll post the ps auxwm output at the
end. Here's the raw data after a couple of days' uptime:
                 total      used      free    shared   buffers    cached
Mem:             63060     62016      1044      6592       716      3008
-/+ buffers:               58292      4768
Swap:           144544     11456    133088
1:24am up 2 days, 39 min, 1 user, load average: 0.00, 0.00, 0.02
Linux arpa 2.0.33 #12 Mon Jul 6 00:40:07 MET DST 1998 i586
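For anyone checking the arithmetic, the "-/+ buffers" line is the "Mem:" line with buffers and cache discounted: used minus (buffers + cached), and free plus the same amount. A minimal Python sketch using the figures from the free output above; the function name is only illustrative.

```python
def adjust_for_cache(used_kb, free_kb, buffers_kb, cached_kb):
    """Return (used, free) in kB with buffers and page cache discounted."""
    reclaimable = buffers_kb + cached_kb
    return used_kb - reclaimable, free_kb + reclaimable


# Figures taken from the "Mem:" line of the free output (kB).
used, free = adjust_for_cache(used_kb=62016, free_kb=1044,
                              buffers_kb=716, cached_kb=3008)
print(used, free)  # 58292 4768, matching the "-/+ buffers" line
```

So roughly 58MB of the 63MB is in use by something other than disk buffers and cache.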
Ugh. I think there are about 75 processes going. I chucked everyone
off, switched run levels a few times, and so on. I relaunched every
daemon I could. At the moment this looks like what my 2.0.25 kernels did
before dying. I had the system down to 8 processes and me, and still
60MB of memory used.
So it looks like a leak to me. But I have tons of other kernels that
aren't doing this. All kernels are compiled from the same (evolving)
source set, at different times. What's unique about this machine and in
common between the 2.0.25 kernel and the 2.0.33 one:
Both kernels compiled with gcc 2.8.0 (so are 50 others, all fine)
BusLogic SCSI card driving two internal common-or-garden SCSI disks of 4G.
Both kernels patched with e2compr 0.3.7 (0.3.2 in the case of the 2.0.25)
Approx. same old 3c509b driver module.
Minix and ext2 built in.
Bridging built in.
Also aic7xxx driver built in, for sharing the kernel with aha machines.
Same microstar vx motherboard and classic P100.
Teardrop and other fixes for the 2.0.25 kernel - those come with 2.0.33.
Carrying two ciped tunnels.
Running squid (early but reliable and so are many other machines).
Running apache.
Split equal priority 72+72M swap space across the two disks.
Running X 3.3.1 under xdm (so are plenty ...) lately with xfs
Running ypserv and ypbind and ypxfr, etc.
Running named (old bind).
Heavy nfs loading. Exporting 10 ways. Importing 10 systems at least.
Busy net .. about 4% collisions.
Running bootpd, dhcpd, popd, etc.
Doing about 2-500MB of net transfers (incremental backups) every morning.
Running an attempted smbmount or two every day.
I suspect a network-related leak triggered by gcc 2.8.0. I may be completely
wrong. But yesterday I transferred 4GB off this machine by net and today
its memory use is too high.
Suggestions welcomed. Here is the ps auxwm output.
    2 ?       0     0   0    0    0    0   0   0    0 kflushd
    3 ?   25502     0   0    0    0    0   0   0    0 kswapd
 7478 3     205    23   0    0   60   60   0 548    0 agetty
 7479 4     199    23   0    0   60   60   0 548    0 agetty
 7480 5     199    23   0    0   60   60   0 548    0 agetty
  143 ?     193    43   0    0    0    0   0   0    0 nfsiod
  144 ?     193    43   0    0    0    0   0   0    0 nfsiod
  145 ?     193    43   0    0    0    0   0   0    0 nfsiod
  146 ?     193    43   0    0    0    0   0   0    0 nfsiod
 7481 6     199    23   0    0   60   60   0 548    0 agetty
  159 ?      77    43   0    0   64   64   0 548    0 lpd
29844 1     221    22   0    0   60   60   0 548    0 agetty
 7517 ?     238    59   0    0   96   96   0 548    0 dnsserver
  319 ?      69    13   0    0   56   56   0 548    0 rrd
  321 ?      74    15   0    0   68   68   0 548    0 rpc.rdfd
 7518 ?     208    36   0    0   80   80   0 548    0 dnsserver
 7519 ?      71    13   0    0   60   60   0 548    0 dnsserver
 7520 ?      71    13   0    0   60   60   0 548    0 dnsserver
 7521 ?      71    13   0    0   60   60   0 548    0 dnsserver
 7522 ?      71    13   0    0   60   60   0 548    0 dnsserver
 7523 ?      71    13   0    0   60   60   0 548    0 dnsserver
 7524 ?      71    13   0    0   60   60   0 548    0 dnsserver
 7525 ?      83    21   0    0   80   80   0 548    0 ftpget
15923 2     190    22   0    0   60   60   0 548    0 agetty
14527 p4    424   801   0    0  324  324   0 556    0 tcsh
  313 a1     33     7  12   12   88   64  16 548    2 /usr/sbin/powerd /etc/powerd.conf
  214 ?     937  1897   0   32  120   88  32 548    5 sshd
   11 ?      71     3   8   28   76   40  24 548    3 /sbin/update
    6 ?     154   244  12   32   92   48  28 548    2 /sbin/kerneld
  216 ?     121 34810   8   32   96   56  28 548    3 /usr/sbin/watchdog
11242 p1    126    31   0   40  148  108  40 556    0 vi
  246 ?     299    20  24   20  104   60  32 548    3 /sbin/ciped -o /etc/cipe/options-0
  265 ?     285    27  24   20  104   60  32 548    3 /sbin/ciped -o /etc/cipe/options-1
   87 ?    1188   895  12   32 3568 3524  28 576    4 codasrv -trunc 5 -debarrenize -nosalvageonshutdown -nodumpvm -rvm /var/log/coda.log /var/spool
  323 ?     526   122  24   24  428  380  24 576    3 /usr/X11R6/bin/xfs -config /etc/XF86FSConfig
    1 ?     607   452  20   36  112   56  40 548    3 init [3]
  153 ?     262  5057  12   48  128   68  36 548    6 /usr/sbin/inetd
 5032 ?     454   165  12   52  228  164  40 548    6 rpc.mountd
 7627 ?     157    29  16   68  108   24  48 548   13 ypbind.old
 7628 ?     151    23  12  112  176   52  96 548   11 ypbind.old
14524 ?     384   225  60   76  516  380  88 548   15 /usr/sbin/sshd
  318 ?     606 20658  24  116  328  188 100 632   10 /usr/sbin/snmpd -f
19414 ?     543    40  28  128  208   52 116 548    7 /usr/sbin/syslogd
19389 ?     400   399  12  148  188   28 124 548    7 /usr/sbin/rwhod
19426 ?     591  3012  16  156  196   24 120 548   13 /usr/sbin/crond
  343 ?     734    40  32  160  232   40 136 548    6 in.bootpd -d 7
19407 ?     659  3586  48  156  356  152 160 624   11 sendmail: accepting connections on port 25
  151 ?     798 22796  28  184  236   24 152 548   14 /usr/sbin/rpc.portmap
  309 ?    1629 19569   8  204  248   36 136 548   19 /usr/local/sbin/tcplog
14830 ?     657    26  48  168  308   92 128 548   12 /usr/sbin/dhcpd -q
11069 ?      55    68  20  204  260   36 172 548   32 httpd
 8339 ?     161    60  48  176  300   76 148 548   18 /usr/sbin/amd -a /usr -l syslog -r -c 1000 /net auto_linux
  310 ?     874 10049   8  220  272   44 164 548   16 /usr/local/sbin/icmplog
 5035 ?  876363   128  48  296  472  128 156 548   44 rpc.nfsd
11040 ?     100   108 124  224  536  188 224 548   36 /usr/sbin/sshd
  155 ?    1658   302  44  328  424   52 144 548   10 /usr/sbin/named
12181 p1    110    24  24  360  384    0 284 592   25 ps auxwm
  170 ?     921  6149 120  264  428   44 264 548   25 nmbd
11070 ?      46    73  76  324  408    8 292 548   43 httpd
11071 ?      66    99  76  356  436    4 328 548   45 httpd
11128 ?      68   115  76  396  476    4 368 548   45 httpd
12068 ?      67    32  80  420  500    0 392 548   46 httpd
11072 ?     100    89  80  444  528    4 420 548   45 httpd
11043 p1    249  2542 200  376  696  120 332 556   61 -tcsh
 7516 ?    1110   739 204  904 2268 1160 352 548  171 /usr/local/share/squid/bin/squid
 7539 ?    5258 11528  28 2768 4352 1556 128 548  667 rpc.ypxfrd
11144 p1   1926  5775  40 8612 9424  772 268 704 2096 ypserv-new
(I just tried launching a newer ypserv - clearly a bad idea ... )
                 total      used      free    shared   buffers    cached
Mem:             63060     62016      1044      6656       668      2936
-/+ buffers:               58412      4648
Swap:           144544     11456    133088
I see nothing abnormal in the netstat output.
Peter ptb@it.uc3m.es