I have the following problem. I am running a database server at a
customer's site on an HP machine running Linux 2.0.36. Every three days
or so the server hangs. If I then try to kill it with "kill -11" to get a
core and find out where it is stuck, the server is killed but no core is
produced. If I kill it the same way while it is running normally, I do
get a core.
I set up a script to monitor what the server was doing and what was
happening on the network, and I found that the server hangs while
sleeping in a kernel routine that ps reports as "ip_options_" (the name
looks truncated; it may be "sys_ip_options_" or something like this).
Here is the output from "ps -l" showing the state of the server after it
hangs:
100140 0 214 1 0 0 36068 33508 ip_options_ S ? 87:56 ./server
At the same time I monitored what was happening on the network with
netstat. So that you can understand the output, here is the script that
I used to do the monitoring:
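#!/bin/sh
# Once a minute, log the state of the ticker (pid 213) and the server
# (pid 214), plus whatever changed in the netstat and ps output.
while [ 1 ]
do
    echo "--------------------" >> watchit.log
    date >> watchit.log
    # Take fresh snapshots of the connections and the process list.
    netstat > netstat.log2
    ps x | grep -v "ps" | grep -v "./server -b" > ps.log2
    # Long-format ps for the two processes (includes the wait channel).
    ps hlp 214 213 >> watchit.log
    # Log only the differences against the previous snapshots.
    diff netstat.log netstat.log2 >> watchit.log
    mv netstat.log2 netstat.log
    diff ps.log ps.log2 >> watchit.log
    mv ps.log2 ps.log
    sleep 60
done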
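As a possible refinement (a sketch only, and not part of the script that produced the log below), the queue sizes for the server port could be logged directly instead of being read out of the netstat diffs. The helper name queue_sizes is made up for illustration, and it assumes the usual netstat column order (protocol, Recv-Q, Send-Q, local address, foreign address, state):

```shell
#!/bin/sh
# Hypothetical helper: read netstat output on stdin and print
# "Recv-Q Send-Q local-address foreign-address" for every TCP
# connection whose local address ends in the given port.
queue_sizes() {
    port="$1"
    awk -v port="$port" '
        $1 == "tcp" && $4 ~ (":" port "$") { print $2, $3, $4, $5 }
    '
}

# Possible use inside the monitoring loop (50375 is the server port):
#   netstat | queue_sizes 50375 >> watchit.log
```

Logging these lines on every pass would show the queue sizes even in minutes where nothing changed, which the diff-based log cannot do.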
Here is the section of the log that is of interest. The server is
published on TCP port 50375.
Notice that the send queue on the connection to 10.249.158.177:1063 keeps
growing (128, 132, 136, 140) up until the server hangs. From then on the
receive queue on the local connection from port 1111 grows instead (76,
84, 92). I am not sure what the server is trying to do at this point,
since I don't get a core, but it may be trying to close the connection.
Do any Linux internals experts out there have an idea of what has
happened? It looks like a Linux bug to me.
--------------------
Sat May 22 00:45:05 MEST 1999
100 0 213 1 0 0 1456 388 schedule S ? 0:03 ./ticker
100140 0 214 1 0 0 36068 33508 select S ? 87:56 ./server
24c24
< tcp 0 128 susi.mosler.de:50375 10.249.158.177:1063 ESTABLISHED
---
> tcp 0 132 susi.mosler.de:50375 10.249.158.177:1063 ESTABLISHED
--------------------
Sat May 22 00:46:05 MEST 1999
100 0 213 1 0 0 1456 388 schedule S ? 0:03 ./ticker
100140 0 214 1 1 0 36068 33508 select S ? 87:56 ./server
24c24
< tcp 0 132 susi.mosler.de:50375 10.249.158.177:1063 ESTABLISHED
---
> tcp 0 136 susi.mosler.de:50375 10.249.158.177:1063 ESTABLISHED
--------------------
Sat May 22 00:47:05 MEST 1999
100 0 213 1 0 0 1456 388 schedule S ? 0:03 ./ticker
100140 0 214 1 1 0 36068 33508 select S ? 87:56 ./server
24c24
< tcp 0 136 susi.mosler.de:50375 10.249.158.177:1063 ESTABLISHED
---
> tcp 0 140 susi.mosler.de:50375 10.249.158.177:1063 ESTABLISHED
40a41
> 3049 ? R 0:00 -bash
--------------------
Sat May 22 00:48:06 MEST 1999
100 0 213 1 0 0 1456 388 schedule S ? 0:03 ./ticker
100140 0 214 1 0 0 36068 33508 ip_options_ S ? 87:56 ./server
7c7
< tcp 0 0 susi.mosler.de:50375 susi.mosler.de:1111 ESTABLISHED
---
> tcp 76 0 susi.mosler.de:50375 susi.mosler.de:1111 ESTABLISHED
13c13
< tcp 0 0 susi.mosler:netbios-ssn 10.249.158.214:1171 ESTABLISHED
---
> tcp 0 4 susi.mosler:netbios-ssn 10.249.158.214:1171 ESTABLISHED
41d40
< 3049 ? R 0:00 -bash
--------------------
Sat May 22 00:49:06 MEST 1999
100 0 213 1 0 0 1456 388 schedule S ? 0:03 ./ticker
100140 0 214 1 0 0 36068 33508 ip_options_ S ? 87:56 ./server
7c7
< tcp 76 0 susi.mosler.de:50375 susi.mosler.de:1111 ESTABLISHED
---
> tcp 84 0 susi.mosler.de:50375 susi.mosler.de:1111 ESTABLISHED
13c13
< tcp 0 4 susi.mosler:netbios-ssn 10.249.158.214:1171 ESTABLISHED
---
> tcp 0 0 susi.mosler:netbios-ssn 10.249.158.214:1171 ESTABLISHED
--------------------
Sat May 22 00:50:06 MEST 1999
100 0 213 1 0 0 1456 388 schedule S ? 0:03 ./ticker
100140 0 214 1 0 0 36068 33508 ip_options_ S ? 87:56 ./server
7c7
< tcp 84 0 susi.mosler.de:50375 susi.mosler.de:1111 ESTABLISHED
---
> tcp 92 0 susi.mosler.de:50375 susi.mosler.de:1111 ESTABLISHED
12c12
< tcp 0 0 susi.mosler:netbios-ssn 10.249.158.179:1074 ESTABLISHED
---
> tcp 0 4 susi.mosler:netbios-ssn 10.249.158.179:1074 ESTABLISHED
40a41
> 3084 ? R 0:00 -bash
-- Barry Leslie