2.613: network write socket problems

From: Bernd Schubert
Date: Mon Sep 12 2005 - 10:40:16 EST


Hello,

on last Friday we switched on our server to 2.6.13 and today we are
experiencing problems with our nfs clients.
In particular I'm talking about the unfs3 daemon, not the kernel nfs daemon.
Both are running on the server but on different ports, of course. Both are
also serving to the same clients, but different directories.

Today it already several times happend that the unfs3 daemon stalled. Ethereal
showed no network packages on the unfs3 daemon port during this time.
A strace to the proc-id of the daemon clearly shows that *some* writes to some
network sockets will take ages to finish

write(37, "\200\0\0x\203\326(\5\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 124) =
124

This kind of writes can take between seconds and minutes, while it usually
happens much faster than I can count. After the write() to the network
socket, other operations happen rather fast, until the next write to a
network socket. (I identified the troublesome filedescriptors by looking
to /proc/procid/fd).
After restarting the unfs3 daemon everything goes smooth for some time
(approximately 20min to 2h), until the next write to a filedescriptor stalls.

Any idea whats going on? Until today this never happend before, neither with
2.6.x nor 2.4.x. As I wrote, on Friday we replaced 2.6.11.12 by 2.6.13, the
configuration should be similar, only changes should be HZ set to 250 and
additionally the skge driver.
We already switched back from skge to sk98lin, but the problem seems to
remain.


Thanks,
Bernd



--
Bernd Schubert
Physikalisch Chemisches Institut / Theoretische Chemie
Universität Heidelberg
INF 229
69120 Heidelberg
e-mail: bernd.schubert@xxxxxxxxxxxxxxxxxxxxx
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/