2.6.24: RPC: bad TCP reclen 0x00020090 (large)

From: Michael Tokarev
Date: Wed Feb 13 2008 - 09:03:11 EST


Hello!

After upgrading to 2.6.24 (from .23), we're seeing ALOT
of messages like in $subj in dmesg:

Feb 13 13:21:39 paltus kernel: RPC: bad TCP reclen 0x00020090 (large)
Feb 13 13:21:46 paltus kernel: printk: 3586 messages suppressed.
Feb 13 13:21:46 paltus kernel: RPC: bad TCP reclen 0x00020090 (large)
Feb 13 13:21:49 paltus kernel: printk: 371 messages suppressed.
Feb 13 13:21:49 paltus kernel: RPC: bad TCP reclen 0x00020090 (large)
Feb 13 13:21:55 paltus kernel: printk: 2979 messages suppressed.
...

with linux NFS server. The clients are all linux too, mostly 2.6.23
and some 2.6.22.

I found the "offending" piece of code in net/sunrpc/svcsock.c,
in routine svc_tcp_recvfrom() with condition being:

if (svsk->sk_reclen > serv->sv_max_mesg) ...

This happens after a server reboot. At this point, client(s) are trying
to perform some NFS transaction and fail, and server starts generating
the above messages - till I do a umount followed by mount on all clients.
Before, such situation (nfs server reboot) were handled transparently,
ie, there was nothing to do, the mount continued working just fine when
the server comes back online.

Now, I'm not sure if it's really 2.6.24-specific problem or a userspace
problem. Some time ago we also upgraded nfs-kernel-server (Debian)
package, and the remount-after-nfs-server-reboot problem started to
occur at THAT time (and it is something to worry about as well, I just
had no time to deal with it); but the dmesg spamming only appeared
with 2.6.24.

How to debug the issue further on from this point?

Thanks!

/mjt
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/