Re: [PATCH] [Bug 16494] NFS client over TCP hangs due to packet loss

From: Andy Chittenden
Date: Tue Aug 03 2010 - 06:25:47 EST


On 2010-08-03 10:11, Andrew Morton wrote:
(cc linux-nfs)

On Tue, 03 Aug 2010 01:21:44 -0700 (PDT) David Miller <davem@xxxxxxxxxxxxx> wrote:

From: "Andy Chittenden" <andyc.bluearc@xxxxxxxxx>
Date: Tue, 3 Aug 2010 09:14:31 +0100

I don't know whether this patch is the correct fix or not, but it enables the
NFS client to recover.

Kernel versions: 2.6.34.1 and 2.6.32.

Fixes <https://bugzilla.kernel.org/show_bug.cgi?id=16494>. It clears any
previous shutdown state so that reconnecting a socket that has been shut
down leaves it in a usable state (otherwise tcp_sendmsg() returns
-EPIPE).
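The symptom is easy to reproduce in userspace: once the write side of a stream socket has been shut down, send() fails with EPIPE. The sketch below is only an illustration under stated assumptions: it uses an AF_UNIX socketpair so that it needs no network (the TCP path through tcp_sendmsg() fails the same way, as described above), and send_errno_after_shutdown() is a hypothetical helper, not anything from SunRPC.

```c
#include <errno.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: returns the errno seen when sending on a stream
 * socket after shutdown(SHUT_WR), 0 if the send unexpectedly succeeded,
 * or -1 if the socketpair could not be created. */
int send_errno_after_shutdown(void)
{
	int sv[2];

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
		return -1;

	/* Shut down our write side; the peer stays open. */
	shutdown(sv[0], SHUT_WR);

	/* MSG_NOSIGNAL: report EPIPE via errno instead of raising SIGPIPE. */
	ssize_t n = send(sv[0], "x", 1, MSG_NOSIGNAL);
	int err = (n < 0) ? errno : 0;

	close(sv[0]);
	close(sv[1]);
	return err;
}
```

On Linux the call is expected to return EPIPE, matching the -EPIPE that tcp_sendmsg() reports for a TCP socket in the same state.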

If the SunRPC code wants to close a TCP socket and then use it again,
it should disconnect by doing a connect() with sa_family == AF_UNSPEC.
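A minimal userspace sketch of that disconnect-and-reuse sequence, assuming a Linux host with a loopback interface. It runs over 127.0.0.1 rather than through SunRPC, and listen_loopback() and reuse_after_unspec_connect() are hypothetical helpers, not kernel or SunRPC functions. It shuts down a connected socket, confirms send() fails with EPIPE, dissolves the association with an AF_UNSPEC connect(), then reconnects the same descriptor and sends successfully.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: listen on 127.0.0.1 with a kernel-chosen port and
 * store the bound address in *addr. Returns the listening fd or -1. */
static int listen_loopback(struct sockaddr_in *addr)
{
	socklen_t len = sizeof(*addr);
	int lfd = socket(AF_INET, SOCK_STREAM, 0);

	if (lfd < 0)
		return -1;
	memset(addr, 0, sizeof(*addr));
	addr->sin_family = AF_INET;
	addr->sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	addr->sin_port = 0;	/* let the kernel pick a port */
	if (bind(lfd, (struct sockaddr *)addr, sizeof(*addr)) < 0 ||
	    listen(lfd, 4) < 0 ||
	    getsockname(lfd, (struct sockaddr *)addr, &len) < 0) {
		close(lfd);
		return -1;
	}
	return lfd;
}

/* Returns 0 if a shut-down TCP socket could be disconnected with
 * connect(AF_UNSPEC), reconnected and written to again; -1 otherwise. */
int reuse_after_unspec_connect(void)
{
	struct sockaddr_in srv;
	struct sockaddr unspec;
	int ret = -1;
	int fd;
	int lfd = listen_loopback(&srv);

	if (lfd < 0)
		return -1;
	fd = socket(AF_INET, SOCK_STREAM, 0);
	if (fd < 0)
		goto out_listener;
	if (connect(fd, (struct sockaddr *)&srv, sizeof(srv)) < 0)
		goto out;

	/* Shut down the write side: send() now fails with EPIPE. */
	shutdown(fd, SHUT_WR);
	if (send(fd, "x", 1, MSG_NOSIGNAL) != -1 || errno != EPIPE)
		goto out;

	/* Disconnect: on Linux, connect() with sa_family == AF_UNSPEC runs
	 * the protocol's disconnect handler, which also clears the shutdown
	 * flags and leaves the socket unconnected. */
	memset(&unspec, 0, sizeof(unspec));
	unspec.sa_family = AF_UNSPEC;
	if (connect(fd, &unspec, sizeof(unspec)) < 0)
		goto out;

	/* The same descriptor can now be connected and used again. */
	if (connect(fd, (struct sockaddr *)&srv, sizeof(srv)) < 0)
		goto out;
	if (send(fd, "x", 1, MSG_NOSIGNAL) == 1)
		ret = 0;
out:
	close(fd);
out_listener:
	close(lfd);
	return ret;
}
```

No accept() is needed for the sketch: on loopback the kernel completes the handshake and queues the connection, so both connect() calls and the final send() succeed without the server side reading anything.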

There is code to do that in the SunRPC code, in xs_abort_connection(), but it is only called conditionally from xs_tcp_reuse_connection():

static void xs_tcp_reuse_connection(struct rpc_xprt *xprt, struct sock_xprt *transport)
{
	unsigned int state = transport->inet->sk_state;

	if (state == TCP_CLOSE && transport->sock->state == SS_UNCONNECTED)
		return;
	if ((1 << state) & (TCPF_ESTABLISHED|TCPF_SYN_SENT))
		return;
	xs_abort_connection(xprt, transport);
}
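To make the condition explicit, the two early returns can be restated as a pure userspace predicate. This is a sketch, not kernel code: the enum values are copied from the kernel's TCP states (include/net/tcp_states.h) and socket_state (linux/net.h), but renamed with a K prefix (an assumption, to avoid clashing with <netinet/tcp.h>), and would_abort() is a hypothetical name. It returns true exactly when xs_tcp_reuse_connection() would fall through to xs_abort_connection().

```c
#include <stdbool.h>

/* Copies of the kernel's TCP states, renamed to avoid userspace clashes. */
enum ktcp_state {
	KTCP_ESTABLISHED = 1,
	KTCP_SYN_SENT,
	KTCP_SYN_RECV,
	KTCP_FIN_WAIT1,
	KTCP_FIN_WAIT2,
	KTCP_TIME_WAIT,
	KTCP_CLOSE,
	KTCP_CLOSE_WAIT,
	KTCP_LAST_ACK,
	KTCP_LISTEN,
	KTCP_CLOSING,
};

/* Copies of the kernel's struct socket states (socket_state). */
enum ksocket_state {
	KSS_FREE = 0,
	KSS_UNCONNECTED,
	KSS_CONNECTING,
	KSS_CONNECTED,
	KSS_DISCONNECTING,
};

/* Mirrors xs_tcp_reuse_connection(): true when the AF_UNSPEC connect()
 * in xs_abort_connection() would be issued. */
bool would_abort(enum ktcp_state state, enum ksocket_state sock_state)
{
	/* TCP_CLOSE plus SS_UNCONNECTED is treated as already disconnected. */
	if (state == KTCP_CLOSE && sock_state == KSS_UNCONNECTED)
		return false;
	/* Established or connecting sockets are left alone. */
	if ((1u << state) & ((1u << KTCP_ESTABLISHED) | (1u << KTCP_SYN_SENT)))
		return false;
	return true;
}
```

For example, a socket in TCP_CLOSE whose struct socket is still SS_CONNECTED gets the abort, while one already marked SS_UNCONNECTED skips it, as does any socket in ESTABLISHED or SYN_SENT.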

That has changed since 2.6.26, which unconditionally did the connect() with sa_family == AF_UNSPEC. FWIW, we cannot reproduce this problem with 2.6.26.

