Re: NFS regression? Odd delays and lockups accessing an NFS export.

From: Ian Campbell
Date: Sun Aug 24 2008 - 14:53:49 EST


On Fri, 2008-08-22 at 14:56 -0700, Trond Myklebust wrote:
> On Fri, 2008-08-22 at 22:37 +0100, Ian Campbell wrote:
> > I can ssh to the server fine. The same server also serves my NFS home
> > directory to the box I'm writing this from and I've not seen any trouble
> > with this box at all, it's a 2.6.18-xen box.
>
> OK... Are you able to reproduce the problem reliably?
>
> If so, can you provide me with a binary tcpdump or wireshark dump? If
> using tcpdump, then please use something like
>
> tcpdump -w /tmp/dump.out -s 90000 host myserver.foo.bar and port 2049
>
> Please also try to provide a netstat dump of the current TCP connections
> as soon as the hang occurs:
>
> netstat -t

The kernel logged the hang at 18:08:59:

Aug 24 18:08:59 iranon kernel: [168839.556017] nfs: server hopkins not responding, still trying

but I wasn't around until 19:38 to spot it, roughly an hour and a half later.

netstat when I got to it was:

Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 localhost.localdo:50891 localhost.localdom:6543 ESTABLISHED
tcp        1      0 iranon.hellion.org.:ssh azathoth.hellion.:52682 CLOSE_WAIT
tcp        0      0 localhost.localdom:6543 localhost.localdo:50893 ESTABLISHED
tcp        0      0 iranon.hellion.org.:837 hopkins.hellion.org:nfs FIN_WAIT2
tcp        0      0 localhost.localdom:6543 localhost.localdo:41831 ESTABLISHED
tcp        0      0 localhost.localdo:13666 localhost.localdo:59482 ESTABLISHED
tcp        0      0 localhost.localdo:34288 localhost.localdom:6545 ESTABLISHED
tcp        0      0 iranon.hellion.org.:ssh azathoth.hellion.:48977 ESTABLISHED
tcp        0      0 iranon.hellion.org.:ssh azathoth.hellion.:52683 ESTABLISHED
tcp        0      0 localhost.localdom:6545 localhost.localdo:34288 ESTABLISHED
tcp        0      0 localhost.localdom:6543 localhost.localdo:50891 ESTABLISHED
tcp        0      0 localhost.localdo:50893 localhost.localdom:6543 ESTABLISHED
tcp        0      0 localhost.localdo:41831 localhost.localdom:6543 ESTABLISHED
tcp        0     87 localhost.localdo:59482 localhost.localdo:13666 ESTABLISHED
tcp        1      0 localhost.localdom:6543 localhost.localdo:41830 CLOSE_WAIT

(iranon is the problematic host .4, azathoth is my desktop machine .5, hopkins is the NFS server .6)
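
As an aside, for picking the NFS connection out of a listing like the one above without reading it by eye, here is a minimal sketch in Python. It assumes the six-column `netstat -t` layout shown above (foreign address in the fifth column, state in the sixth) and that the NFS port is printed as either `nfs` or `2049`. The function name `nfs_sockets` and the two-line `sample` are made up for illustration; the sample rows are copied from the listing above.

```python
def nfs_sockets(netstat_output, ports=("nfs", "2049")):
    """Return (local, foreign, state) for TCP sockets whose foreign port is NFS.

    Assumes `netstat -t` style lines: Proto Recv-Q Send-Q Local Foreign State.
    """
    found = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip the header and anything that is not a six-column tcp line.
        if len(fields) != 6 or not fields[0].startswith("tcp"):
            continue
        local, foreign, state = fields[3], fields[4], fields[5]
        # The port is whatever follows the last colon of the address.
        if foreign.rsplit(":", 1)[-1] in ports:
            found.append((local, foreign, state))
    return found


sample = """\
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 0 iranon.hellion.org.:837 hopkins.hellion.org:nfs FIN_WAIT2
tcp 0 0 iranon.hellion.org.:ssh azathoth.hellion.:48977 ESTABLISHED
"""

for local, foreign, state in nfs_sockets(sample):
    print(local, "->", foreign, state)
```

Run against the full listing, this should report only the iranon to hopkins socket, which is the one sitting in FIN_WAIT2 rather than ESTABLISHED.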

The tcpdumps are pretty big (sizes below), so I've attached only the last
100 packets captured. If you need more I can put the full files up somewhere.

-rw-r--r-- 1 root root 1.3G Aug 24 17:57 dump.out0
-rw-r--r-- 1 root root 536M Aug 24 19:38 dump.out1

Ian.

--
Ian Campbell

Prizes are for children.
-- Charles Ives, upon being given, but refusing, the
Pulitzer prize

Attachment: last100.dump.bz2
Description: application/bzip