Re: nfs udp 1000/100baseT issue

From: Bret Towe
Date: Thu Mar 16 2006 - 22:10:55 EST


On 3/16/06, Neil Brown <neilb@xxxxxxx> wrote:
> On Thursday March 16, magnade@xxxxxxxxx wrote:
> > On 3/16/06, Jan Engelhardt <jengelh@xxxxxxxxxxxxxxx> wrote:
> > > >
> > > >a while ago i noticed an issue: when an nfs server has a gigabit
> > > >connection to a network and a client connects to that network via
> > > >100baseT instead, the udp connection from client to server fails.
> > > >the client gets a "server not responding" message when trying to
> > > >access a file. the interesting bit is that you can get a directory
> > > >listing without issue. the workaround i found is adding proto=tcp
> > > >on the client side, and then all works without error
> > >
> > > UDP has its implications, like silently dropping packets when the link
> > > is full, by design. Try tcpdump on both systems and compare what packets
> > > are sent and which do arrive. The error message is then probably because
> > > the client is confused by not receiving some packets.
> >
> > after comparing a working and a non-working client i found that
> > packets containing offset 19240, 20720, 22200 are missing
> > and the 100baseT client had an extra offset of 32560
> > on the working client it ends at 31080
> >
> > the missing ones are almost always missing; 22200 appears every so often
> > on retransmission, and 23680 also disappears every so often
> >
> > i hope that isn't too confusing, i don't use tcpdump-type stuff much
> > (well i did give up on tcpdump and had to use ethereal...)
>
> This is all to be expected. I remember having this issue with a
> server on 100M and clients on 10M...
>
> There is no flow control in UDP

is this a linux design flaw or just the nature of udp?

>. If anything gets lost, the client
> has to resend the request, and the server then has to respond again.
> If the response is large (e.g. a read) and gets fragmented (if > 1500 bytes)
> then there is a good chance that one or more fragments of a reply will
> get lost in the switch stepping down from 1G to 100M. Every time.
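
To see why this would happen "every time" rather than at random, here is a
toy model of a switch port stepping down from 1G to 100M. It is only a
sketch: the function name, the 16 KB buffer size and the 23-frame burst are
assumptions for illustration, not measurements of any real switch.

```python
def dropped_frames(n_frames, buffer_bytes, in_mbit=1000, out_mbit=100,
                   frame_bytes=1500):
    """Return the indexes of frames dropped from one back-to-back burst.

    Toy model: frames arrive at in_mbit, the egress port drains the queue
    at out_mbit, and a frame is dropped if it does not fit in the buffer.
    """
    # Bytes the egress port can send while one frame is arriving.
    drain = frame_bytes * out_mbit // in_mbit
    queued = 0
    dropped = []
    for i in range(n_frames):
        queued = max(0, queued - drain)
        if queued + frame_bytes <= buffer_bytes:
            queued += frame_bytes
        else:
            dropped.append(i)
    return dropped


# Hypothetical numbers: a 23-fragment reply and a 16 KB port buffer.
print(dropped_frames(23, 16 * 1024))
# Same burst with a much bigger buffer, or with no speed step-down.
print(dropped_frames(23, 64 * 1024))
print(dropped_frames(23, 16 * 1024, out_mbit=1000))
```

In this model the same frames are dropped on every retransmission, because
the burst and the buffer are identical each time. Since one missing fragment
makes the whole UDP datagram unusable, the reply never gets through.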
>
> Your options include:
>
> - use tcp

i'm wondering why this isn't the default to begin with

> - get a switch with a (much) bigger packet buffer
> - drop the server down to 100M
> - drop the nfs rsize down to 1024 so you don't get fragments.
these last 2 options sound rather painful speed-wise
the tcp workaround is probably by far the easiest

>
> NeilBrown
>
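
For reference, the offsets reported earlier in the thread (19240, 20720,
22200, 23680, 31080, 32560) are all multiples of 1480, which is what IP
fragmentation gives with a 1500-byte MTU and a 20-byte IPv4 header. The
sketch below reproduces that arithmetic. The 32768-byte read size and the
128-byte RPC/NFS reply header are assumptions for illustration, not values
taken from the capture.

```python
MTU = 1500                       # Ethernet MTU (assumed)
IP_HEADER = 20                   # IPv4 header without options (assumed)
UDP_HEADER = 8
FRAG_PAYLOAD = MTU - IP_HEADER   # 1480 bytes of IP payload per fragment


def fragment_offsets(udp_payload_len):
    """Byte offsets of the IP fragments carrying one UDP datagram."""
    total = UDP_HEADER + udp_payload_len
    return list(range(0, total, FRAG_PAYLOAD))


# Hypothetical reply: 32768 bytes of read data plus 128 bytes of headers.
offsets = fragment_offsets(32768 + 128)
print(len(offsets), offsets[-1])

# Every offset reported in the thread falls on a fragment boundary.
reported = [19240, 20720, 22200, 23680, 31080, 32560]
print(all(off in offsets for off in reported))
```

By the same arithmetic, an rsize of 1024 keeps each reply well under 1480
bytes, so it fits in a single frame and is never fragmented.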