Re: 3.0+ NFS issues (bisected)

From: J. Bruce Fields
Date: Sat Aug 18 2012 - 07:14:46 EST


On Sat, Aug 18, 2012 at 10:49:31AM +0400, Michael Tokarev wrote:
> On 18.08.2012 02:32, J. Bruce Fields wrote:
> > On Fri, Aug 17, 2012 at 04:08:07PM -0400, J. Bruce Fields wrote:
> >> Wait a minute, that assumption's a problem because that calculation
> >> depends in part on xpt_reserved, which is changed here....
> >>
> >> In particular, svc_xprt_release() calls svc_reserve(rqstp, 0), which
> >> subtracts rqstp->rq_reserved and then calls svc_xprt_enqueue, now with a
> >> lower xpt_reserved value. That could well explain this.
> >
> > So, maybe something like this?
>
> Well. What can I say? With the change below applied (to 3.2 kernel
> at least), I don't see any stalls or high CPU usage on the server
> anymore. It survived several multi-gigabyte transfers, for several
> hours, without any problem. So it is a good step forward ;)
>
> But the whole thing seems quite fragile. I tried to follow the logic
> in there, and it is, well, "twisted" and somewhat difficult to
> follow. So I don't know if this is the right fix or not. At least
> it works! :)

Suggestions welcomed.
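
For anyone trying to follow the logic, here is a rough user-space model of
the accounting described in the quoted text above. It is only a sketch, not
the kernel code: every model_* name is made up for illustration, the
write-space test is reduced to a single subtraction, and locking and the
real queueing are left out. It shows only that releasing a request lowers
the transport's reserved total, so a space check made before the release
can give a different answer when repeated afterwards.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a transport; "reserved" plays the role of
 * xpt_reserved, the sum of all outstanding per-request reservations. */
struct model_xprt {
	int sndbuf_free;	/* free send-buffer space, in bytes */
	int reserved;		/* total reserved by in-flight requests */
};

/* Hypothetical stand-in for a request; rq_reserved is its share. */
struct model_rqst {
	struct model_xprt *xprt;
	int rq_reserved;
};

/* Simplified write-space test: is there room for one more reply of
 * max_reply bytes once existing reservations are subtracted? */
static bool model_has_wspace(const struct model_xprt *xprt, int max_reply)
{
	return xprt->sndbuf_free - xprt->reserved >= max_reply;
}

/* Shrink a request's reservation to "space" bytes, then re-run the
 * space check. The re-check marks the point where the real code would
 * call svc_xprt_enqueue(), now seeing the lower reserved total. */
static bool model_reserve(struct model_rqst *rqst, int space, int max_reply)
{
	if (space < rqst->rq_reserved) {
		rqst->xprt->reserved -= rqst->rq_reserved - space;
		rqst->rq_reserved = space;
	}
	return model_has_wspace(rqst->xprt, max_reply);
}

/* Releasing a request drops its reservation to zero. */
static bool model_release(struct model_rqst *rqst, int max_reply)
{
	return model_reserve(rqst, 0, max_reply);
}

/* The same check gives different answers before and after release. */
static bool model_demo(void)
{
	struct model_xprt x = { .sndbuf_free = 100, .reserved = 60 };
	struct model_rqst r = { .xprt = &x, .rq_reserved = 60 };
	bool before = model_has_wspace(&x, 60);	/* 100 - 60 < 60 */
	bool after = model_release(&r, 60);	/* 100 - 0 >= 60 */

	assert(x.reserved == 0 && r.rq_reserved == 0);
	return !before && after;
}
```

Any code that assumes the result of the space check cannot change across
the release is relying on "reserved" staying put, which this model shows
it does not.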

> And I really wonder why no one else reported this problem before.
> Am I the only one in this world who uses Linux nfsd? :)

This, for example:

http://marc.info/?l=linux-nfs&m=134131915612287&w=2

may well describe the same problem.... It just needed some debugging
persistence, thanks!

--b.