Re: NFS stop/start problems (related to datagram shutdown bug?)

From: Neil Brown (neilb@cse.unsw.edu.au)
Date: Mon Feb 05 2001 - 18:57:09 EST


On Monday February 5, gandalf@winds.org wrote:
> Seems recently, on both redhat 6.1 and 7.0 using kernel 2.4.1-ac3, I
> ran into this problem:
>
> Stopping NFS says the following in the kernel logs:
>
> nfsd: terminating on signal 9
> nfsd: terminating on signal 9
> nfsd: terminating on signal 9
> nfsd: terminating on signal 9
> nfsd: terminating on signal 9
> nfsd: terminating on signal 9
> nfsd: terminating on signal 9
> nfsd: terminating on signal 9
> svc: server socket destroy delayed
>
> And restarting NFS has the following error message:
>
> root:~> /etc/rc.d/init.d/nfs start
> Starting NFS services: [ OK ]
> Starting NFS quotas: [ OK ]
> Starting NFS mountd: [ OK ]
> Starting NFS daemon: nfssvc: Address already in use
> [FAILED]

How repeatable is this? Is the server SMP?

There does seem to be a possible problem with sk_inuse not being
updated atomically, so a race between an increment and a decrement
could lose one of them.
svc_sock_release seems to often be called with no more protection than
the BKL, and it decrements sk_inuse.

svc_sock_enqueue, on the other hand increments sk_inuse, and is
protected by sv_lock, but not, I think, by the BKL, as it is called by
a networking layer callback. So there might be a possibility for a
race here.

The attached patch might fix it, so if you are having reproducable
problems, it might be worth applying this patch.

Trond: any comments?

NeilBrown

[ a better fix would be to make sk_inuse atomic_t ]

--- net/sunrpc/svcsock.c 2001/02/05 23:45:54 1.1
+++ net/sunrpc/svcsock.c 2001/02/05 23:48:12
@@ -211,16 +211,22 @@
 svc_sock_release(struct svc_rqst *rqstp)
 {
         struct svc_sock *svsk = rqstp->rq_sock;
+ struct svc_serv *serv = svsk->sk_server;
 
         if (!svsk)
                 return;
         svc_release_skb(rqstp);
         rqstp->rq_sock = NULL;
+
+ spin_lock_bh(&serv->sv_lock);
         if (!--(svsk->sk_inuse) && svsk->sk_dead) {
+ spin_unlock_bh(&serv->sv_lock);
                 dprintk("svc: releasing dead socket\n");
                 sock_release(svsk->sk_sock);
                 kfree(svsk);
         }
+ else
+ spin_unlock_bh(&serv->sv_lock);
 }
 
 /*
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Wed Feb 07 2001 - 21:00:23 EST