Re: [PATCH] SUNRPC: Fix a race in xs_reset_transport

From: Trond Myklebust
Date: Thu Sep 17 2015 - 10:50:30 EST


On Thu, 2015-09-17 at 10:18 -0400, Jeff Layton wrote:
> On Thu, 17 Sep 2015 09:38:33 -0400
> Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx> wrote:
>
> > On Tue, Sep 15, 2015 at 2:52 PM, Jeff Layton <jlayton@xxxxxxxxxxxxxxx> wrote:
> > > On Tue, 15 Sep 2015 16:49:23 +0100
> > > "Suzuki K. Poulose" <suzuki.poulose@xxxxxxx> wrote:
> > >
> > > > net/sunrpc/xprtsock.c | 9 ++++++++-
> > > > 1 file changed, 8 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
> > > > index 7be90bc..6f4789d 100644
> > > > --- a/net/sunrpc/xprtsock.c
> > > > +++ b/net/sunrpc/xprtsock.c
> > > > @@ -822,9 +822,16 @@ static void xs_reset_transport(struct sock_xprt *transport)
> > > >  	if (atomic_read(&transport->xprt.swapper))
> > > >  		sk_clear_memalloc(sk);
> > > >
> > > > -	kernel_sock_shutdown(sock, SHUT_RDWR);
> > > > +	if (sock)
> > > > +		kernel_sock_shutdown(sock, SHUT_RDWR);
> > > >
> > >
> > > Good catch, but...isn't this still racy? What prevents
> > > transport->sock being set to NULL after you assign it to "sock"
> > > but before calling kernel_sock_shutdown?
> >
> > The XPRT_LOCKED state.
> >
>
> IDGI -- if the XPRT_LOCKED bit was supposed to prevent that, then
> how could you hit the original race? There should be no concurrent
> callers to xs_reset_transport on the same xprt, right?

Correct. The only exception is xs_destroy.
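
As a rough userspace illustration of that serialization (not the kernel
code: fake_xprt, its fields, and the helper names are all hypothetical
stand-ins, with a C11 atomic_bool playing the role of the XPRT_LOCKED
bit), a reset path that only touches the socket while it owns the lock
bit might be sketched like this:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in; NOT the kernel's struct sock_xprt. */
struct fake_xprt {
	atomic_bool locked;	/* plays the role of the XPRT_LOCKED bit */
	void *sock;		/* plays the role of transport->sock */
	int shutdowns;		/* counts simulated kernel_sock_shutdown() calls */
};

static void fake_xprt_init(struct fake_xprt *x, void *sock)
{
	assert(x != NULL);
	atomic_init(&x->locked, false);
	x->sock = sock;
	x->shutdowns = 0;
}

/* Try to become the single owner; returns true if we got the bit. */
static bool fake_xprt_trylock(struct fake_xprt *x)
{
	/* atomic_exchange returns the previous value: false means it was free. */
	return !atomic_exchange(&x->locked, true);
}

static void fake_xprt_unlock(struct fake_xprt *x)
{
	atomic_store(&x->locked, false);
}

/*
 * Reset the transport only while owning the lock bit.
 * Returns false without touching x->sock if someone else holds the bit.
 */
static bool fake_reset_transport(struct fake_xprt *x)
{
	if (!fake_xprt_trylock(x))
		return false;

	/* Safe to read and clear sock here: all lock holders are serialized. */
	if (x->sock != NULL) {
		x->shutdowns++;		/* simulated shutdown of the socket */
		x->sock = NULL;
	}

	fake_xprt_unlock(x);
	return true;
}
```

In this sketch, a second caller that finds the bit already set backs off
instead of racing on sock. A path that clears sock without taking the
bit first is not covered by the scheme at all, which is the kind of
exception noted above.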

> AFAICT, that bit is not set in the xprt_destroy codepath, which may
> be the root cause of the problem. How would we take it there anyway?
> xprt_destroy is void return, and may not be called in the context of
> a rpc_task. If it's contended, what do we do? Sleep until it's
> cleared?
>

How about the following.

8<-----------------------------------------------------------------