Re: [PATCHv2] SUNRPC: Fix a race in xs_reset_transport

From: Trond Myklebust
Date: Fri Sep 18 2015 - 12:52:05 EST


On Fri, 2015-09-18 at 12:19 +0100, Suzuki K. Poulose wrote:
> On 16/09/15 12:17, Jeff Layton wrote:
> > On Wed, 16 Sep 2015 10:35:49 +0100
> > "Suzuki K. Poulose" <suzuki.poulose@xxxxxxx> wrote:
> >
> > > From: "Suzuki K. Poulose" <suzuki.poulose@xxxxxxx>
> > >
>
> ...
>
> > > +		write_unlock_bh(&sk->sk_callback_lock);
> > > +		return;
> > > +	}
> > > +	sock = transport->sock;
> > > +
> > >  	transport->inet = NULL;
> > >  	transport->sock = NULL;
> > >
> > > @@ -833,6 +838,10 @@ static void xs_reset_transport(struct sock_xprt *transport)
> > >  	xs_restore_old_callbacks(transport, sk);
> > >  	xprt_clear_connected(xprt);
> > >  	write_unlock_bh(&sk->sk_callback_lock);
> > > +
> > > +	if (sock)
> > > +		kernel_sock_shutdown(sock, SHUT_RDWR);
> > > +
> > >  	xs_sock_reset_connection_flags(xprt);
> > >
> > >  	trace_rpc_socket_close(xprt, sock);
> >
> > Better, but now I'm wondering...is it problematic to restore the old
> > callbacks before calling kernel_sock_shutdown? I can't quite tell
> > whether it matters in all cases.
> >
> > It might be best to just go ahead and take the spinlock twice here.
> > Do it once to clear the transport->sock pointer, call
> > kernel_sock_shutdown, and then take it again to restore the old
> > callbacks, etc.
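> >
> > Roughly this shape, maybe (untested, and the variable and helper
> > names are just lifted from the existing function):
> >
> > 	sk = transport->inet;
> > 	if (sk == NULL)
> > 		return;
> >
> > 	write_lock_bh(&sk->sk_callback_lock);
> > 	/* recheck under the lock so only one caller does the teardown */
> > 	if (transport->inet == NULL) {
> > 		write_unlock_bh(&sk->sk_callback_lock);
> > 		return;
> > 	}
> > 	sock = transport->sock;
> > 	transport->inet = NULL;
> > 	transport->sock = NULL;
> > 	write_unlock_bh(&sk->sk_callback_lock);
> >
> > 	/* shut the socket down with the old callbacks still in place */
> > 	if (sock != NULL)
> > 		kernel_sock_shutdown(sock, SHUT_RDWR);
> >
> > 	/* second pass: now restore the old callbacks */
> > 	write_lock_bh(&sk->sk_callback_lock);
> > 	xs_restore_old_callbacks(transport, sk);
> > 	xprt_clear_connected(xprt);
> > 	write_unlock_bh(&sk->sk_callback_lock);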
> >
> > I don't know though...I get the feeling there are races all over the
> > place in this code. It seems like there's a similar one wrt the
> > transport->inet pointer. It seems a little silly that we clear it
> > under the sk->sk_callback_lock. You have to dereference that pointer
> > in order to get to the lock.
> >
> > Maybe the right solution is to use an xchg to swap the inet pointer
> > with NULL so it can act as a gatekeeper. Whoever gets there first
> > does the rest of the shutdown.
> >
> > Something like this maybe? Would this also fix the original problem?
> > Note that this patch is untested...
> >
> > [PATCH] sunrpc: use xchg to fetch and clear the transport->inet
> > pointer in xs_reset_transport
> >
> > Reported-by: "Suzuki K. Poulose" <Suzuki.Poulose@xxxxxxx>
> > Signed-off-by: Jeff Layton <jeff.layton@xxxxxxxxxxxxxxx>
> > ---
> > net/sunrpc/xprtsock.c | 4 ++--
> > 1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
> > index 7be90bc1a7c2..57f79dcab493 100644
> > --- a/net/sunrpc/xprtsock.c
> > +++ b/net/sunrpc/xprtsock.c
> > @@ -813,9 +813,10 @@ static void xs_error_report(struct sock *sk)
> >  static void xs_reset_transport(struct sock_xprt *transport)
> >  {
> >  	struct socket *sock = transport->sock;
> > -	struct sock *sk = transport->inet;
> > +	struct sock *sk;
> >  	struct rpc_xprt *xprt = &transport->xprt;
> >
> > +	sk = xchg(&transport->inet, NULL);
> >  	if (sk == NULL)
> >  		return;
> >
> > @@ -825,7 +826,6 @@ static void xs_reset_transport(struct sock_xprt *transport)
> >  	kernel_sock_shutdown(sock, SHUT_RDWR);
> >
> >  	write_lock_bh(&sk->sk_callback_lock);
> > -	transport->inet = NULL;
> >  	transport->sock = NULL;
> >
> >  	sk->sk_user_data = NULL;
> >
>
>
> This one seemed to fix it, so if it matters:
>
> Tested-by: Suzuki K. Poulose <suzuki.poulose@xxxxxxx>


I don't think it does. It addresses a symptom, but the actual problem
is that we're running 2 parallel close() calls on the same socket
during a shutdown. That must not happen because it means we have
something else trying to use the socket while it is being freed.

I think what is happening is that we're triggering the socket autoclose
mechanism from the state change callback. You're seeing the problem
more frequently because we added the call to kernel_sock_shutdown() as
part of the socket shutdown, but AFAICS it could still be triggered by
some external event, such as a server-initiated shutdown that happens
at the same time.

In fact, looking at the code, it could even be triggered from the data
receive side of things.

Both of these things are bad, because autoclose puts the transport
struct that is being freed onto a workqueue. That again can lead to a
really bad use-after-free if the timing is just a little different.
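
Concretely, the sequence I'm worried about looks something like this
(a simplified sketch of the call chains, not the exact code):

	/* Thread A: transport teardown */
	xs_destroy()
	  -> xs_close()
	     -> xs_reset_transport()
	        -> kernel_sock_shutdown(sock, SHUT_RDWR)
	  -> (transport struct is freed)

	/* Socket state change callback, racing with the teardown */
	xs_tcp_state_change(sk)
	  -> xprt_force_disconnect(xprt)
	     -> queue_work(rpciod_workqueue, &xprt->task_cleanup)
	        /* the autoclose work can now run against freed memory */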

So how about the following patch? It should apply cleanly on top of the
first one (which is still needed, btw).

8<--------------------------------------------------------------------