Re: [V9fs-developer] [PATCH] net: trans_rdma: remove unused function

From: Dominique Martinet
Date: Thu Jul 25 2013 - 15:05:29 EST


Eric Van Hensbergen wrote on Thu, Jul 25, 2013 :
> So, the cancel function should be used to flush any pending requests that
> haven't actually been sent yet. Looking at the 9p RDMA code, it looks like
> the thought was that this wasn't going to be possible. Regardless of
> removing unsent requests, the flush will still be sent and if the server
> processes it before the original request and sends a flush response back
> then we need to clear the posted buffer. This is what rdma_cancelled is
> supposed to be doing. So, the fix is to hook it into the structure -- but
> looking at the code it seems like we probably need to do something more to
> reclaim the buffer rather than just incrementing a counter.
>
> To be clear this has less to do with recovery and more to do with the
> proper implementation of 9p flush semantics. By and large, those semantics
> won't impact static file system users -- but if anyone is using the
> transport to access synthetic filesystems or files then they'll definitely
> want to have a properly implemented flush setup. The way to test this is
> to get a blocking read on a remote named pipe or fifo and then ^C it.

Ok, I knew about the concept of flush but didn't expect a ^C to cause
a -ERESTARTSYS, so I hadn't thought of that case.
That said, reading from, say, a fifo is an entirely local operation: the
client does a walk and a getattr, does nothing 9p-wise for the read
itself, and clunks when it's done with it.



As for the function needing a bit more work: there is a race, but for
"normal" requests I think it is about right. The answer lies in a
comment in rdma_request:

/* When an error occurs between posting the recv and the send,
 * there will be a receive context posted without a pending request.
 * Since there is no way to "un-post" it, we remember it and skip
 * post_recv() for the next request.
 * So here,
 * see if we are this `next request' and need to absorb an excess rc.
 * If yes, then drop and free our own, and do not recv_post().
 **/

Basically, receive buffers are posted to a queue, and we can't take one
back once it is posted, so we just don't post the next one.
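
A rough userspace sketch of that accounting, in case it helps. This is
not the trans_rdma.c code: the struct and the model_* names are made up
for illustration, and there is no locking or atomics here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the connection state; not the kernel structures. */
struct model_conn {
	int posted_recvs;	/* receive buffers currently in the queue */
	int excess_rc;		/* posted buffers with no pending request */
};

/* A request failed after its recv was posted: the buffer cannot be
 * un-posted, so only remember that one extra buffer is out there. */
static void model_note_orphan(struct model_conn *c)
{
	c->excess_rc++;
}

/* Issue a request. Returns true if a new receive buffer was posted,
 * false if an orphaned one was absorbed instead. */
static bool model_request(struct model_conn *c)
{
	if (c->excess_rc > 0) {
		c->excess_rc--;		/* we are the `next request' */
		return false;
	}
	c->posted_recvs++;
	return true;
}

/* A reply arrived and consumed one posted buffer. */
static void model_reply(struct model_conn *c)
{
	assert(c->posted_recvs > 0);
	c->posted_recvs--;
}
```

With one orphan noted, the next model_request() posts nothing and the
number of posted buffers stays equal to the number of expected replies.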

There is one problem though - if the server handles the original request
before getting the flush, the receive buffer will be consumed and we
won't send a new one, so we'll starve the reception queue.
I'm afraid I don't have any bright idea there...
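
To illustrate the starvation case, here is a small counting model of
both orderings. It is only a sketch under my assumptions: the function
name and the counters are hypothetical, and it assumes the client bumps
its excess counter whenever the flush completes, without knowing whether
the original reply already consumed a buffer.

```c
#include <stdbool.h>

/* Hypothetical model. Returns (expected replies - posted receive
 * buffers) right after the request that follows a flush is sent.
 * 0 means balanced; > 0 means a reply will find no buffer posted. */
static int model_deficit_after_flush(bool original_answered_first)
{
	int posted = 0;		/* receive buffers in the queue */
	int expected = 0;	/* replies the server still owes us */
	int excess_rc = 0;	/* buffers we believe are orphaned */

	/* Original request: post recv, send. */
	posted++; expected++;

	/* Flush request: post recv, send. */
	posted++; expected++;

	if (original_answered_first) {
		/* Server replied to the original before seeing the flush:
		 * that reply consumes a buffer. */
		posted--; expected--;
	} else {
		/* Server dropped the original: no reply will come for it. */
		expected--;
	}

	/* The flush reply arrives and consumes one buffer. */
	posted--; expected--;

	/* The client assumes the original's buffer is still posted. */
	excess_rc++;

	/* Next request: absorb the "orphan" instead of posting. */
	if (excess_rc > 0)
		excess_rc--;
	else
		posted++;
	expected++;

	return expected - posted;
}
```

In this model the flush-first ordering comes out balanced (0), while the
original-first ordering leaves one reply with no buffer to land in (1).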


While we are on reception buffer issues, there is another problem with
the queue of receive buffers, even without flush, in the following
scenario:
- post a buffer for tag 0, on a hanging request
- post a buffer for tag 1
- the reply for tag 1 arrives in the buffer posted for tag 0
- post another request with tag 1: its buffer is already in the queue,
and we don't know that we could post the buffer associated with tag 0
back.
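
The scenario above can be modelled with a plain FIFO where each slot
remembers which tag posted it. Again a sketch only: the queue, its size
and the model_* helpers are hypothetical and have nothing to do with the
real verbs structures.

```c
#include <assert.h>

#define MODEL_QLEN 8

/* Hypothetical FIFO model of the receive queue: each slot records the
 * tag of the request that posted the buffer. */
struct model_rq {
	int owner[MODEL_QLEN];
	unsigned int head, tail;
};

static void model_post(struct model_rq *q, int tag)
{
	assert(q->tail - q->head < MODEL_QLEN);
	q->owner[q->tail++ % MODEL_QLEN] = tag;
}

/* A reply, whatever its tag, lands in the oldest posted buffer.
 * Returns the tag that originally posted that buffer. */
static int model_receive(struct model_rq *q)
{
	assert(q->tail != q->head);
	return q->owner[q->head++ % MODEL_QLEN];
}

/* Is a buffer posted on behalf of this tag still in the queue? */
static int model_still_posted(const struct model_rq *q, int tag)
{
	for (unsigned int i = q->head; i != q->tail; i++)
		if (q->owner[i % MODEL_QLEN] == tag)
			return 1;
	return 0;
}
```

After posting for tags 0 and 1 and receiving the reply for tag 1, the
consumed slot is the one owned by tag 0, and the buffer posted for tag 1
is still sitting in the queue while tag 0 is still waiting.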

I haven't found a reliable way to reproduce this yet, but running one dd
with a 1MB block size and another with a 10B block size in parallel
brought the mountpoint down (and the whole server was completely
unavailable for the duration of the dd - TCP sessions timed out, I even
got IO errors on the local disk :D)


Regards,
--
Dominique Martinet