Re: [RESEND4, PATCH 2/2] fuse: require /dev/fuse reads to have enough buffer capacity as negotiated

From: Miklos Szeredi
Date: Wed Apr 24 2019 - 06:48:51 EST


On Wed, Mar 27, 2019 at 11:44 AM Kirill Smelkov <kirr@xxxxxxxxxx> wrote:
>
> A FUSE filesystem server queues /dev/fuse sys_read calls to get
> filesystem requests to handle. It does not know in advance what the
> next request will be, as it can be anything the client issues: LOOKUP,
> READ, WRITE, ... Many requests are short and retrieve data from the
> filesystem. However, WRITE and NOTIFY_REPLY write data into the
> filesystem.
>
> Before entering the operation phase, the FUSE filesystem server and the
> kernel client negotiate the maximum write size the client will ever
> issue. After negotiation, the contract between server and client is
> that the filesystem server queues /dev/fuse sys_read calls with
> enough buffer capacity to receive any client request, WRITE in
> particular, while the FUSE client never sends a WRITE request with
> more than the negotiated max_write payload. The FUSE client in the
> kernel and libfuse historically reserve 4K for the request header.
> The contract is therefore that the filesystem server queues sys_reads
> with a 4K+max_write buffer.
>
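
For readers following along, the sizing rule described above can be sketched
in userspace C. This is only an illustration under stated assumptions, not
code from the patch or from libfuse: the helper name and the two constants
are hypothetical, with the 4K header reserve and the 8K floor taken from the
commit message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 4K reserved for the request header, per the commit message. */
#define SKETCH_HDR_RESERVE ((size_t)4096)
/* Assumption: 8K floor, the FUSE_MIN_READ_BUFFER value cited in the message. */
#define SKETCH_MIN_READ_BUFFER ((size_t)8192)

/*
 * Hypothetical helper: capacity a server would give each read() on
 * /dev/fuse so that any request, WRITE included, fits in the buffer.
 */
static size_t sketch_read_buf_size(size_t max_write)
{
	size_t need;

	/* Guard against overflow when adding the header reserve. */
	assert(max_write <= SIZE_MAX - SKETCH_HDR_RESERVE);

	need = SKETCH_HDR_RESERVE + max_write;
	return need < SKETCH_MIN_READ_BUFFER ? SKETCH_MIN_READ_BUFFER : need;
}
```

With a negotiated max_write of 128K, this gives a 132K buffer; with the
minimum max_write of 4K, it gives the 8K floor.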
> If the filesystem server does not follow this contract, fuse_dev_do_read
> can see that the request size is > buffer size; it then returns EIO to
> the client that issued the request but does not indicate in any way to
> the filesystem server that there is a problem.
> This can be hard to diagnose because for some requests, e.g.
> NOTIFY_REPLY, which mimics WRITE, there is no client thread waiting
> for request completion, so that EIO goes nowhere, while on the
> filesystem server side it looks like the kernel is not replying
> after a successful NOTIFY_RETRIEVE request made by the server.
>
> -> We can make the problem easy to diagnose if we return an error to
> the filesystem server when it violates the contract.
> This should not cause problems in practice: if a filesystem
> server uses a shorter buffer, writes to it were already very likely to
> cause EIO, and if the filesystem is read-only it should be fine too,
> given the 8K minimum buffer size (= either FUSE_MIN_READ_BUFFER, see
> 1d3d752b47, or = 4K + min(max_write)=4K, as ensured by
> process_init_reply).
>
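
As a rough userspace illustration of the check proposed above (not the
kernel code itself): the function name and constants are hypothetical, and
the choice of EINVAL is an assumption, since the commit message only says
"error return" without naming the code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 4K header reserve and 8K floor, as stated in the message. */
#define SKETCH_HDR_4K ((size_t)4096)
#define SKETCH_MIN_BUF_8K ((size_t)8192)

/*
 * Hypothetical model of the read-side check: reject a read whose buffer
 * cannot hold the largest request allowed by the negotiated max_write.
 * Returns 0 if the buffer is large enough, a negative errno otherwise.
 */
static int sketch_check_read_capacity(size_t nbytes, size_t max_write)
{
	size_t need;

	/* Guard against overflow when adding the header reserve. */
	assert(max_write <= SIZE_MAX - SKETCH_HDR_4K);

	need = SKETCH_HDR_4K + max_write;
	if (need < SKETCH_MIN_BUF_8K)
		need = SKETCH_MIN_BUF_8K;

	if (nbytes < need)
		return -EINVAL;	/* assumption: error code not named in the message */
	return 0;
}
```

The point of failing the read itself is that the server learns about the
short buffer right away, instead of the EIO going only to the client.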
> Please see [1] for the context in which the stuck-filesystem problem
> was hit for real (because the kernel client was incorrectly sending
> more than max_write data with NOTIFY_REPLY; see also the previous
> patch), how the situation was traced, and a more involved patch that
> did not make it into the tree.
>
> [1] https://marc.info/?l=linux-fsdevel&m=155057023600853&w=2

Applied.

Thanks,
Miklos