Re: [PATCH 2/2] NFS: Improve nfsiod workqueue detection for allocation flags
From: Trond Myklebust
Date: Mon Jul 07 2025 - 15:25:29 EST
On Mon, 2025-07-07 at 14:46 -0400, Benjamin Coddington wrote:
> The NFS client writeback paths change which flags are passed to their
> memory allocation calls based on whether the current task is running
> from
> within a workqueue or not. More specifically, it appears that during
> writeback allocations with PF_WQ_WORKER set on current->flags will
> add
> __GFP_NORETRY | __GFP_NOWARN. Presumably this is because nfsiod can
> simply fail quickly and later retry to write back that specific page
> should
> the allocation fail.
>
> However, the check for PF_WQ_WORKER is too general because tasks can
> enter NFS
> writeback paths from other workqueues. Specifically, the loopback
> driver
> tends to perform writeback into backing files on NFS with
> PF_WQ_WORKER set,
> and additionally sets PF_MEMALLOC_NOIO. The combination of
> PF_MEMALLOC_NOIO with __GFP_NORETRY can easily result in allocation
> failures and the loopback driver has no retry functionality. As a
> result,
> after commit 0bae835b63c5 ("NFS: Avoid writeback threads getting
> stuck in
> mempool_alloc()") users are seeing corrupted loop-mounted filesystems
> backed
> by image files on NFS.
>
> In a preceding patch, we introduced a function to allow NFS to detect
> if
> the task is executing within a specific workqueue. Here we use that
> helper
> to set __GFP_NORETRY | __GFP_NOWARN only if the workqueue is nfsiod.
>
> Fixes: 0bae835b63c5 ("NFS: Avoid writeback threads getting stuck in
> mempool_alloc()")
> Signed-off-by: Benjamin Coddington <bcodding@xxxxxxxxxx>
> ---
> fs/nfs/internal.h | 12 +++++++++++-
> 1 file changed, 11 insertions(+), 1 deletion(-)
>
> diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
> index 69c2c10ee658..173172afa3f5 100644
> --- a/fs/nfs/internal.h
> +++ b/fs/nfs/internal.h
> @@ -12,6 +12,7 @@
> #include <linux/nfs_page.h>
> #include <linux/nfslocalio.h>
> #include <linux/wait_bit.h>
> +#include <linux/workqueue.h>
>
> #define NFS_SB_MASK (SB_NOSUID|SB_NODEV|SB_NOEXEC|SB_SYNCHRONOUS)
>
> @@ -669,9 +670,18 @@ nfs_write_match_verf(const struct nfs_writeverf
> *verf,
> !nfs_write_verifier_cmp(&req->wb_verf, &verf-
> >verifier);
> }
>
> +static inline bool is_nfsiod(void)
> +{
> + struct workqueue_struct *current_wq = current_workqueue();
> +
> + if (current_wq)
> + return current_wq == nfsiod_workqueue;
> + return false;
> +}
> +
> static inline gfp_t nfs_io_gfp_mask(void)
> {
> - if (current->flags & PF_WQ_WORKER)
> + if (is_nfsiod())
> return GFP_KERNEL | __GFP_NORETRY | __GFP_NOWARN;
> return GFP_KERNEL;
> }
Instead of trying to identify the nfsiod_workqueue, why not apply
current_gfp_context() in order to weed out callers that set
PF_MEMALLOC_NOIO and PF_MEMALLOC_NOFS?
i.e.
static inline gfp_t nfs_io_gfp_mask(void)
{
gfp_t ret = current_gfp_context(GFP_KERNEL);
if ((current->flags & PF_WQ_WORKER) && ret == GFP_KERNEL)
ret |= __GFP_NORETRY | __GFP_NOWARN;
return ret;
}
--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trondmy@xxxxxxxxxx, trond.myklebust@xxxxxxxxxxxxxxx