Re: [RFC PATCH] nfsd: convert the nfsd_users to atomic_t

From: NeilBrown
Date: Fri Jun 20 2025 - 07:55:46 EST


On Wed, 18 Jun 2025, chenxiaosong@xxxxxxxxxxxxxxxx wrote:
> From: ChenXiaoSong <chenxiaosong@xxxxxxxxxx>
>
> Before commit 38f080f3cd19 ("NFSD: Move callback_wq into struct nfs4_client"),
> we had a null-ptr-deref in nfsd4_probe_callback() (Link[1]):
>
> nfsd: last server has exited, flushing export cache
> NFSD: starting 90-second grace period (net f0000030)
> Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000

The only possible cause that I can find for this crash is that the nfsd
thread must have still been running when nfsd_shutdown_net() and then
nfsd_shutdown_generic() were called resulting in the workqueue being
destroyed.

The threads will all have been signalled with SIGKILL, but there was no
mechanism to wait for the threads to complete.

This was changed in

Commit: 3409e4f1e8f2 ("NFSD: Make it possible to use svc_set_num_threads_sync")

Sync then threads were stopped synchronously so they were certainly all
stopped before the workqueue was removed.

NeilBrown


> ...
> Call trace:
> __queue_work+0xb4/0x558
> queue_work_on+0x88/0x90
> nfsd4_probe_callback+0x4c/0x58 [nfsd]
> NFSD: starting 90-second grace period (net f0000030)
> nfsd4_probe_callback_sync+0x20/0x38 [nfsd]
> nfsd4_init_conn.isra.57+0x8c/0xa8 [nfsd]
> nfsd4_create_session+0x5b8/0x718 [nfsd]
> nfsd4_proc_compound+0x4c0/0x710 [nfsd]
> nfsd_dispatch+0x104/0x248 [nfsd]
> svc_process_common+0x348/0x808 [sunrpc]
> svc_process+0xb0/0xc8 [sunrpc]
> nfsd+0xf0/0x160 [nfsd]
> kthread+0x134/0x138
> ret_from_fork+0x10/0x18
> Code: aa1c03e0 97ffffba aa0003e2 b5000780 (f9400262)
> SMP: stopping secondary CPUs
> Starting crashdump kernel...
> Bye!
>
> One of the cases is:
>
> task A (cpu 1) | task B (cpu 2) | task C (cpu 3)
> ---------------------|----------------------|---------------------------------
> nfsd_startup_generic | nfsd_startup_generic |
> nfsd_users == 0 | nfsd_users == 0 |
> nfsd_users++ | nfsd_users++ |
> nfsd_users == 1 | |
> ... | |
> callback_wq == xxx | |
> ---------------------|----------------------|---------------------------------
> | | nfsd_shutdown_generic
> | | nfsd_users == 1
> | | --nfsd_users
> | | nfsd_users == 0
> | | ...
> | | callback_wq == xxx
> | | destroy_workqueue(callback_wq)
> ---------------------|----------------------|---------------------------------
> | nfsd_users == 1 |
> | ... |
> | callback_wq == yyy |
>
> After commit 38f080f3cd19 ("NFSD: Move callback_wq into struct nfs4_client"),
> this issue no longer occurs, but we should still convert the nfsd_users
> to atomic_t to prevent other similar issues.
>
> Link[1]: https://chenxiaosong.com/en/nfs/en-null-ptr-deref-in-nfsd4_probe_callback.html
> Co-developed-by: huhai <huhai@xxxxxxxxxx>
> Signed-off-by: huhai <huhai@xxxxxxxxxx>
> Signed-off-by: ChenXiaoSong <chenxiaosong@xxxxxxxxxx>
> ---
> fs/nfsd/nfssvc.c | 8 ++++----
> 1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
> index 9b3d6cff0e1e..08b1f9ebdc2a 100644
> --- a/fs/nfsd/nfssvc.c
> +++ b/fs/nfsd/nfssvc.c
> @@ -270,13 +270,13 @@ static int nfsd_init_socks(struct net *net, const struct cred *cred)
> return 0;
> }
>
> -static int nfsd_users = 0;
> +static atomic_t nfsd_users = ATOMIC_INIT(0);
>
> static int nfsd_startup_generic(void)
> {
> int ret;
>
> - if (nfsd_users++)
> + if (atomic_fetch_inc(&nfsd_users))
> return 0;
>
> ret = nfsd_file_cache_init();
> @@ -291,13 +291,13 @@ static int nfsd_startup_generic(void)
> out_file_cache:
> nfsd_file_cache_shutdown();
> dec_users:
> - nfsd_users--;
> + atomic_dec(&nfsd_users);
> return ret;
> }
>
> static void nfsd_shutdown_generic(void)
> {
> - if (--nfsd_users)
> + if (atomic_dec_return(&nfsd_users))
> return;
>
> nfs4_state_shutdown();
> --
> 2.34.1
>
>