Re: [Question]nfs: never returned delegation

From: zhangjian (CG)
Date: Mon Aug 11 2025 - 22:46:11 EST


Thanks a lot for the reply.

The stateid is set to NFS4_INVALID_STATEID_TYPE when the delegation is marked
NFS4ERR_DELEG_REVOKED, so nfs_mark_test_expired_delegation will not mark the
delegation as NFS_DELEGATION_TEST_EXPIRED again. In this case,
TEST_STATEID and FREE_STATEID will not be sent to the server any more.
This means that if the delegation-return procedure hits ETIMEOUT, the
delegation will stay in the server's clp->cl_revoked list forever.
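
To make the sequence above concrete, here is a small user-space model in C.
It is only a sketch of the behaviour as I described it, not kernel source:
every model_* name and MODEL_* constant is made up for illustration, and the
real 5.10 code paths may differ in detail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's stateid types (model only). */
enum model_stateid_type {
	MODEL_DELEGATION_STATEID,
	MODEL_INVALID_STATEID,	/* plays the role of NFS4_INVALID_STATEID_TYPE */
};

struct model_delegation {
	enum model_stateid_type stateid_type;
	bool revoked;		/* plays the role of the "revoked" mark */
	bool test_expired;	/* plays the role of NFS_DELEGATION_TEST_EXPIRED */
};

/* Model of revoking a delegation: the stateid type is invalidated too. */
static void model_mark_revoked(struct model_delegation *d)
{
	assert(d != NULL);
	d->stateid_type = MODEL_INVALID_STATEID;
	d->revoked = true;
}

/*
 * Model of nfs_mark_test_expired_delegation as described above: a
 * delegation whose stateid is already invalid is skipped.
 * Returns true if the delegation was flagged for testing.
 */
static bool model_mark_test_expired(struct model_delegation *d)
{
	assert(d != NULL);
	if (d->stateid_type == MODEL_INVALID_STATEID)
		return false;
	d->test_expired = true;
	return true;
}

/*
 * Model of the reaper: TEST_STATEID/FREE_STATEID would only be sent for
 * flagged delegations. Returns how many delegations would be tested.
 */
static int model_reap(struct model_delegation *delegs, size_t n)
{
	int tested = 0;

	for (size_t i = 0; i < n; i++) {
		if (!delegs[i].test_expired)
			continue;
		delegs[i].test_expired = false;
		tested++;
	}
	return tested;
}
```

In this model, a delegation that went through model_mark_revoked() is never
picked up by model_reap(), so nothing ever asks the server to free it. That
is the situation I believe we are in.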

On 2025/8/11 21:03, Jeff Layton wrote:
> On Mon, 2025-08-11 at 20:48 +0800, zhangjian (CG) wrote:
>> Recently, we hit an NFS problem on 5.10. The tcpdump output shows a large number of TEST_STATEID requests after a non-privileged request. There are 400,000+ delegations on the client (I read the delegation list from /proc/kcore).
>> At first, I thought the state manager was spending a lot of time in nfs_server_reap_expired_delegations. But I see they are all in the NFS_DELEGATION_REVOKED state except 6 in NFS_DELEGATION_REFERENCED (I read this from /proc/kcore too).
>> I analyzed the NFS code and found that if the NFSPROC4_CLNT_DELEGRETURN procedure hits ETIMEOUT, the delegation is marked as NFS4ERR_DELEG_REVOKED and is never returned again. The NFS server keeps the revoked delegation in clp->cl_revoked forever. As a result, every following SEQUENCE response carries the RECALLABLE_STATE_REVOKED flag, and the client sends a TEST_STATEID request for every non-revoked delegation.
>> This can only be solved by restarting the NFS server.
>> I think ETIMEOUT in the NFSPROC4_CLNT_DELEGRETURN procedure may not be the only case that causes lots of never-ending TEST_STATEID requests after any non-privileged request.
>> I hope the NFS experts can give some advice on this problem.
>>
>
> What should happen is that the client should issue a TEST_STATEID and
> then follow up with a FREE_STATEID once it's clear that it has been
> revoked. Alternately, if the client expires then the server will purge
> any state it held at that point. The server is required to keep a
> record of these objects until one of those events occurs.
>
> v5.10 is pretty old, and there have been a number of fixes in this area
> in both the client and server over the last several years. You may want
> to try a newer kernel (or look at doing some backporting).
>
> Cheers,