Re: [PATCH v3] NFSv4.0: nfs4_do_fsinfo() should not do implicit lease renewals

From: Chuck Lever
Date: Mon Jan 27 2020 - 10:05:45 EST




> On Jan 27, 2020, at 9:45 AM, Robert Milkowski <rmilkowski@xxxxxxxxx> wrote:
>
> On Thu, 23 Jan 2020 at 19:08, Trond Myklebust <trondmy@xxxxxxxxxxxxxxx> wrote:
>>
>> On Wed, 2020-01-22 at 19:10 +0000, Schumaker, Anna wrote:
>>> Hi Robert,
>>>
>>> On Mon, 2020-01-20 at 17:55 +0000, Robert Milkowski wrote:
>>>>> -----Original Message-----
>>>>> From: Chuck Lever <chuck.lever@xxxxxxxxxx>
>>>>> Sent: 30 December 2019 15:37
>>>>> To: Robert Milkowski <rmilkowski@xxxxxxxxx>
>>>>> Cc: Linux NFS Mailing List <linux-nfs@xxxxxxxxxxxxxxx>; Trond
>>>>> Myklebust
>>>>> <trond.myklebust@xxxxxxxxxxxxxxx>; Anna Schumaker
>>>>> <anna.schumaker@xxxxxxxxxx>; linux-kernel@xxxxxxxxxxxxxxx
>>>>> Subject: Re: [PATCH v3] NFSv4.0: nfs4_do_fsinfo() should not do
>>>>> implicit
>>>>> lease renewals
>>>>>
>>>>>
>>>>>
>>>>>> On Dec 30, 2019, at 10:20 AM, Robert Milkowski <
>>>>>> rmilkowski@xxxxxxxxx>
>>>>> wrote:
>>>>>> From: Robert Milkowski <rmilkowski@xxxxxxxxx>
>>>>>>
>>>>>> Currently, each time nfs4_do_fsinfo() is called it will do an
>>>>>> implicit
>>>>>> NFS4 lease renewal, which is not compliant with the NFS4
>>>>> specification.
>>>>>> This can result in a lease being expired by an NFS server.
>>>>>>
>>>>>> Commit 83ca7f5ab31f ("NFS: Avoid PUTROOTFH when managing
>>>>>> leases")
>>>>>> introduced implicit client lease renewal in nfs4_do_fsinfo(),
>>>>>> which
>>>>>> can result in the NFSv4.0 lease to expire on a server side, and
>>>>>> servers returning NFS4ERR_EXPIRED or NFS4ERR_STALE_CLIENTID.
>>>>>>
>>>>>> This can easily be reproduced by frequently unmounting a sub-
>>>>>> mount,
>>>>>> then stat'ing it to get it mounted again, which will delay or
>>>>>> even
>>>>>> completely prevent client from sending RENEW operations if no
>>>>>> other
>>>>>> NFS operations are issued. Eventually nfs server will expire
>>>>>> client's
>>>>>> lease and return an error on file access or next RENEW.
>>>>>>
>>>>>> This can also happen when a sub-mount is automatically
>>>>>> unmounted due
>>>>>> to inactivity (after nfs_mountpoint_expiry_timeout), then it is
>>>>>> mounted again via stat(). This can result in a short window
>>>>>> during
>>>>>> which client's lease will expire on a server but not on a
>>>>>> client.
>>>>>> This specific case was observed on production systems.
>>>>>>
>>>>>> This patch makes an explicit lease renewal instead of an
>>>>>> implicit one,
>>>>>> by adding RENEW to a compound operation issued by
>>>>>> nfs4_do_fsinfo(),
>>>>>> similarly to NFSv4.1 which adds SEQUENCE operation.
>>>>>>
>>>>>> Fixes: 83ca7f5ab31f ("NFS: Avoid PUTROOTFH when managing
>>>>>> leases")
>>>>>> Signed-off-by: Robert Milkowski <rmilkowski@xxxxxxxxx>
>>>>>
>>>>> Reviewed-by: Chuck Lever <chuck.lever@xxxxxxxxxx>
>>>>>
>>>>>
>>>>
>>>> How do we progress it further?
>>>
>>> Thanks for following up! I have the patch included in my linux-next
>>> branch for
>>> the next merge window.
>>
>> NACK. This is the wrong way to solve the problem. Lease renewal in
>> NFSv4 does not need to be tied to fsinfo. It creates an unnecessary
>> extra error condition that has absolutely nothing to do with the
>> functionality of retrieving per-filesystem attributes.
>
> Well, we also do it for NFSv4.1+ with the SEQUENCE operation being
> send by fsinfo, and as Chuck pointed out
> it makes sense to do similarly for 4.0, which is what this patch does.

I did say that.

However, I can see that for NFSv4.1+, the client code handling the
SEQUENCE response will update cl_last_renewal. It does not need to
be done in the fsinfo code.

The NFSv4.0 behavior should be correct if cl_last_renewal is not
updated. That should force the client to send a separate RENEW
operation so that both the client and server agree that the lease
is active.

If I understand Trond correctly?


> Or as per the v2 version of the patch, not do the implicit RENEW for
> 4.0, which was a simpler patch,
> but then not in-line with 4.1+.
>
>>
>> All that needs to be done here is to move the setting of clp-
>>> cl_last_renewal _out_ of nfs4_set_lease_period(), and just have
>> nfs4_proc_setclientid_confirm() and nfs4_update_session() call
>> do_renew_lease().
>>
>
> This would also require nfs4_setup_state_renewal() to call
> do_renew_lease() I think - at least it currently calls
> nfs4_set_lease_period().
> Also, iirc fsinfo() not setting cl_last_renewal leads to
> cl_last_renewal initialization issues under some circumstances.
>
> Then the RFC 7530 in section 16.34.5 states:
> "SETCLIENTID_CONFIRM does not establish or renew a lease.", so calling
> do_renew_lease() from nfs4_setclientid_confirm() doesn't seem to be
> ok.
>
> I'm not sure if is is valid to do implicit lease renewal in
> nfs4_update_session() either...
>
>
> Anyway, the patch would be something like (haven't tested it yet):
>
> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
> index 76d3716..a7af864 100644
> --- a/fs/nfs/nfs4proc.c
> +++ b/fs/nfs/nfs4proc.c
> @@ -5019,16 +5019,13 @@ static int nfs4_do_fsinfo(struct nfs_server
> *server, struct nfs_fh *fhandle, str
> struct nfs4_exception exception = {
> .interruptible = true,
> };
> - unsigned long now = jiffies;
> int err;
>
> do {
> err = _nfs4_do_fsinfo(server, fhandle, fsinfo);
> trace_nfs4_fsinfo(server, fhandle, fsinfo->fattr, err);
> if (err == 0) {
> - nfs4_set_lease_period(server->nfs_client,
> - fsinfo->lease_time * HZ,
> - now);
> + nfs4_set_lease_period(server->nfs_client,
> fsinfo->lease_time * HZ)
> break;
> }
> err = nfs4_handle_exception(server, err, &exception);
> @@ -6146,6 +6143,10 @@ int nfs4_proc_setclientid_confirm(struct nfs_client *clp,
> clp->cl_clientid);
> status = rpc_call_sync(clp->cl_rpcclient, &msg,
> RPC_TASK_TIMEOUT | RPC_TASK_NO_ROUND_ROBIN);
> + if(status == 0) {
> + unsigned long now = jiffies;
> + do_renew_lease(clp, now);
> + }
> trace_nfs4_setclientid_confirm(clp, status);
> dprintk("NFS reply setclientid_confirm: %d\n", status);
> return status;
> @@ -8590,6 +8591,8 @@ static void nfs4_update_session(struct
> nfs4_session *session,
> if (res->flags & SESSION4_BACK_CHAN)
> memcpy(&session->bc_attrs, &res->bc_attrs,
> sizeof(session->bc_attrs));
> + unsigned long now = jiffies;
> + do_renew_lease(session->clp, now);
> }
>
> static int _nfs4_proc_create_session(struct nfs_client *clp,
> diff --git a/fs/nfs/nfs4renewd.c b/fs/nfs/nfs4renewd.c
> index 6ea431b..ff876dd 100644
> --- a/fs/nfs/nfs4renewd.c
> +++ b/fs/nfs/nfs4renewd.c
> @@ -138,15 +138,12 @@
> *
> * @clp: pointer to nfs_client
> * @lease: new value for lease period
> - * @lastrenewed: time at which lease was last renewed
> */
> void nfs4_set_lease_period(struct nfs_client *clp,
> - unsigned long lease,
> - unsigned long lastrenewed)
> + unsigned long lease)
> {
> spin_lock(&clp->cl_lock);
> clp->cl_lease_time = lease;
> - clp->cl_last_renewal = lastrenewed;
> spin_unlock(&clp->cl_lock);
>
> /* Cap maximum reconnect timeout at 1/2 lease period */
> diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
> index 3455232..d7b02fd 100644
> --- a/fs/nfs/nfs4state.c
> +++ b/fs/nfs/nfs4state.c
> @@ -102,7 +102,8 @@ static int nfs4_setup_state_renewal(struct nfs_client *clp)
> now = jiffies;
> status = nfs4_proc_get_lease_time(clp, &fsinfo);
> if (status == 0) {
> - nfs4_set_lease_period(clp, fsinfo.lease_time * HZ, now);
> + nfs4_set_lease_period(clp, fsinfo.lease_time * HZ);
> + nfs4_do_renew_lease(clp, now);
> nfs4_schedule_state_renewal(clp);
> }
>
>
>
> But it potentially has issues, as just pointed out, mainly with not
> being compliant with rfc again.
>
> Also see the comments above about the SEQUENCE.
> Trond - ?
>
> Chuck, Anna - which way do you prefer, the one proposed by Chuck or
> the one now proposed by Trond?
>
> Or perhaps my v2 patch which is least invasive and just stops doing
> implicit renewals for 4.0 if cl_last_renewal is already set.
>
> I personally think the way proposed by Chuck is fully compliant with
> RFC and in-line with what we currently do for 4.1 via SEQUENCE,
> so perhaps the best option. ?
>
> --
> Robert Milkowski

--
Chuck Lever