Re: [PATCH] ceph: don't return -ESTALE if there's still an open file

From: Amir Goldstein
Date: Sat May 16 2020 - 02:58:27 EST


[pulling in nfs guys]

> > Questions:
> > 1. Does sync() result in fully purging inodes on MDS?
>
> I don't think so, but again, that code is not trivial to follow. I do
> know that the MDS keeps around a "strays directory" which contains
> unlinked inodes that are lazily cleaned up. My suspicion is that it's
> satisfying lookups out of this cache as well.
>
> Which may be fine...the MDS is not required to be POSIX compliant after
> all. Only the fs drivers are.
>
> > 2. Is i_nlink synchronized among nodes on deferred delete?
> > IWO, can inode come back from the dead on client if another node
> > has linked it before i_nlink 0 was observed?
>
> No, that shouldn't happen. The caps mechanism should ensure that it
> can't be observed by other clients until after the change.
>
> That said, Luis' current patch doesn't ensure we have the correct caps
> to check the i_nlink. We may need to add that in before we can roll with
> this.
>
> > 3. Can an NFS client be "migrated" from one ceph node to another
> > with an open but unlinked file?
> >
>
> No. Open files in ceph are generally per-client. You can't pass around a
> fd (or equivalent).

Not sure we are talking about the same thing.
It's not ceph fd that is being passed around, it's the NFS client's fd.
If there is no case where NFS client would access ceph client2
with a handle it got from ceph client1, then there is no reason to satisfy
an open_by_handle() call for an unlinked file on client2.
If file was opened on client1, it may be "legal" to satisfy open_by_handle()
on client2, but I don't see how stopping to satisfy that can break anything.

>
> > I think what the test is trying to verify is that a "fully purged" inodes
> > cannot be opened db handle, but there is no standard way to verify
> > "fully purged", so the test resorts to sync() + another sync() + drop_caches.
> >
>
> Got it. That makes sense.
>
> > Is there anything else that needs to be done on ceph in order to flush
> > all deferred operations from this client to MDS?
>
> I'm leaning toward something like what Luis has proposed, but adding in
> appropriate cap handling.

That sounds fine.

>
> Basically, we have to consider the situation where one client has the
> file open and another client unlinks it, and then does an
> open_by_handle_at. Should it succeed in that case?
>
> I can see arguments for either way.

IMO, the behavior should be defined for a client that has the file open.
For the rest it does not really matter.

My argument is that is it easy to satisfy the test's expectation and conform
to behavior of other filesystems without breaking any real workload.

To satisfy the test's expectation, you only need to change behavior of ceph
client in i_count 1 use case. If i_count is 1 need to take all relevant caps
to check that i_nlink is "globally" 0, before returning ESTALE.
But if i_count > 1, no need to bother.

Thanks,
Amir.